[cairo] New tessellator available for playing

Carl Worth cworth at cworth.org
Wed Sep 20 16:40:10 PDT 2006


I've heard that some people think I have disappeared from cairo
development, (and I admit I haven't been extremely active on the list,
and there have been some recent good things submitted that I haven't
even commented on yet[*]---sorry about that!).

Well, I haven't left cairo development---I've just had my head down
for about two weeks straight working on a new tessellator for
cairo. Here's the story on that work.

[Please allow me the indulgence of sharing a lot of details, perhaps
even some personal details, as this piece of code is rather important
to cairo and to me. If you don't want to read all this, just know that
we've got a new tessellator working pretty well now (in the
new-tessellator2 branch of my personal tree) and we hope to merge it
to master soon. Some performance numbers are included at the end of
this message.]

-Carl

[*] I'm thinking here of Robert's various patches, the current_point
bug noticed by Mike and Aivars' patches (with performance test
case---woo-hoo!) to improve matrix transformation performance. There
are likely others as well.

Motivation (what we have right now is awful)
-------------------------------------------
The tessellator is an important piece of cairo. It's software that
sits at the bottom of every fill and every stroke of any spline, (at
least for backends that don't handle fill and stroke natively, as
pdf, ps, and svg do). So it's performance-critical, and it's also
important for it to be very robust, (it's quite likely to face lots of
numerical and geometric degeneracies, and any error can lead to
graphical "leaks" where the "ink" spills outside the fill and over an
arbitrarily large chunk of the output).

Currently, the tessellator we have is really slow. We've never
properly characterized its performance behavior, but we know it
doesn't have the O(NlogN) performance that should be possible in
practice for a tessellator. And I believe Mozilla has collected some
tests that demonstrate its performance problems.

It's also not very robust as evident by things like the leaky-dash
test case (also known as bug #4863). That's a case where what should
be a dashed stroke ends up leaking and looking like the following
mess:

	https://bugs.freedesktop.org/attachment.cgi?id=3738

So, for a long time, we've wanted something faster and more robust.

Background (Hobby's genius)
---------------------------
The plan for a long time had been to write a Bentley-Ottmann
tessellator, (which performs in time of O((N+k)logN) where N is the
number of edges and k is the number of intersections among the edges),
with an additional "tolerance square" pass as described by John Hobby
in his excellent paper here:

	http://cm.bell-labs.com/who/hobby/93_2-27.pdf

	John D. Hobby, Practical Segment Intersection with Finite
	Precision Output, Computational Geometry: Theory and
	Applications, 13(4), 1999.

In addition to describing the tolerance-square fix for
Bentley-Ottmann, this paper also goes into great detail into how to
handle the various degeneracies that occur within the Bentley-Ottmann
algorithm itself, (multiple segments intersecting at the same point,
intersections so near each other they cannot be reliably sorted,
etc.). Hobby also mentions a possibility of doing a multi-pass
Bentley-Ottmann to solve the same problem that the single
tolerance-square pass solves, (with the caveat that there is nothing
to guarantee that the number of additional passes needed will be
small).

History (the nickle days)
-------------------------
A little over two years ago, (July 2004, before Red Hat hired me in
October), I started investigating the tessellation problem and put
together a Bentley-Ottmann prototype in nickle. I had only just started
working on the tolerance square implementation when I got busy with
other things and had to put it aside. (At this time, I was working on
cairo primarily in my spare time, and there wasn't a lot of that, so
cairo got put on hold a lot).

Then, in October 2004, I was hired by Red Hat and tasked with
completing cairo. An early thought was, "Oh good, now I'll have time
to write that tessellator", but somehow other things kept taking
priority over it. Red Hat's investment in cairo was significant---the
fact that I could work on cairo full time, (along with help from
people like Owen Taylor and Kristian Høgsberg), rather than in my
spare time was essential in making cairo 1.0 possible as well as the
PDF/PS/SVG surface work that went into 1.2.

And, now, it's also finally meant that I could work on the tessellator
like I'd been itching to do for years.

Recent work (skip lists rock!)
------------------------------
So two weeks ago I dusted some cobwebs off the nickle prototype, (it's
in the "tess" CVS repository hosted at keithp.com for any interested
archaeologists). I ported that to C and started hacking away at what
was necessary to make the code work without the rational arithmetic
available in nickle.

An essential part of making the Bentley-Ottmann implementation achieve
the desired O((N+k)logN) performance is a data structure for keeping
a list of items sorted with logarithmic (or better) performance for
finding or deleting items. If the data structure that comes to your
mind is a balanced tree of some variety, then allow me to introduce
you to skip lists:

	ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

	William Pugh, Skip lists: a probabilistic alternative to
	balanced trees, Communications of the ACM, 33(6) 668-676,
	June 1990.

Skip lists provide performance (probabilistically) comparable to a
balanced tree, but with a much, much simpler implementation---about as
difficult to write as a simple binary tree. But
unlike a binary tree where degenerate worst-case performance depends
on the data fed to the list, in a skip list the performance depends
only on the quality of your random number generator.

And Keith Packard just happened to have a lovely little implementation
of skip lists lying around, (code he wrote originally for fontconfig I
believe). His code is available here:

	git://keithp.com/git/skiplist

So I grabbed that code, and generalized it for two separate uses in
the tessellator, (much to the dismay of Keith, who would have
preferred I just copy-and-paste a couple of times), and started trying
to get the whole thing to work.

Testing (one small edge at a time)
----------------------------------
My testing strategy involved throwing random edges at the tessellator
with random integer coordinates no larger than 10, (which in cairo's
sub-pixel coordinates is equivalent to generating edges within a
region with an area smaller than 1 billionth the area that contributes
to a single pixel). This gives a high probability that any two edges
will intersect, so there are a lot of intersections, and the small
values make every bit count in the tessellator's arithmetic.

This was a nice torture test and the random edges contributed a new
geometrical or numerical degeneracy (and exposed a new bug) every 10
edges or so, (resulting in a segmentation fault or assertion
failure). I picked the bugs off one at a time and by the time I got
the tessellator to survive past 50 edges it was able to survive past
2000 edges.

Some of the things I fixed here include:

 * Shifting up all input values by 2 bits to allow for intersection
   results to be stored internally with 1 bit more precision than
   input values, so they will correctly compare with the sweep line
   edge, (the 2nd bit isn't strictly needed until tolerance squares
   are implemented).

 * Implementing double-linkage in the skip list so that we can delete
   from the active edge list without requiring any calls to a compare
   function. This is a huge help, as it is extremely difficult to write
   a comparison function that is valid for the entire lifetime of the
   algorithm.

Some of the history of these fixes from that hacking attempt can be seen in the
new-tessellator branch of my personal cairo repository. That's not a
buildable version of cairo, just a standalone rig for testing the
tessellator.

I haven't implemented the tolerance square algorithm yet, but instead
went ahead with the multi-pass Bentley-Ottmann approach, repeatedly
running the algorithm on its own results until no remaining
intersections were found. In testing I then followed this up with an
O(n**2) test to verify that indeed there were no remaining
intersections.

I think the multi-pass approach will work well enough in
practice. Here are the number of edges found from several passes over
an original 772 edges:

	Testing 772 random edges
	Pass 1 found 25852 intersections
	Pass 2 found 6578 remaining intersections
	Pass 3 found 321 remaining intersections
	Pass 4 found 3 remaining intersections
	*****No remaining intersections found after pass 5
 
That looks severe, but recall that the large number of intersections
here (~30000 among ~1000 edges) is far greater than what is expected
from the real-world polygons to be filled, (which usually do not have
most of their edges intersecting each other). But it is encouraging
that even in this very degenerate case the multi-pass algorithm
converges quickly.

As Hobby warns, however, there is no guarantee that this approach will
converge quickly, and I did see a case that took several more
iterations:

	Testing 512 random edges
	Pass 1 found 32035 intersections:
	Pass 2 found 2619 remaining intersections
	Pass 3 found 126 remaining intersections
	Pass 4 found 25 remaining intersections
	Pass 5 found 16 remaining intersections
	Pass 6 found 9 remaining intersections
	Pass 7 found 2 remaining intersections
	Pass 8 found 2 remaining intersections
	Pass 9 found 1 remaining intersections
	Pass 10 found 1 remaining intersections
	Pass 11 found 1 remaining intersections
	Pass 12 found 1 remaining intersections
	Pass 13 found 1 remaining intersections
	Pass 14 found 1 remaining intersections
	***************No remaining intersections found after pass 15

Meanwhile, when I've generated random edge sets with many fewer
intersections, (within a factor of 10 of the number of edges, say),
there are no remaining intersections found in the second pass.

So, an interesting open question is whether any "real-world" polygon
might require a large number of passes. One thing I've considered
doing is just putting a print message into cairo triggered by some
threshold number of passes and have it request that the user let us
know about their test case that is stressing out the tessellator so
badly.

And a long-term, reliable fix for the problem would be to just
implement the tolerance square algorithm which never requires more
than one pass.

Integrating with cairo (and finding some more bugs)
---------------------------------------------------
This week I integrated the new tessellator with cairo itself. This
work can be seen in the branch named new-tessellator2 of my personal
tree, (which is a newly built branch---not a descendant of the
previous new-tessellator branch). The easiest way to examine the code
is to first fetch the branch into an existing cairo repository with
(yes, the repeated branch name here is a git annoyance):

git fetch git://git.cairographics.org/~cworth/cairo new-tessellator2:new-tessellator2

and then checkout that branch with:

git checkout new-tessellator2

With this in place I was able to run cairo's test suite, and I was
happy to see that for the great majority of the test suite, the
results of the new tessellator are pixel-identical with the results of
the old code.

There are significant (buggy) differences in three tests (bitmap-font,
close-path, and rectangle-rounding-error). The first is the case where
we use cairo_text_path();cairo_fill() on a bitmap font and cairo
traces little squares out around every pixel, (did you know it did
that?). That's actually a great test case, as it shoves lots of
coincident edges at the tessellator---and it's exposing a bad bug in
the new tessellator, which is leaking the fill out all over, (oops!).

The other two bugs show cases where a portion of the path that should
be filled is not getting filled (oops!).

So there are still bugs here---and that's why it's not merged to
master yet. But I thought people might want to start looking at it
already even in the current state.

Performance (preliminary comments)
----------------------------------
Before I started working on the new tessellator, I wrote a little
cairo/perf test case to demonstrate how slow the current
implementation is. The test case chooses 16, 64, then 256 random
coordinate pairs within a 100x100 pixel area and connects them into a
polygon. Here are the results of that test case on my laptop with the
old and new tessellators (100 iterations for each test):

Old tessellator:
         test-size   mean ms std dev. 
 tessellate-16-100     0.137  0.11%
 tessellate-64-100    15.752  2.09%
tessellate-256-100  1223.497 10.70%

New tessellator:
 tessellate-16-100     0.491  0.81%
 tessellate-64-100    10.075  2.47%
tessellate-256-100   264.525  0.25%

So, for the 256 edge case we're already seeing a 4x speedup. Now, I
think the tessellator performance will improve much more than that in
practice. Here are some details that should be understood:

1) My testing strategy is unkind and unrealistic. Choosing random
   coordinates within a square makes for edges that intersect much
   more frequently than in real-world polygons. In fact, there are
   7332 intersections found by the new tessellator among the 256
   edges. Recall that in the expected O((N+k)logN) we expect N (number
   of edges) to dominate k (number of intersections) in practice, but
   that's clearly not the case here.

   I don't know how the slowness of the old algorithm depends on N
   relative to k, but regardless, I should definitely write a more
   realistic test case.

   It would also be interesting to have people run their favorite
   tessellator-torturing tests through the new code to see how it
   does.

2) In carefully going over the implementation over the past week, I
   noticed that given 32-bit input values plus the 2 extra bits we
   tack on, the intersection code needs to be precise to 101 bits
   of accuracy. All of this code uses fixed-point, not floating-point,
   arithmetic; the old tessellator was using 64-bit arithmetic, while
   the new tessellator is currently hard-wired to use 128-bit
   arithmetic.

   So this correctness fix is penalizing the tessellator, (and yes,
   the 128-bit intersection code tops the resulting profile).

   An easy fix would be to characterize the maximum delta-X or delta-Y
   over all input edges. This value can be used to determine the
   maximum number of bits needed for intersection testing. So, in
   practice, cairo should often be able to work with 64-bit
   arithmetic instead.

   We could even shift off as many common right-most 0 bits as exist
   in the input coordinates, (such as will be present when integer
   coordinates are given to cairo with the default identity
   matrix). That might even allow 32-bit intersection to be used in
   many common cases.

   So, anyway there are some easy-to-achieve performance improvements
   for the new tessellator here.

Where to from here
------------------
Obviously we need to fix the current failures shown by cairo's test
suite, (and also see if we can't get the old leaky-dash bug to go away
too). Once the new bugs are fixed, the new tessellator can be merged
to master.

Oh, and when I integrated the new code into cairo, I temporarily took
out the multi-pass stuff, since it was making the integration
awkward. So that needs to get put back too.

Even before that merge happens it would be very useful to get some
more performance testing from the new code. If people have collected
interesting tessellation test cases, please send us pointers, or try
running them through the new code and let us know how it goes!

Thanks everyone for your interest in cairo.

Happy hacking,

-Carl