[cairo] Some performance investigations

Mon Nov 13 15:34:45 PST 2006

On Tue, 14 Nov 2006 00:18:11 +0200 (EET), M Joonas Pihlaja wrote:
> I was looking into the new tessellator, inspired by Chris
> Nuernberger's post re: 128 bit arithmetic,

Fantastic! I really appreciate lots of help in this area.

> I realise that with the move to 24.8 from 16.16 fixed point
> coordinates, this might not really be a high priority anymore,
> though.

It will probably still be important. Regardless of the coordinate
range, we'll still need careful accounting for every bit in the
arithmetic.

> 1. It seems to me that the new tessellator code only works for
> x/y coordinates in the range [-2^29+1, 2^29-1] (when viewing the
> fixed point coordinates as integers) due to the two additional
> guard bits introduced in the new tessellator.  May I assume that
> this restriction is a hard one that isn't going to change in the
> near future?

One thing I'd like to do is to have the tessellator first find the
minimum point at then rebase all the coordinates so that it can deal
with only positive integers. I think that will help a bit.

As for the two guard bits, I threw them in to make a problem go away
that occurred during random testing, (with lots of edges very close to
each other---hundreds of edges with coordinates in the range [0, 10]
when viewed as integers). It may be the reason they were necessary is
that we weren't taking advantage of examining the bits coming out of
the division properly. And as you have noticed below, with the guard
bits in place, we may not need as much precision in the division.

I think the bug I actually hit could have been fixed with a single
guard bit. The reason I threw in a second bit is that we plan to
eventually add an implementation of John Hobby's "tolerance square"
algorithm[*] which requires being able to accurately compare
intersection points to tolerance square edges on half-integer
positions, (relative to the grid of the input coordinates).

> 2.  Similarly, are the
>
>     /* XXX: We're assuming here that dx and dy will still fit in 32
>      * bits. That's not true in general as there could be overflow. We
>      * should prevent that before the tessellation algorithm
>      * begins. */
>
> comments in the code going to be valid for the moment?  These
> comments apply to computing the offset vector from the start of a
> polygon edge to its end.

You can definitely assume that range considerations will be taken care
of up front, (we need to do that for things like deciding between
128-bit and 64-bit paths for example). Any any analysis you can do now
and comments you can add for what input ranges each function will
correctly operate with would be appreciated.

I just pushed out the initial work I did on the 64-bit paths to a new
branch in my personal tree:

	new-tessellator-64-bit

It's totally garbage code in many ways, (breaks most cases of clipping
in the test suite, introduces an inordinate amount of copy-and-pasted
code), but if someone wants to start with that as a basis for doing
some cleanups, there it is.

I think I found that this made world_map about twice as fast, (but
still about twice as slow as the old tessellator).

> 3. In _cairo_quorem128_32_compare(), the function always compares
> a > b if the integer parts are equal and the remainder part of a
> is non-zero.  But couldn't _cairo_int128_divrem()  return a
> negative remainder if the numerator is < 0?  I think that the
> division operation should be normalised so that the remainder is
> always non-negative, and that rounding to integer coordinates
> should do round-to-negative-infinity throughout the new
> tessellator code.  I'm not sure, but it seems to me that the
> current rounding might have symmetry issues around the axes.

Yes, the remainder should always be non-negative and the division
should round toward negative infinity.

> So if the answer to the first or second questions is "no, those
> are temporary assumptions", then the plan below for removing the
> 128/128 bit divisions won't work.

If we can take advantage of them, we can make them permanent.

[snip]

> As _cairo_bo_edge_contains_point_quorem128() can currently never
> return true for intersection points that overflow 32 bits, thanks
> to the range of the edges' coordinates, I believe this would be
> safe.

Yes, something like that would be fine. You've got the crucial insight
here, that when the numbers get really large we're really not going to
care about them anymore.

> Finally, the 128/128 bit divisions in intersect_lines() look like
> they're only ever being called[2] for a < 95+1 bit signed
> numerator and a 63+1 bit signed denominator, so they could be
> placed with a more specialised division that returns a
> cairo_intersect_ordinate_t instead.

In Hobby's paper[*] he goes through analysis showing that with N-bit
integers for input coordinates the division needs to be accurate to
within 1 part in 2**(N*3+1), (and then 2**(N*3+2) for tolerance
squares).

And yes, we can take advantage of the smaller denominator as well.

Another idea in all of this is to carefully characterize the precision
of available floating-point arithmetic, (we've recently decided that
we're going to explicitly not support an FPU in single-precision
mode), so for machines in which there is hardware FPU, we should
probably just use that, (which is one thing the old tessellator is
taking advantage of for some performance benefits).

> Does that sound like a reasonable plan?  I'm not 100% sure I
> caught all the corner cases in the current implementation.

You seem to have a good grasp on it. As should be evident from the
discussion above, some of the current code is based on actual problem
encountered with tiny integers, so some good analysis to back it up is
more than I've done everywhere, and would be greatly appreciated.

As I mentioned above, the only thing to look out for is that in the
future we may want one  extra bit of precision for the tolerance
squares, (but the current guard bits may already have that bit in
place).

Thanks for your help already, I'm looking forward to what you can come
up with.

And please, have fun with cairo!

-Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20061113/58f382be/attachment.pgp