[cairo] Survey of polygon rasterization techniques
Kiia Kallio
kkallio at uiah.fi
Tue Jul 31 23:51:21 PDT 2007
>>> "David Turner" <david at freetype.org> 08/01/07 12:02 AM >>>
> On Tue, 31 Jul 2007 23:12:25 +0300, "Kiia Kallio" <kkallio at uiah.fi> said:
> > The problem with AGG is even more evident if you have several edges in a
> > polygon
> > exactly on top of each other. If you have a windows machine, you can get
> > the pre-built
> > binaries for the comparison tests from
> > http://mlab.uiah.fi/~kkallio/antialiasing/SLEFA_1_0.zip and try it
> > yourself. Check
> > "square long edges e/o" and "square long edges n/z" tests (tests 15 and
> > 16) with
> > bin/TestAppAGG.exe: both have bad AA artefacts all the way the outer
> > edges.
> >
> > BTW, David, if you are skeptic about the performance figures in the
> > > paper, you can run
> > the tests yourself to verify them. :)
> >
>
> I did that, and on a Windows XP install running under VMWare, I have
> mixed feelings about the results. For example, AGG beats any of your
> tests if I set the display to 16-bits.
>
> More curiously, your tests run slower on a 16-bit display than a 32-bit
> one, even though the test is not supposed to be dependent on the display
> pixel format (at least from the code), since it should only measure
> rendering time to a buffer.
The reason why AGG is faster in 16-bit mode is that it's rendering directly
to a 16-bit target (since the render target for AGG is created from the
window directly without using an intermediate 32-bit buffer), while I
always render to a 32-bit SDL surface and SDL takes care of the display
conversion. (The conversion time is not included in the results though,
it's just the difference of 16-bit and 32-bit target. I have not implemented
16-bit targets, but would argue that I could beat AGG performance with
those as well if they were there.)
> Frankly, it seems you're comparing whole graphics pipelines instead of just
> different rasterization algorithms. I'll try to write a C version of your
> algorithm that will be used as a replacement to the FreeType anti-aliasing
> rasterizer. I think it would provide a much better comparison of the various
> schemes.
No, I'm comparing the rasterization (as much that I can without ripping
the libraries into parts). For instance:
- I'm measuring just the time spent within the rendering calls, not FPS
(to get rid of buffer conversions and all that stuff in the results)
- I've converted all the bezier paths to polygons, so that subdivison
algortihms in the implementations can't affect the results
- I create all polygons as polygon objects (or whatever concept the
underlying library uses) first and render with those (instead of feeding
my polygon coordinates to the API over and over again in every frame).
This way I'm really measuring the rendering performance, not just
API overhead (like the Zack Rusin's QT tests do...)
- I'm using only opaque paint to avoid performance differences with
alpha blending code (for instance I'm using Blinn's "correct" fixed point
blending method whereas AGG uses the OpenGL-style "incorrect" but
faster one).
- I'm animating the transformation matrix of the path so that the library
being tested can't do caching of edges or images.
- I have created a variety of tests with different characteristics, for
instance some have lots of edges while some have lots of fill area;
some generate lots of antialiased pixels while some generate less etc.
In addition I have a set of real images to test to get tests where the
characteristics are distributed realistically. (Again, not like the QT tests
where everything is just very complex geometry with lots of small
edges).
Of course coordinate transformations, clipping, polygon setup and all that
is included in the results, but that's unavoidable since they are after all
pretty much part of the algorithm, as different algorithms require different
clipping and setup code.
Kiia
> not to say that your algorithm isn't great, simply that I don't feel at the
> moment that the tests you performed prove what you think they do.
>
> Regards,
>
> - David
>
>
> > Kiia
> >
> > > >>> Jeff Muizelaar <jeff at infidigm.net> 07/31/07 10:19 PM >>>
> > > On Tue, Jul 31, 2007 at 09:01:39PM +0200, Matteo Muratori wrote:
> > > > > Could you elaborate on this "should be faster than both" ? I'm interested
> > > > > to know what kind of improvement you made.
> > > > >
> > > > > I just had a quick look at src/gui/painting/qgrayraster.c in Qt 4.3 and
> > > > > I don't see much differences from a relatively old version of the
> > > > > FreeType anti-aliasing rasterizer. Maybe some details escape me ?
> > > > >
> > > > > Also, you may find interesting that the current FreeType rasterizer
> > > > > sources
> > > > > use a scanline-bucket scheme now, instead of QSort, to sort the cells.
> > > > > I've
> > > > > done this because in some relatively common cases, it provided a x1.25
> > > > > speedup
> > > > > in rendering time.
> > > >
> > > > After having some tests, I've seen than AGG (and i suppose Freetype too)
> > > > has some serious bugs with non simple polygons (intersecting edges,
> > > > partially overlapping edges and so on). It sounds very strange to me
> > > > that such high quality rasterizers have this kind of bug, because it's
> > > > particularly sensible for font rendering, and in general the overall
> > > > error results quite high in those pixels where intersections occur.
> > >
> > > Yeah, David wrote about problem in his message about the Freetype
> > > rasterizer. However, that message doesn't seem to have hit any
> > > archives.. I've included it below.
> > >
> > > -Jeff
> > >
> > > On Tue, Jul 31, 2007 at 01:01:29PM +0200, David Turner wrote:
> > > > For your information, the FreeType anti-aliasing rasterizer has the
> > > > following characteristics:
> > > >
> > > > - computes the analytical area/coverage contribution of each
> > > > segment to each edge pixel. this is similar in essence to what
> > > > LibArt does, though the computations are done pretty differently.
> > > > Again, I would like to thank Raph Levien for inspiration.
> > > >
> > > > - the code used to record the contribution of each individual pixel
> > > > in a "cell record", then Q-Sort all of them before generating the
> > > > spans. (like AGG and Qt 4 do)
> > > >
> > > > However, this has been changed to a scanline-bucket sorted list
> > > > since this proves to be faster in many common cases (up to x1.25
> > > > last time I benchmarked it); but keep in mind this is with glyph
> > > > shapes, which are generally very complex but render to small
> > > > amounts of pixels. For a general-purpose graphics library, other
> > > > choices might be good as well
> > > >
> > > > (OBNOTE: my initial implementation used Shell-sort instead of Q-Sort,
> > > > because the test sample fared better with it, but larger benchmarking
> > > > showed Q-Sort was better in the general case)
> > > >
> > > > - the only things recorded are cell areas/coverage. there are no SVPs,
> > > > active edge lists or other auxiliary data like that.
> > > >
> > > > - doesn't perform any malloc-style allocation. Instead, you pass it a
> > > > block of memory (called the "render pool") and its size, and the
> > > > rasterizer does all the work within it.
> > > >
> > > > if the shape is too complex to be rendered directly within the render
> > > > pool, it is recursively sub-divided and the work is re-started. Note
> > > > however that even with a 16 KB render pool, finding shapes that create
> > > > this condition is pretty difficult (and they generally correspond to
> > > > shapes with lots of pixels, which means your bottleneck is likely going
> > > > to be painting, not rasterizing).
> > > >
> > > > in the common case, all cells are recorded and sorted in the render
> > > > pool before any span is sent to the painting layer. A typical CPU cache
> > > > loves this.
> > > >
> > > > - directly decomposes Bezier paths into line segments during the
> > > > rasterization process. this is also to reduce the amount of input
> > > > data to parse during the rasterization.
> > > >
> > > > the decomposition is made through a non-recursive Decasteljau
> > > > subdivision controlled through very simply heuristics which is very
> > > > speedy.
> > > >
> > > >
> > > > Generally speaking, analytical-based algorithms are very fast, and, in
> > > > my experience *much* faster than multi-sampling as soon as you use more
> > > > than 4 samples / pixel. they have one drawback though, is that they do
> > > > not deal correctly with self-intersecting polygons well.
> > > >
> > > > To illustrate this, consider the following diagram describing a single
> > > > pixel cell that contains a self-intersecting trapezoid ABCD:
> > > >
> > > > A (0,0) B (1,0)
> > > > +----------+
> > > > \ /
> > > > \ /
> > > > \ /
> > > > \ /
> > > > \/
> > > > /\
> > > > / \
> > > > / \
> > > > / \
> > > > / \
> > > > +----------+
> > > > C (0,1) D(1,1)
> > > >
> > > > an analytical algorithm will always compute an area of 0, instead of 0.5 here.
> > > > I claim, without proof, that this 50% error is the worst case you can get
> > > > with an analytical algorithm, and it is quite significant, though happens
> > > > on very rare cases.
> > > >
> > > > there are several ways to get deal with them:
> > > >
> > > > A - get rid of all self-intersection in the input polygon before sending
> > > > it to the rasterizer. this is equivalent to performing tesselation.
> > > >
> > > > B - use some sort of super-sampling, e.g. 4 horizontal bands per pixel
> > > > cell, each with it's own area/coverage value. the maximum error in
> > > > each band is then 50/4 = 12.5% which is hardly perceptible in
> > > > real-life cases.
> > > >
> > > > C - use super-sampling or multi-sampling instead of analytical computations.
> > > >
> > > > D - ignore the problem, since most users will never notice the problem. this
> > > > is what is done in FreeType 2, because glyph shapes are supposed to not
> > > > self-intersect by default anyway (some broken font have them, but nobody
> > > > ever complained about their rendering anyway).
> > > >
> > > > E - find a way to quickly spot the pixels containing self-intersection pixels
> > > > from others in the rasterization process, and deal with them differently
> > > > (e.g. with super-sampling). due to the rarity of the cases, this should
> > > > not be a performance problem for 99.9% of pixels, but the "detection"
> > > > code needs to be very fast to not slow down the rest of the algorithm
> > > > too much. false positive are allowed if they're still rare.
> > > >
> > > >
> > > > A is very slow (because tesselation costs a lot), but the most correct you can get
> > > >
> > > > B has a definite performance impact compared to "naive" analytical computations.
> > > > last time I experimented with it, I found it was between x1.3 and x1.4 slower.
> > > > however, the error was really un-distinguishible to the eye (a 12.5% gray instead
> > > > of a 50% one isn't too different in vector graphics)
> > > >
> > > > a 2 band/pixel approach would yeld a 25% worst-case error which may also sound
> > > > acceptable to a lot of people.
> > > >
> > > > C in all my experimentations, super-sampling and multi-sampling proved to be
> > > > significantly slower unless you want to sacrifice quality, I'm speaking of
> > > > orders of magnitudes of at least 2x or 3x. I believe this is mainly due to
> > > > the following facts:
> > > >
> > > > - first, you need to do a *lot* more DDA steps per pixel (unless you
> > > > use a small amount of sub-pixels, in which case the result looks crap)
> > > >
> > > > - bit-counting is slow, unless you have a specialized assembler
> > > > instruction to do it, and it must be done for each pixel. and
> > > > the more samples, the slower.
> > > >
> > > > OBNOTE: The FreeType 1 rasterizer has 2x2 and 4x4 super-sampling modes that
> > > > use pretty wicked tricks to speed things up and use a minimum amount
> > > > of memory. Only the 2x2 one is faster than FreeType 2's anti-aliasing
> > > > rasterizer, the 4x4 one is slower by a wide margin.
> > > >
> > > > however, I admit that it may be possible that ingenious implementations of
> > > > super-sampling might get good performance as well. Color me skeptic about the
> > > > claims made by the "edge-flag" paper though.
> > > >
> > > > D well, I would not recommend it for Cairo, but it could be used as an interesting
> > > > "quick rendering" mode for Cairo, for example when drawing animations where this
> > > > kind of worst cases are totally insignificant...
> > > >
> > > > as for E, I've tried experimenting with it, but never found the time to actually
> > > > implement something very fast :-( I still think it's a promising idea though...
> > > >
> > > >
> > > > hope you'll find all of this interesting
> > > >
> > > > - David Turner
> > > > - The FreeType Project (www.freetype.org)
> > > >
> > > >
> > > _______________________________________________
> > > cairo mailing list
> > > cairo at cairographics.org
> > > http://lists.cairographics.org/mailman/listinfo/cairo
> > >
> >
> > _______________________________________________
> > cairo mailing list
> > cairo at cairographics.org
> > http://lists.cairographics.org/mailman/listinfo/cairo
More information about the cairo
mailing list