[cairo] Survey of polygon rasterization techniques
Jeff Muizelaar
jeff at infidigm.net
Tue Jul 31 12:19:05 PDT 2007
On Tue, Jul 31, 2007 at 09:01:39PM +0200, Matteo Muratori wrote:
> > Could you elaborate on this "should be faster than both" ? I'm interested
> > to know what kind of improvement you made.
> >
> > I just had a quick look at src/gui/painting/qgrayraster.c in Qt 4.3 and
> > I don't see much differences from a relatively old version of the
> > FreeType anti-aliasing rasterizer. Maybe some details escape me ?
> >
> > Also, you may find interesting that the current FreeType rasterizer
> > sources
> > use a scanline-bucket scheme now, instead of QSort, to sort the cells.
> > I've
> > done this because in some relatively common cases, it provided a x1.25
> > speedup
> > in rendering time.
>
> After having some tests, I've seen than AGG (and i suppose Freetype too)
> has some serious bugs with non simple polygons (intersecting edges,
> partially overlapping edges and so on). It sounds very strange to me
> that such high quality rasterizers have this kind of bug, because it's
> particularly sensible for font rendering, and in general the overall
> error results quite high in those pixels where intersections occur.
Yeah, David wrote about problem in his message about the Freetype
rasterizer. However, that message doesn't seem to have hit any
archives.. I've included it below.
-Jeff
On Tue, Jul 31, 2007 at 01:01:29PM +0200, David Turner wrote:
> For your information, the FreeType anti-aliasing rasterizer has the
> following characteristics:
>
> - computes the analytical area/coverage contribution of each
> segment to each edge pixel. this is similar in essence to what
> LibArt does, though the computations are done pretty differently.
> Again, I would like to thank Raph Levien for inspiration.
>
> - the code used to record the contribution of each individual pixel
> in a "cell record", then Q-Sort all of them before generating the
> spans. (like AGG and Qt 4 do)
>
> However, this has been changed to a scanline-bucket sorted list
> since this proves to be faster in many common cases (up to x1.25
> last time I benchmarked it); but keep in mind this is with glyph
> shapes, which are generally very complex but render to small
> amounts of pixels. For a general-purpose graphics library, other
> choices might be good as well
>
> (OBNOTE: my initial implementation used Shell-sort instead of Q-Sort,
> because the test sample fared better with it, but larger benchmarking
> showed Q-Sort was better in the general case)
>
> - the only things recorded are cell areas/coverage. there are no SVPs,
> active edge lists or other auxiliary data like that.
>
> - doesn't perform any malloc-style allocation. Instead, you pass it a
> block of memory (called the "render pool") and its size, and the
> rasterizer does all the work within it.
>
> if the shape is too complex to be rendered directly within the render
> pool, it is recursively sub-divided and the work is re-started. Note
> however that even with a 16 KB render pool, finding shapes that create
> this condition is pretty difficult (and they generally correspond to
> shapes with lots of pixels, which means your bottleneck is likely going
> to be painting, not rasterizing).
>
> in the common case, all cells are recorded and sorted in the render
> pool before any span is sent to the painting layer. A typical CPU cache
> loves this.
>
> - directly decomposes Bezier paths into line segments during the
> rasterization process. this is also to reduce the amount of input
> data to parse during the rasterization.
>
> the decomposition is made through a non-recursive Decasteljau
> subdivision controlled through very simply heuristics which is very
> speedy.
>
>
> Generally speaking, analytical-based algorithms are very fast, and, in
> my experience *much* faster than multi-sampling as soon as you use more
> than 4 samples / pixel. they have one drawback though, is that they do
> not deal correctly with self-intersecting polygons well.
>
> To illustrate this, consider the following diagram describing a single
> pixel cell that contains a self-intersecting trapezoid ABCD:
>
> A (0,0) B (1,0)
> +----------+
> \ /
> \ /
> \ /
> \ /
> \/
> /\
> / \
> / \
> / \
> / \
> +----------+
> C (0,1) D(1,1)
>
> an analytical algorithm will always compute an area of 0, instead of 0.5 here.
> I claim, without proof, that this 50% error is the worst case you can get
> with an analytical algorithm, and it is quite significant, though happens
> on very rare cases.
>
> there are several ways to get deal with them:
>
> A - get rid of all self-intersection in the input polygon before sending
> it to the rasterizer. this is equivalent to performing tesselation.
>
> B - use some sort of super-sampling, e.g. 4 horizontal bands per pixel
> cell, each with it's own area/coverage value. the maximum error in
> each band is then 50/4 = 12.5% which is hardly perceptible in
> real-life cases.
>
> C - use super-sampling or multi-sampling instead of analytical computations.
>
> D - ignore the problem, since most users will never notice the problem. this
> is what is done in FreeType 2, because glyph shapes are supposed to not
> self-intersect by default anyway (some broken font have them, but nobody
> ever complained about their rendering anyway).
>
> E - find a way to quickly spot the pixels containing self-intersection pixels
> from others in the rasterization process, and deal with them differently
> (e.g. with super-sampling). due to the rarity of the cases, this should
> not be a performance problem for 99.9% of pixels, but the "detection"
> code needs to be very fast to not slow down the rest of the algorithm
> too much. false positive are allowed if they're still rare.
>
>
> A is very slow (because tesselation costs a lot), but the most correct you can get
>
> B has a definite performance impact compared to "naive" analytical computations.
> last time I experimented with it, I found it was between x1.3 and x1.4 slower.
> however, the error was really un-distinguishible to the eye (a 12.5% gray instead
> of a 50% one isn't too different in vector graphics)
>
> a 2 band/pixel approach would yeld a 25% worst-case error which may also sound
> acceptable to a lot of people.
>
> C in all my experimentations, super-sampling and multi-sampling proved to be
> significantly slower unless you want to sacrifice quality, I'm speaking of
> orders of magnitudes of at least 2x or 3x. I believe this is mainly due to
> the following facts:
>
> - first, you need to do a *lot* more DDA steps per pixel (unless you
> use a small amount of sub-pixels, in which case the result looks crap)
>
> - bit-counting is slow, unless you have a specialized assembler
> instruction to do it, and it must be done for each pixel. and
> the more samples, the slower.
>
> OBNOTE: The FreeType 1 rasterizer has 2x2 and 4x4 super-sampling modes that
> use pretty wicked tricks to speed things up and use a minimum amount
> of memory. Only the 2x2 one is faster than FreeType 2's anti-aliasing
> rasterizer, the 4x4 one is slower by a wide margin.
>
> however, I admit that it may be possible that ingenious implementations of
> super-sampling might get good performance as well. Color me skeptic about the
> claims made by the "edge-flag" paper though.
>
> D well, I would not recommend it for Cairo, but it could be used as an interesting
> "quick rendering" mode for Cairo, for example when drawing animations where this
> kind of worst cases are totally insignificant...
>
> as for E, I've tried experimenting with it, but never found the time to actually
> implement something very fast :-( I still think it's a promising idea though...
>
>
> hope you'll find all of this interesting
>
> - David Turner
> - The FreeType Project (www.freetype.org)
>
>
More information about the cairo
mailing list