[cairo] Survey of polygon rasterization techniques
David Turner
david at freetype.org
Fri Aug 17 14:17:56 PDT 2007
Thanks Kiia (and Jeff) for your suggestions.
I have an updated tarball at the following address:
http://david.freetype.org/rasterizer-shootout/raster-comparison-20070813.tar.bz2
the changes are:
- a new "kiia8" algorithm that gets rids of some indexing operations.
instead, offsets are hard-coded in the code, which results in a
significant speedup in many cases.
- Jeff "tor9" algorithm, previously submitted here as a patch. this
one uses a linked list to speed up rendering for large sizes, but
slows down small ones (in more important ways, I might add)
- another variant, named "tor10", derived from tor9, that uses
an area+coverage computation. this allows us to use twice less nodes
per span in the linked list. this results in a consistent speedup
compared to tor9, but still nothing to be proud of compared with
tor8 at small sizes.
I have not implementing proper clipping-before-rasterization yet.
Regarding char/word access in Kiia's algorithm, I had already tested both
approaches and found the word access to be faster in practice. this is even
truer in kiia8, since most of the complicated shifting and masking is done
mostly by the compiler now.
I'll keep on hacking, I had plenty of interesting ideas lately, and of course
not a lot of time to implement them. Keep on providing insight and suggestions
thanks,
- David
On Fri, 10 Aug 2007 22:28:33 +0300, "Kiia Kallio" <kkallio at uiah.fi> said:
> David,
>
> This is really great work that you ported all these algorithms into one
> framework.
>
> I haven't had time to go through the implementation properly yet, but it
> seems that
> the edge plotting loop is much more inefficient than the one I have
> implemented.
> I profiled this, and the implementation seems to consume 25% of time in
> the edge
> plotting.
>
> There are for instance compares for the x coordinate in the inner loop.
> If the
> clipping is properly implemented, these are not needed. Also, the
> plotting is done
> to 32-bit chunks with masking, while it can be implemented directly to
> char
> data much more efficiently. There is also some additional indexing math
> not
> present in the original implementation, and the code doesn't use
> immediate
> data for offsets but fetches them from an array.
>
> There seems to be also some issues with clipping, as the program crashes
> with
> some configurations (especially if dumping to files is in use) due to
> memory
> trashing.
>
> I will investigate this more and send you a patch when I have time...
> This will be
> sometime next week earliest though.
>
> Kiia
>
> >>> "David Turner" <david at freetype.org> 08/08/07 7:28 PM >>>
> Thanks Philipp,
>
> I spotted and fixed the problem (a stupid typo), and this quite changes
> the performance
> of the "tor7" algorithm (slowing it down, but making it correct). I
> modified the program
> to dump the content of rasterization in a large .pgm image file for each
> rasterizer, and
> this helped me spot many other problems that I hadn't to deal when using
> the rasterizer
> in ft2demos. Mostly were related to horizontal clipping. I also found a
> few optimizations
> to write "tor8", which is mostly the same than "tor7" with loop unrolling
> in a few critical parts.
>
> the new sources are at:
>
> http://david.freetype.org/rasterizer-shootout/raster-comparison-20070808.tar.bz2
>
> you can now use the "-d" flag to dump rasterization images in the current
> directory
> (warning, these will be *large* .pgm files).
>
> or specify a given test with the "-t" option (useful for profiling)
>
> here are the new benchmarks (omitting most non-optimized versions now):
>
> "ftrasters -r50 -s20
> /usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf"
>
> loading 1320 glyphs = 0.030000 s
> rasterizer time throughput global
> relative
> ================================================================================
> with dummy : 0.200 s ( 330000 glyphs/s)
> with analytical : 0.770 s ( 85714 glyphs/s) x 1.00
> x 1.00
> with mono1 : 0.440 s ( 150000 glyphs/s) x 0.57
> x 0.42
> with tor15x17_8 : 1.550 s ( 42580 glyphs/s) x 2.01
> x 2.37
> with tor16x16_8 : 1.590 s ( 41509 glyphs/s) x 2.06
> x 2.44
> with tor16x16_7 : 1.670 s ( 39520 glyphs/s) x 2.17
> x 2.58
> with kiia32_7 : 1.550 s ( 42580 glyphs/s) x 2.01
> x 2.37
> with tor8x32_8 : 1.180 s ( 55932 glyphs/s) x 1.53
> x 1.72
> with tor8x8_8 : 1.170 s ( 56410 glyphs/s) x 1.52
> x 1.70
> with kiia16_7 : 1.190 s ( 55462 glyphs/s) x 1.55
> x 1.74
> with kiia8_7 : 0.910 s ( 72527 glyphs/s) x 1.18
> x 1.25
>
> "ftrasters -r5 -s600 /usr/share/fonts/truetype/msttcorefonts/Arial.ttf"
>
> loading 1320 glyphs = 0.030000 s
> rasterizer time throughput global
> relative
> ================================================================================
> with dummy : 0.070 s ( 94285 glyphs/s)
> with analytical : 1.460 s ( 4520 glyphs/s) x 1.00
> x 1.00
> with mono1 : 0.690 s ( 9565 glyphs/s) x 0.47
> x 0.45
> with tor15x17_8 : 4.170 s ( 1582 glyphs/s) x 2.86
> x 2.95
> with tor16x16_8 : 4.810 s ( 1372 glyphs/s) x 3.29
> x 3.41
> with tor16x16_7 : 5.330 s ( 1238 glyphs/s) x 3.65
> x 3.78
> with kiia32_7 : 3.300 s ( 2000 glyphs/s) x 2.26
> x 2.32
> with tor8x32_8 : 3.800 s ( 1736 glyphs/s) x 2.60
> x 2.68
> with tor8x8_8 : 3.710 s ( 1778 glyphs/s) x 2.54
> x 2.62
> with kiia16_7 : 2.400 s ( 2750 glyphs/s) x 1.64
> x 1.68
> with kiia8_7 : 1.860 s ( 3548 glyphs/s) x 1.27
> x 1.29
>
> * as you can see the high-quality Kiia and Tor algorithms are now
> pretty equivalent
> for small sizes, while Kiaa's multi-sampler takes the lead for large
> ones.
>
> * for low-quality/speed, Kiaa's multi-sampler takes an even larger
> lead, regardless
> of the size of the glyphs.
>
> I'm still scratching my head to find other variants or optimizations.
>
> thanks a lot
>
> - David
>
>
> On Wed, 08 Aug 2007 12:25:28 +0200, "Philipp Heise" <heise at in.tum.de>
> said:
> > David Turner wrote:
> > > ...
> > > if you feel like it, feel free to look at the code, run it, and improve it, or ask
> > > questions in this thread.
> > >
> > > Hope this helps,
> > >
> >
> > Hi David,
> >
> > your program to compare the performance of various rasterization
> > algorithms is really great ... Thank you!
> >
> > Since there was no output for the resulting glyphs and I wanted to see
> > some results - I put the following hack in your ConverRaster() after
> > raster_render() :
> >
> > {
> > FILE* f;
> > int i;
> > f=fopen("out.pgm","wb");
> > fprintf(f,"P2\n%d %d\n255\n",width,height);
> > for(i=0;i<width*height;i++)
> > fprintf(f,"%d\n",bitmap->buffer[i]);
> > fclose(f);
> > printf("press enter to continue\n");
> > getchar();
> > }
> >
> >
> > The results from the rasterizer of the torXXX_7-variant seem to be wrong
> > for most glyphs (tested with Bitstream Vera)...
> >
> > Thanks Philipp
> _______________________________________________
> cairo mailing list
> cairo at cairographics.org
> http://lists.cairographics.org/mailman/listinfo/cairo
>
>
More information about the cairo
mailing list