[cairo] Survey of polygon rasterization techniques

Kiia Kallio kkallio at uiah.fi
Wed Aug 22 00:46:55 PDT 2007


Attached is a fixed version of the implementation of scanline edge-flag
algorithm. I didn't have much time to spend on it and there still might
be room for improvement, but I'll be traveling next week, so I'll post it
now anyway...

In addition to the obvious fixes I did some magic with the bit count
calculations (replaced the table lookups with a multiply and shift). I
took a shortcut with clipping; the rasterizer requires properly clipped
input edges so that it doesn't have to perform comparisons in the inner
loop. I didn't have time to move my clipper code to the implementation,
so for the few cases where the vertices were out of the target rectangle
I just clamped them in. Having a proper clipper there would not have
any effect on the performance, as the clip case is very rare in the tests
anyway.

Below is a sweep of different sizes with webdings font, as I think it
represents typical vector graphic content better than text-only fonts. As
can be seen from the results, analytical performs better with small sizes,
while scanline edge-flag algorithm performs better with large sizes.
The performance with small sizes can probably be improved by using
a fixed plot area instead of scanline approach. This would remove the
need for edge lists, as edges https://webaccess1.uiah.fi/gw/webacc#
Send can be plotted instantly. This means
also constant memory usage at rendering.

Note also that the even-odd fill rule performs even better than the
non-zero winding one, as there is no need to handle the winding
counters. It might be useless for font rendering, but for instance Flash
uses even-odd fill rule for all fills. SVG and PDF can use either one.

Kiia

ftrasters -r50 -s25 webdings.ttf
loading 227 glyphs  = 0.032000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    0.125 s  (   90800 glyphs/s)
with analytical           :    0.297 s  (   38215 glyphs/s)    x 1.00  x 1.00
with mono1                :    0.172 s  (   65988 glyphs/s)    x 0.58  x 0.27
with tor15x17_8           :    0.641 s  (   17706 glyphs/s)    x 2.16  x 3.00
with tor16x16_8           :    0.625 s  (   18160 glyphs/s)    x 2.10  x 2.91
with tor16x16_7           :    0.672 s  (   16889 glyphs/s)    x 2.26  x 3.18
with kiia32_7             :    0.421 s  (   26959 glyphs/s)    x 1.42  x 1.72
with tor8x32_8            :    0.454 s  (   25000 glyphs/s)    x 1.53  x 1.91
with tor8x8_8             :    0.468 s  (   24252 glyphs/s)    x 1.58  x 1.99
with kiia16_7             :    0.344 s  (   32994 glyphs/s)    x 1.16  x 1.27
with kiia8_7              :    0.328 s  (   34603 glyphs/s)    x 1.10  x 1.18

ftrasters -r50 -s50 webdings.ttf
loading 227 glyphs  = 0.046000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    0.110 s  (  103181 glyphs/s)
with analytical           :    0.515 s  (   22038 glyphs/s)    x 1.00  x 1.00
with mono1                :    0.250 s  (   45400 glyphs/s)    x 0.49  x 0.35
with tor15x17_8           :    1.094 s  (   10374 glyphs/s)    x 2.12  x 2.43
with tor16x16_8           :    1.063 s  (   10677 glyphs/s)    x 2.06  x 2.35
with tor16x16_7           :    1.140 s  (    9956 glyphs/s)    x 2.21  x 2.54
with kiia32_7             :    0.610 s  (   18606 glyphs/s)    x 1.18  x 1.23
with tor8x32_8            :    0.718 s  (   15807 glyphs/s)    x 1.39  x 1.50
with tor8x8_8             :    0.704 s  (   16122 glyphs/s)    x 1.37  x 1.47
with kiia16_7             :    0.484 s  (   23450 glyphs/s)    x 0.94  x 0.92
with kiia8_7              :    0.422 s  (   26895 glyphs/s)    x 0.82  x 0.77

ftrasters -r50 -s100 webdings.ttf
loading 227 glyphs  = 0.031000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    0.094 s  (  120744 glyphs/s)
with analytical           :    0.968 s  (   11725 glyphs/s)    x 1.00  x 1.00
with mono1                :    0.375 s  (   30266 glyphs/s)    x 0.39  x 0.32
with tor15x17_8           :    1.969 s  (    5764 glyphs/s)    x 2.03  x 2.15
with tor16x16_8           :    1.922 s  (    5905 glyphs/s)    x 1.99  x 2.09
with tor16x16_7           :    2.125 s  (    5341 glyphs/s)    x 2.20  x 2.32
with kiia32_7             :    1.046 s  (   10850 glyphs/s)    x 1.08  x 1.09
with tor8x32_8            :    1.282 s  (    8853 glyphs/s)    x 1.32  x 1.36
with tor8x8_8             :    1.265 s  (    8972 glyphs/s)    x 1.31  x 1.34
with kiia16_7             :    0.844 s  (   13447 glyphs/s)    x 0.87  x 0.86
with kiia8_7              :    0.672 s  (   16889 glyphs/s)    x 0.69  x 0.66

ftrasters -r50 -s200 webdings.ttf
loading 227 glyphs  = 0.031000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    0.172 s  (   65988 glyphs/s)
with analytical           :    1.719 s  (    6602 glyphs/s)    x 1.00  x 1.00
with mono1                :    0.609 s  (   18637 glyphs/s)    x 0.35  x 0.28
with tor15x17_8           :    3.844 s  (    2952 glyphs/s)    x 2.24  x 2.37
with tor16x16_8           :    3.812 s  (    2977 glyphs/s)    x 2.22  x 2.35
with tor16x16_7           :    4.219 s  (    2690 glyphs/s)    x 2.45  x 2.62
with kiia32_7             :    1.859 s  (    6105 glyphs/s)    x 1.08  x 1.09
with tor8x32_8            :    2.657 s  (    4271 glyphs/s)    x 1.55  x 1.61
with tor8x8_8             :    2.468 s  (    4598 glyphs/s)    x 1.44  x 1.48
with kiia16_7             :    1.454 s  (    7806 glyphs/s)    x 0.85  x 0.83
with kiia8_7              :    1.187 s  (    9561 glyphs/s)    x 0.69  x 0.66

ftrasters -r50 -s400 webdings.ttf
loading 227 glyphs  = 0.031000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    0.344 s  (   32994 glyphs/s)
with analytical           :    4.156 s  (    2730 glyphs/s)    x 1.00  x 1.00
with mono1                :    1.265 s  (    8972 glyphs/s)    x 0.30  x 0.24
with tor15x17_8           :    8.797 s  (    1290 glyphs/s)    x 2.12  x 2.22
with tor16x16_8           :    8.407 s  (    1350 glyphs/s)    x 2.02  x 2.12
with tor16x16_7           :    9.250 s  (    1227 glyphs/s)    x 2.23  x 2.34
with kiia32_7             :    4.015 s  (    2826 glyphs/s)    x 0.97  x 0.96
with tor8x32_8            :    5.844 s  (    1942 glyphs/s)    x 1.41  x 1.44
with tor8x8_8             :    5.906 s  (    1921 glyphs/s)    x 1.42  x 1.46
with kiia16_7             :    3.313 s  (    3425 glyphs/s)    x 0.80  x 0.78
with kiia8_7              :    2.750 s  (    4127 glyphs/s)    x 0.66  x 0.63

ftrasters -r50 -s800 webdings.ttf
loading 227 glyphs  = 0.046000 s
rasterizer                  time        throughput             global  relative
================================================================================

with dummy                :    1.875 s  (    6053 glyphs/s)
with analytical           :   11.407 s  (     995 glyphs/s)    x 1.00  x 1.00
with mono1                :    3.781 s  (    3001 glyphs/s)    x 0.33  x 0.20
with tor15x17_8           :   23.437 s  (     484 glyphs/s)    x 2.05  x 2.26
with tor16x16_8           :   21.954 s  (     516 glyphs/s)    x 1.92  x 2.11
with tor16x16_7           :   23.562 s  (     481 glyphs/s)    x 2.07  x 2.28
with kiia32_7             :   10.672 s  (    1063 glyphs/s)    x 0.94  x 0.92
with tor8x32_8            :   16.687 s  (     680 glyphs/s)    x 1.46  x 1.55
with tor8x8_8             :   16.688 s  (     680 glyphs/s)    x 1.46  x 1.55
with kiia16_7             :    9.953 s  (    1140 glyphs/s)    x 0.87  x 0.85
with kiia8_7              :    8.594 s  (    1320 glyphs/s)    x 0.75  x 0.70

>>> "Kiia Kallio" <kkallio at uiah.fi> 08/10/07 10:28 PM >>>
David,

This is really great work that you ported all these algorithms into one framework.

I haven't had time to go through the implementation properly yet, but it seems that
the edge plotting loop is much more inefficient than the one I have implemented.
I profiled this, and the implementation seems to consume 25% of time in the edge
plotting.

There are for instance compares for the x coordinate in the inner loop. If the
clipping is properly implemented, these are not needed. Also, the plotting is done
to 32-bit chunks with masking, while it can be implemented directly to char
data much more efficiently. There is also some additional indexing math not
present in the original implementation, and the code doesn't use immediate
data for offsets but fetches them from an array.

There seems to be also some issues with clipping, as the program crashes with
some configurations (especially if dumping to files is in use) due to memory
trashing.

I will investigate this more and send you a patch when I have time... This will be
sometime next week earliest though.

Kiia

>>> "David Turner" <david at freetype.org> 08/08/07 7:28 PM >>>
Thanks Philipp,

I spotted and fixed the problem (a stupid typo), and this quite changes the performance
of the "tor7" algorithm (slowing it down, but making it correct). I modified the program
to dump the content of rasterization in a large .pgm image file for each rasterizer, and
this helped me spot many other problems that I hadn't to deal when using the rasterizer
in ft2demos. Mostly were related to horizontal clipping. I also found a few optimizations
to write "tor8", which is mostly the same than "tor7" with loop unrolling in a few critical parts.

the new sources are at:

  http://david.freetype.org/rasterizer-shootout/raster-comparison-20070808.tar.bz2

you can now use the "-d" flag to dump rasterization images in the current directory
(warning, these will be *large* .pgm files).

or specify a given test with the "-t" option (useful for profiling)

here are the new benchmarks (omitting most non-optimized versions now):

  "ftrasters -r50 -s20 /usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf"

    loading 1320 glyphs  = 0.030000 s
    rasterizer                  time        throughput             global  relative
    ================================================================================
    with dummy                :    0.200 s  (  330000 glyphs/s)
    with analytical           :    0.770 s  (   85714 glyphs/s)    x 1.00  x 1.00
    with mono1                :    0.440 s  (  150000 glyphs/s)    x 0.57  x 0.42
    with tor15x17_8           :    1.550 s  (   42580 glyphs/s)    x 2.01  x 2.37
    with tor16x16_8           :    1.590 s  (   41509 glyphs/s)    x 2.06  x 2.44
    with tor16x16_7           :    1.670 s  (   39520 glyphs/s)    x 2.17  x 2.58
    with kiia32_7             :    1.550 s  (   42580 glyphs/s)    x 2.01  x 2.37
    with tor8x32_8            :    1.180 s  (   55932 glyphs/s)    x 1.53  x 1.72
    with tor8x8_8             :    1.170 s  (   56410 glyphs/s)    x 1.52  x 1.70
    with kiia16_7             :    1.190 s  (   55462 glyphs/s)    x 1.55  x 1.74
    with kiia8_7              :    0.910 s  (   72527 glyphs/s)    x 1.18  x 1.25

  "ftrasters -r5 -s600 /usr/share/fonts/truetype/msttcorefonts/Arial.ttf"

    loading 1320 glyphs  = 0.030000 s
    rasterizer                  time        throughput             global  relative
    ================================================================================
    with dummy                :    0.070 s  (   94285 glyphs/s)
    with analytical           :    1.460 s  (    4520 glyphs/s)    x 1.00  x 1.00
    with mono1                :    0.690 s  (    9565 glyphs/s)    x 0.47  x 0.45
    with tor15x17_8           :    4.170 s  (    1582 glyphs/s)    x 2.86  x 2.95
    with tor16x16_8           :    4.810 s  (    1372 glyphs/s)    x 3.29  x 3.41
    with tor16x16_7           :    5.330 s  (    1238 glyphs/s)    x 3.65  x 3.78
    with kiia32_7             :    3.300 s  (    2000 glyphs/s)    x 2.26  x 2.32
    with tor8x32_8            :    3.800 s  (    1736 glyphs/s)    x 2.60  x 2.68
    with tor8x8_8             :    3.710 s  (    1778 glyphs/s)    x 2.54  x 2.62
    with kiia16_7             :    2.400 s  (    2750 glyphs/s)    x 1.64  x 1.68
    with kiia8_7              :    1.860 s  (    3548 glyphs/s)    x 1.27  x 1.29

  * as you can see the high-quality Kiia and Tor algorithms are now pretty equivalent
    for small sizes, while Kiaa's multi-sampler takes the lead for large ones. 

  * for low-quality/speed, Kiaa's multi-sampler takes an even larger lead, regardless
    of the size of the glyphs.

I'm still scratching my head to find other variants or optimizations.

thanks a lot

- David


On Wed, 08 Aug 2007 12:25:28 +0200, "Philipp Heise" <heise at in.tum.de> said:
> David Turner wrote:
> >   ...
> >   if you feel like it, feel free to look at the code, run it, and improve it, or ask
> >   questions in this thread.
> > 
> >   Hope this helps,
> > 
> 
> Hi David,
> 
> your program to compare the performance of various rasterization
> algorithms is really great ... Thank you!
> 
> Since there was no output for the resulting glyphs and I wanted to see
> some results  - I put the following hack in your ConverRaster() after
> raster_render() :
> 
>   {
>    FILE* f;
>    int i;
>    f=fopen("out.pgm","wb");
>    fprintf(f,"P2\n%d %d\n255\n",width,height);
>    for(i=0;i<width*height;i++)
>      fprintf(f,"%d\n",bitmap->buffer[i]);
>    fclose(f);
>    printf("press enter to continue\n");
>    getchar();
>   }
> 
> 
> The results from the rasterizer of the torXXX_7-variant seem to be wrong
> for most glyphs (tested with Bitstream Vera)...
> 
> Thanks Philipp
_______________________________________________
cairo mailing list
cairo at cairographics.org
http://lists.cairographics.org/mailman/listinfo/cairo


_______________________________________________
cairo mailing list
cairo at cairographics.org
http://lists.cairographics.org/mailman/listinfo/cairo


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftgrays_kiia7.h
Type: application/octet-stream
Size: 22216 bytes
Desc: not available
Url : http://lists.cairographics.org/archives/cairo/attachments/20070822/10390429/attachment-0001.obj 


More information about the cairo mailing list