[cairo] text measuring speed

Fri Jun 10 18:43:42 PDT 2005

I did some investigation as to why text measuring speed was so slow in
cairo.

I wrote a simple application that measures text performance; the inner
loop looks like:

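    #define FIRST ' '
    #define LAST '~'
    /* Look up the glyph index of each printable ASCII character once. */
    for (ucs4 = FIRST; ucs4 <= LAST; ucs4++)
        ids[ucs4 - FIRST] = FcFreeTypeCharIndex (face, ucs4);

    /* Measure every glyph individually, REPS times over. */
    for (rep = 0; rep < REPS; rep++) {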
        for (ucs4 = FIRST; ucs4 <= LAST; ucs4++) {
            glyph.index = ids[ucs4 - FIRST];
            glyph.x = 0;
            glyph.y = 0;
            cairo_scaled_font_glyph_extents (scaled_font,
                                             &glyph,
                                             1,
                                             &extents);
        }
    }

I measure a single glyph because I expect every application using
cairo's glyph API to do the same.

With 'REPS' set to 10000, this makes 950000 measurements (95 glyphs
times 10000 repetitions).

I did some profiling with gprof and discovered that almost all of the
CPU time was consumed converting the font space units returned by the
font API into the user space units handed back to the application.  The
four corners of each glyph bounding box were transformed to user space
and used to construct a user-space bounding box for the string.
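
A minimal sketch of that corner transform, to show where the per-glyph
work comes from.  The affine_t and box_t types and both functions are
hypothetical stand-ins written for illustration; they are not cairo's
internal types or code.

```c
#include <assert.h>

/* Hypothetical types for illustration only, not cairo's own. */
typedef struct { double xx, yx, xy, yy, x0, y0; } affine_t;
typedef struct { double x1, y1, x2, y2; } box_t;

/* Apply the affine map to a single point, in place. */
static void
affine_transform_point (const affine_t *m, double *x, double *y)
{
    double nx = m->xx * *x + m->xy * *y + m->x0;
    double ny = m->yx * *x + m->yy * *y + m->y0;

    *x = nx;
    *y = ny;
}

/* Transform all four corners of a box and return the axis-aligned
 * box that encloses the transformed corners. */
static box_t
affine_transform_box (const affine_t *m, const box_t *in)
{
    double xs[4] = { in->x1, in->x2, in->x2, in->x1 };
    double ys[4] = { in->y1, in->y1, in->y2, in->y2 };
    box_t out;
    int i;

    assert (in->x1 <= in->x2 && in->y1 <= in->y2);

    for (i = 0; i < 4; i++) {
        affine_transform_point (m, &xs[i], &ys[i]);
        if (i == 0) {
            out.x1 = out.x2 = xs[0];
            out.y1 = out.y2 = ys[0];
            continue;
        }
        if (xs[i] < out.x1) out.x1 = xs[i];
        if (xs[i] > out.x2) out.x2 = xs[i];
        if (ys[i] < out.y1) out.y1 = ys[i];
        if (ys[i] > out.y2) out.y2 = ys[i];
    }
    return out;
}
```

Even in this stripped-down form, each measurement costs four point
transforms plus the min/max bookkeeping, and that cost repeats on every
call unless the result is cached.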

Here's the head of the flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 20.35      2.07     2.07   950000     0.00     0.01  cairo_scaled_font_glyph_extents
 16.57      3.76     1.69  5700001     0.00     0.00  cairo_matrix_transform_distance
 13.72      5.15     1.40   950302     0.00     0.00  _cairo_glyph_cache_hash
 11.21      6.29     1.14   950000     0.00     0.01  _cairo_ft_scaled_font_glyph_extents
  9.98      7.31     1.02  5700000     0.00     0.00  cairo_matrix_transform_point

This is about 70% of the total time; another 20% or so goes to various
glyph cache manipulations.

So, an obvious optimization would involve caching metrics in user space
and avoiding the whole transformation adventure.
When I did that in the dumbest possible way, I saw a reasonable
improvement:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 63.93      0.90     0.90   950000     0.00     0.00  cairo_scaled_font_glyph_extents
 17.14      1.14     0.24                             main
 16.79      1.37     0.24   950000     0.00     0.00  _cairo_font_scaled_font_glyph
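
For illustration, a cache of user-space metrics along these lines could
look roughly like the sketch below.  This is not the patch that was
measured; every name in it (user_extents_t, user_extents_cache_t,
user_extents_cache_lookup, demo_compute_extents, CACHE_SLOTS) is
hypothetical, and it assumes one cache per scaled font with a fixed
transformation, so that a glyph index alone identifies the metrics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space metrics record for one glyph. */
typedef struct {
    double x_bearing, y_bearing;
    double width, height;
    double x_advance, y_advance;
} user_extents_t;

#define CACHE_SLOTS 256   /* arbitrary size chosen for the sketch */

/* Direct-mapped cache: one slot per (glyph index mod CACHE_SLOTS). */
typedef struct {
    unsigned long  index[CACHE_SLOTS];
    int            valid[CACHE_SLOTS];
    user_extents_t extents[CACHE_SLOTS];
    unsigned long  misses;
} user_extents_cache_t;

/* Callback standing in for the slow path: fetch font-space metrics
 * and transform them into user space. */
typedef void (*compute_extents_fn) (unsigned long glyph_index,
                                    user_extents_t *out);

/* Return cached user-space extents, computing them only on a miss. */
static const user_extents_t *
user_extents_cache_lookup (user_extents_cache_t *cache,
                           unsigned long glyph_index,
                           compute_extents_fn compute)
{
    size_t slot = glyph_index % CACHE_SLOTS;

    assert (compute != NULL);

    if (!cache->valid[slot] || cache->index[slot] != glyph_index) {
        compute (glyph_index, &cache->extents[slot]);
        cache->index[slot] = glyph_index;
        cache->valid[slot] = 1;
        cache->misses++;
    }
    return &cache->extents[slot];
}

/* Dummy slow path used only to exercise the cache: it reports the
 * glyph index as the width so results are easy to check. */
static void
demo_compute_extents (unsigned long glyph_index, user_extents_t *out)
{
    user_extents_t e = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };

    e.width = (double) glyph_index;
    *out = e;
}
```

On a hit the lookup is an index computation and two comparisons, with
no matrix arithmetic at all.  A direct-mapped table is simply the
easiest thing to write down here; a real cache would also need an
eviction and invalidation policy.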

With this many calls, the overhead of -pg is significant, so I
recompiled everything 'normally' and now just measured with 'time'.
Unfortunately, the new code was 'too fast' to reliably measure this
way, so I increased 'REPS' to 100000 (10x):

Before:

 $ /usr/bin/time -p textperf
 real 11.35
 user 11.30
 sys 0.02

After:

 $ LD_LIBRARY_PATH=$HOME/src/cairo/cairo/src/.libs time -p ./textperf
 real 1.40
 user 1.38
 sys 0.01

So, this seems to 'solve' this particular performance problem, at the
cost of another cache.

Adding yet another cache seems like a bad plan; we really don't want to
duplicate information like this all over the system.  So, I asked what
the existing cache was used for and came up with some interesting
observations.

The cache is used in three places:

 1)	During extents computation.
 2)	From the gstate code to compute a bounding
	box around the set of glyphs.
 3)	From the rendering code to get the glyph images.

With an upper level cache, 1) doesn't need a cache any longer.  And,
when using the Render extension, 3) doesn't need caching as the glyphs
are pushed across to the X server.  2) is used by gstate to create
temporary surfaces when the backend doesn't support glyphs directly and
also when clipping to a mask.  It's also used because the backend API
includes these extents, but the Render backend doesn't use them.

In other words, with the extents cached in user space, we really don't
need a lower level cache at all when using Render.  And, when not using
Render, we have the glyph images hanging around (which are huge), so
it's not a huge additional waste to keep device space metrics around.

The final place we need metrics is in the gstate layer when computing
the size of a temporary mask for mask-based clipping of glyphs.  I
suggest that we could just use the user-space metrics and convert those
to device space in this case; it's only an estimate after all, and
mis-estimating could be limited to the conversion expansion for a single
glyph.

Fixing the caching code to avoid caching glyphs when the upper levels
aren't interested in having glyphs cached would make this all
straightforward.

-keith
