[cairo] cairo-gl glyph rendering performance
Alexandros Frantzis
alexandros.frantzis at linaro.org
Tue Apr 19 15:30:58 PDT 2011
Hi all!
I have been investigating the cairo-gl glyph implementation to see if we
can improve the glyph rendering performance.
I have found that one source of performance loss is the overzealous selection
of the "via mask" path when rendering glyphs. When using the "via mask" path,
glyphs are first rendered to a temporary surface which is then used as a mask
to draw the glyphs on the final destination.
In the current code, one of the reasons for using the mask path is that the
glyphs overlap. Is this valid? I haven't figured out the technical reason for
this, so I may be missing something, but the following seems strange: let's
say we have two glyphs that actually overlap and we want to draw them. In the
present situation, if we draw the glyphs using a single glyph group they will
be drawn via a mask. If we draw them separately (in two groups), each will be
drawn using the normal path. Why is one more correct than the other? Do
glyphs belonging to the same group have some special connection?
In any case, the overlap detection test as implemented in
_cairo_scaled_font_glyph_device_extents() is not suited for our needs for two
reasons:
1. The overlap detection algorithm checks the extents of each glyph against the
current total extent of previously processed glyphs. This works fine as long
as the glyph group is limited to a single line and drawn sequentially.
However, for multi-line glyph groups or for groups with out-of-order glyphs,
a false "overlap" is easily detected. The ASCII figure below shows
why:
---------    ---------    ---------
|A B C D| => |A B C D| => |A B C D|   False Overlap when checking X!
---------    |E      |    |E   X  |
             ---------    ---------
2. Due to font kerning, glyph extents are often found to be overlapping,
although the glyphs themselves are not actually overlapping.
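To make issue (1) concrete, here is a small Python model of the check as I described it above. It is only a sketch: the `Box` type, the unit-sized `cell` helper and both function names are made up for illustration, and this is not the actual code of _cairo_scaled_font_glyph_device_extents(). It compares the "running total extents" test with a plain pairwise test on the layout from the figure.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned glyph extents (hypothetical type, for illustration only)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: "Box") -> bool:
        # Strict inequalities: boxes that merely touch do not intersect.
        return (self.x1 < other.x2 and other.x1 < self.x2 and
                self.y1 < other.y2 and other.y1 < self.y2)

    def union(self, other: "Box") -> "Box":
        return Box(min(self.x1, other.x1), min(self.y1, other.y1),
                   max(self.x2, other.x2), max(self.y2, other.y2))


def running_extents_overlap(glyph_boxes):
    """Model of the check in (1): test each glyph against the bounding box
    of all previously processed glyphs, then grow that bounding box."""
    total = None
    for box in glyph_boxes:
        if total is not None and total.intersects(box):
            return True
        total = box if total is None else total.union(box)
    return False


def pairwise_overlap(glyph_boxes):
    """Reference check: do any two glyph boxes actually intersect?"""
    return any(a.intersects(b)
               for i, a in enumerate(glyph_boxes)
               for b in glyph_boxes[i + 1:])


def cell(col, row):
    """Unit-sized glyph box at a grid position (assumed layout)."""
    return Box(col, row, col + 1, row + 1)


# Layout from the figure: A B C D on the first line, E and X on the second.
single_line = [cell(c, 0) for c in range(4)]
multi_line = single_line + [cell(0, 1), cell(2, 1)]

# A kerned pair whose extents overlap slightly (made-up numbers).
kerned_pair = [Box(0.0, 0.0, 1.0, 1.0), Box(0.9, 0.0, 1.9, 1.0)]

if __name__ == "__main__":
    print("single line, running check:", running_extents_overlap(single_line))
    print("multi line, running check: ", running_extents_overlap(multi_line))
    print("multi line, pairwise check:", pairwise_overlap(multi_line))
    print("kerned pair, pairwise check:", pairwise_overlap(kerned_pair))
```

Note that the pairwise test only removes the false positive from (1). For the kerned pair both tests report an overlap, because extents alone cannot tell whether the glyph shapes themselves touch, which is exactly issue (2).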
In real-life applications that use a high-level layout library (e.g. pango),
issue (1) doesn't seem to be very frequent, as the glyph groups that are passed
to cairo are usually limited to single lines with glyphs rendered sequentially.
On the other hand, it is quite common to have at least one overlap per glyph
group because of kerning and therefore the "via mask" path is selected much
more often than it should.
As an experiment, I commented out the "use_mask |= overlap" line in
_cairo_gl_surface_show_glyphs() and measured the performance difference, to get
a feeling of the improvements we can get:
a. For r600g Mesa git (GLX_MESA_multithread_makecurrent):
            firefox-talos-gfx  gnome-terminal-vim  poppler  firefox-planet-gnome
overlap                54.073              15.656    9.421                83.339
no-overlap             17.489              13.480    3.051                74.192
b. For i965 Mesa git (GLX_MESA_multithread_makecurrent):
            firefox-talos-gfx  gnome-terminal-vim  poppler  firefox-planet-gnome
overlap                37.167               7.556    7.045                44.158
no-overlap             36.439               7.064    3.253                42.432
c. For r600g-gles2 Mesa git:
            firefox-talos-gfx  gnome-terminal-vim  poppler  firefox-planet-gnome
overlap                71.671              17.747   13.133                96.790
no-overlap             21.794              16.624    3.693                86.268
d. For i965 Mesa 7.10.2:
            firefox-talos-gfx  gnome-terminal-vim  poppler  firefox-planet-gnome
overlap                62.069              10.806   13.451                     -
no-overlap             42.275               7.895    4.546                     -
Judging from the results above, it seems that one of the main benefits of
avoiding the "via mask" path is reducing the cost of GLX/EGL context switches
(the ones that happen because we have to change the target surface to draw on
the mask). The GLX_MESA_multithread_makecurrent extension helps a lot with this
(as expected), as can be seen in samples (a) and (b). Still, even when taking
advantage of this extension, avoiding the "via mask" path for overlaps
(no-overlap) offers significant improvements in some cases (e.g. r600g
firefox-talos-gfx, poppler) and smaller but still nice improvements in the
rest.
For EGL/GLES2 and GLX implementations that don't currently have an extension
similar to GLX_MESA_multithread_makecurrent, avoiding the "via mask" path is
even more important, as can be seen in samples (c) and (d). All in all, I think
it is worthwhile to investigate how to minimize the usage of the "via mask" path.
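For a quick view of the relative gains, the numbers from samples (a) to (d) can be turned into overlap/no-overlap ratios with a few lines of Python. The values are copied from the tables above; the `TRACES`, `RESULTS` and `ratios` names are just for this sketch, and I am treating the numbers as "lower is better" so that a ratio above 1 means the no-overlap run was faster.

```python
TRACES = ["firefox-talos-gfx", "gnome-terminal-vim",
          "poppler", "firefox-planet-gnome"]

# (overlap row, no-overlap row) per sample; None marks the "-" entries.
RESULTS = {
    "a. r600g Mesa git": ([54.073, 15.656, 9.421, 83.339],
                          [17.489, 13.480, 3.051, 74.192]),
    "b. i965 Mesa git": ([37.167, 7.556, 7.045, 44.158],
                         [36.439, 7.064, 3.253, 42.432]),
    "c. r600g-gles2 Mesa git": ([71.671, 17.747, 13.133, 96.790],
                                [21.794, 16.624, 3.693, 86.268]),
    "d. i965 Mesa 7.10.2": ([62.069, 10.806, 13.451, None],
                            [42.275, 7.895, 4.546, None]),
}


def ratios(results):
    """Return overlap / no-overlap per sample and trace (None if missing)."""
    out = {}
    for config, (overlap, no_overlap) in results.items():
        out[config] = {
            trace: (None if o is None or n is None else o / n)
            for trace, o, n in zip(TRACES, overlap, no_overlap)
        }
    return out


if __name__ == "__main__":
    for config, row in ratios(RESULTS).items():
        print(config)
        for trace, ratio in row.items():
            shown = "-" if ratio is None else f"{ratio:.2f}x"
            print(f"  {trace:<22}{shown}")
```

With these inputs the r600g firefox-talos-gfx ratio comes out at about 3x, while the i965 Mesa git ratio for the same trace stays close to 1x.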
The important question here is how we can actually use the "via mask" path
less. Can we remove the overlap factor completely? Assuming that not using
a mask is wrong, how wrong are the results going to be? If the visual
difference is small enough perhaps we can make this compromise to increase
performance (or use an environment variable and leave it to the user to force
the fast behavior).
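One way to get a feeling for "how wrong" is to model a single pixel that is covered by two glyphs. The sketch below rests on assumptions of mine, not on a reading of the cairo-gl code: that the mask path accumulates glyph coverage with a saturating add and then applies the source once with the OVER operator, and that the direct path composites each glyph with OVER separately. All function names are made up for this example.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Porter-Duff OVER of a non-premultiplied source onto an opaque destination."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


def draw_direct(src_rgb, src_alpha, coverages, dst_rgb):
    """Assumed direct path: composite every glyph onto the destination in turn."""
    for cov in coverages:
        dst_rgb = over(src_rgb, src_alpha * cov, dst_rgb)
    return dst_rgb


def draw_via_mask(src_rgb, src_alpha, coverages, dst_rgb):
    """Assumed mask path: add coverages into a mask (clamped to 1),
    then composite the source through the mask once."""
    mask = min(1.0, sum(coverages))
    return over(src_rgb, src_alpha * mask, dst_rgb)


BLACK = (0.0, 0.0, 0.0)
WHITE = (1.0, 1.0, 1.0)

if __name__ == "__main__":
    # One pixel fully covered by two overlapping glyphs.
    for alpha in (1.0, 0.5):
        direct = draw_direct(BLACK, alpha, [1.0, 1.0], WHITE)
        masked = draw_via_mask(BLACK, alpha, [1.0, 1.0], WHITE)
        print(f"source alpha {alpha}: direct={direct} via mask={masked}")
```

Under these assumptions an opaque source gives the same pixel either way when both glyphs fully cover it, whereas a half-transparent source is blended twice in the direct path and so ends up darker in the overlapping area than with the mask.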
If we need to keep the overlap test, we have to solve issue (1) and especially
(2). Although I can at least imagine a way to tackle (1), I have no idea of
how to solve (2) in an efficient manner without extra information given to the
backend by e.g. pango.
I am looking forward to comments and to corrections of any misconceptions
I have!
Thanks,
Alexandros