[cairo] Pixman glyph performance, and beyond!
Soeren Sandmann
sandmann at daimi.au.dk
Thu Oct 22 18:37:39 PDT 2009
Chris Wilson <chris at chris-wilson.co.uk> writes:
> So I'm reviewing how cairo handles compositing, looking at how we may
> drive cairo-gl more efficiently. As part of that process, I've had the
> opportunity to remove some overhead from within cairo-image. However,
> glyph composition still suffers from substantial overhead since every
> glyph is composited separately.
The X server suffers from essentially the same problem, except worse
because each glyph is actually a new pixman_image_t, so there is
horrible overhead from malloc() etc. for each and every glyph.
> firefox-talos-gfx on a slow Celeron 600MHz:
> # Overhead Symbol
> # ........ ......
> 23.76% [.] _pixman_run_fast_path
> 23.34% [.] sse2_composite_add_n_8888_8888_ca
> 11.82% [.] sse2_composite_over_n_8888_8888_ca
> 6.31% [.] pixman_image_composite
> 4.69% [.] walk_region_internal
> 4.44% [.] pixman_blt_sse2
> 3.18% [.] _pixman_image_validate
> 2.30% [.] sse2_composite_over_n_8_8888
> 2.23% [.] pixman_compute_composite_region32
> 2.19% [.] pixman_fill_sse2
> 1.91% [.] sse2_composite
Are these percentages of the 46.35% below? If so, _pixman_run_fast_path
accounts for 23.76% of 46.35%, or about 11.01% of the total.
Either way, it's clearly not good.
> (And to put it in perspective:
> 46.35% /usr/local/lib/libpixman-1.so.0.17.1
> 28.25% /home/ickle/src/cairo/src/.libs/libcairo.so.2.10905.0
> 14.24% [kernel])
>
> Søren has looked at this problem in the past and begun work on
> fast-path and faster-fast-path branches, looking to cache prior
> fast-path resolutions.
The latest incarnation of that work is the 'flags' branch here:
http://cgit.freedesktop.org/~sandmann/pixman/log/?h=flags
which contains several optimizations in this area.
Here is a summary of what's in it:
- It moves the computation of the various image properties out of the
get_fast_path() loop and replaces them with bit masks that are much
faster to check.
- It turns general_composite() and fast_composite_scale_nearest() into
fast paths, so that all compositing goes through that path.
- It eliminates all the composite methods from pixman_implementation_t.
- It adds a fast path cache.
- It speeds up the operator 'strength reduction' that Antoine added a
long time ago, by storing the table more compactly and doing the
mapping in O(1) time.
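To make the flags and cache items concrete, here is a minimal C sketch, not pixman's actual code, of looking up a fast path by comparing precomputed property bits against a table, with a small move-to-front cache in front of the table. The flag names, op and format codes, table entries, and the `lookup()`/`compute_flags()` helpers are all hypothetical; string names stand in for the composite function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image-property bits (not pixman's real flag names). */
#define FLAG_NO_TRANSFORM (1u << 0)
#define FLAG_NO_ALPHA_MAP (1u << 1)
#define FLAG_OPAQUE       (1u << 2)

enum { OP_ANY = -1, OP_SRC = 1, OP_OVER = 3 };  /* illustrative op codes */
enum { FMT_ANY = -1, FMT_A8R8G8B8 = 0 };       /* illustrative format code */

typedef struct {
    int format;
    int has_transform, has_alpha_map, opaque;
} image_t;

typedef struct {
    int         op, src_format;
    uint32_t    required_flags;   /* every bit here must be set on the source */
    const char *name;             /* stands in for a composite function pointer */
} fast_path_t;

/* The last entry is a catch-all "general" path, so a search always succeeds. */
static const fast_path_t table[] = {
    { OP_SRC,  FMT_A8R8G8B8, FLAG_NO_TRANSFORM | FLAG_NO_ALPHA_MAP | FLAG_OPAQUE, "src_copy"  },
    { OP_OVER, FMT_A8R8G8B8, FLAG_NO_TRANSFORM | FLAG_NO_ALPHA_MAP,               "over_fast" },
    { OP_ANY,  FMT_ANY,      0,                                                   "general"   },
};

typedef struct { int op, format; uint32_t flags; const fast_path_t *path; } cache_entry_t;

#define CACHE_SIZE 4
static cache_entry_t cache[CACHE_SIZE];       /* most recently used first */
unsigned long cache_hits, cache_misses;

/* Derive the property bits once per image instead of per table entry. */
uint32_t compute_flags(const image_t *img)
{
    uint32_t f = 0;
    assert(img != NULL);
    if (!img->has_transform) f |= FLAG_NO_TRANSFORM;
    if (!img->has_alpha_map) f |= FLAG_NO_ALPHA_MAP;
    if (img->opaque)         f |= FLAG_OPAQUE;
    return f;
}

static int matches(const fast_path_t *p, int op, int format, uint32_t flags)
{
    return (p->op == OP_ANY || p->op == op) &&
           (p->src_format == FMT_ANY || p->src_format == format) &&
           (flags & p->required_flags) == p->required_flags;
}

const fast_path_t *lookup(int op, const image_t *src)
{
    uint32_t flags = compute_flags(src);
    size_t i, slot = CACHE_SIZE - 1;          /* on a miss, evict the last slot */
    cache_entry_t e;

    for (i = 0; i < CACHE_SIZE; i++)
        if (cache[i].path && cache[i].op == op &&
            cache[i].format == src->format && cache[i].flags == flags)
            break;

    if (i < CACHE_SIZE) {                     /* hit */
        cache_hits++;
        slot = i;
        e = cache[i];
    } else {                                  /* miss: linear table search */
        size_t k = 0;
        cache_misses++;
        while (!matches(&table[k], op, src->format, flags))
            k++;
        e.op = op; e.format = src->format; e.flags = flags; e.path = &table[k];
    }
    /* Move to front: shift entries [0, slot) down one place, put e first. */
    memmove(&cache[1], &cache[0], slot * sizeof cache[0]);
    cache[0] = e;
    return e.path;
}
```

Because the flags are derived once per call rather than re-derived for every table entry, each table comparison is a couple of integer tests, and a repeated request (as with a run of glyphs sharing one operator and format) is answered from the cache without scanning the table at all.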
I need to clean it up, break it into smaller bits, and send them to
the list for review.
> These are not yet as effective as one would hope.
It might be worthwhile rerunning the benchmark against that branch,
though I suspect there is still some overhead. Almost any per-call
overhead will show up when the images are as small as glyphs are.
> How insane would it be to push the get_fast_path() to the user and to be
> able to pass in the implementation + composite function instead of
> performing the search every time? This would also be useful for spans.
> And considering how most cairo operations are first performed to a mask,
> cairo could very effectively cache the fast path for its most frequent
> operations.
I really think the fast paths need to be kept an implementation
detail, because exposing them would constrain what information about
the images you could rely on to compute the fast path.
For example, right now pixman does not rely on the alignment of the
image data when it selects the fast path. If fast paths were exposed,
someone could look up a fast path and then use it with several
differently aligned images, which would prevent pixman from adding
alignment-based optimizations later on.
I do agree that glyph compositing needs to become much faster in both
X and cairo. However, I think a better way to get there would be to
move the Render glyph management code into pixman and expose a new
pixman_glyph_set_t type, along with something like a
pixman_composite_glyphs() function similar to how Render works. This
would allow both cairo and X to become substantially faster while
sharing the glyph caching code.
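A rough idea of what a shared glyph set could look like, written as a self-contained C sketch rather than against pixman: each glyph's a8 coverage is copied once into a hash table keyed by glyph id, and a single call then ADDs a whole run of glyphs into an a8 mask. `glyph_set_t`, `glyph_set_add()`, `composite_glyphs_add()` and the origin convention are hypothetical stand-ins for the proposed `pixman_glyph_set_t` and `pixman_composite_glyphs()`, whose real interface is not specified here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a pixman_glyph_set_t-style API. */
typedef struct {
    uint32_t id;
    int      width, height;
    int      origin_x, origin_y;   /* bitmap offset from the pen position */
    uint8_t *bits;                 /* a8 coverage, width * height bytes */
} glyph_t;

typedef struct { uint32_t id; int x, y; } glyph_pos_t;  /* one glyph in a run */

#define GLYPH_SET_SIZE 256         /* power of two; open addressing */

typedef struct { glyph_t *slots[GLYPH_SET_SIZE]; } glyph_set_t;

glyph_set_t *glyph_set_create(void) { return calloc(1, sizeof(glyph_set_t)); }

/* Linear probing; assumes the set never fills up completely. */
static size_t slot_for(const glyph_set_t *set, uint32_t id)
{
    size_t i = (id * 2654435761u) & (GLYPH_SET_SIZE - 1);
    while (set->slots[i] && set->slots[i]->id != id)
        i = (i + 1) & (GLYPH_SET_SIZE - 1);
    return i;
}

/* Copies the bitmap once; later composites reuse it. Returns -1 on OOM. */
int glyph_set_add(glyph_set_t *set, uint32_t id, int w, int h,
                  int ox, int oy, const uint8_t *bits)
{
    size_t i = slot_for(set, id);
    assert(w > 0 && h > 0);
    if (set->slots[i])
        return 0;                                  /* already cached */
    glyph_t *g = malloc(sizeof *g);
    if (!g)
        return -1;
    g->bits = malloc((size_t)w * (size_t)h);
    if (!g->bits) { free(g); return -1; }
    memcpy(g->bits, bits, (size_t)w * (size_t)h);
    g->id = id; g->width = w; g->height = h; g->origin_x = ox; g->origin_y = oy;
    set->slots[i] = g;
    return 0;
}

/* ADD each glyph of the run into an a8 mask, clipping and saturating. */
void composite_glyphs_add(const glyph_set_t *set, uint8_t *dst, int dst_w,
                          int dst_h, int stride, const glyph_pos_t *run, int n)
{
    for (int i = 0; i < n; i++) {
        const glyph_t *g = set->slots[slot_for(set, run[i].id)];
        if (!g)
            continue;                              /* unknown glyph: skip */
        int x0 = run[i].x - g->origin_x, y0 = run[i].y - g->origin_y;
        for (int y = 0; y < g->height; y++) {
            int dy = y0 + y;
            if (dy < 0 || dy >= dst_h) continue;
            for (int x = 0; x < g->width; x++) {
                int dx = x0 + x;
                if (dx < 0 || dx >= dst_w) continue;
                unsigned s = dst[dy * stride + dx] + g->bits[y * g->width + x];
                dst[dy * stride + dx] = s > 255 ? 255 : (uint8_t)s;
            }
        }
    }
}

void glyph_set_destroy(glyph_set_t *set)
{
    for (size_t i = 0; i < GLYPH_SET_SIZE; i++)
        if (set->slots[i]) { free(set->slots[i]->bits); free(set->slots[i]); }
    free(set);
}
```

The allocation and copy happen once per glyph when it enters the set, so compositing a run of text costs a hash lookup and a clipped pixel loop per glyph, instead of a new image object per glyph as in the X server today.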
For spans, I still think that a polygon image type in pixman is the
way to go, since again this would benefit both X and cairo. There
could certainly be a call to convert it into spans if that is useful
to other cairo backends, so that we wouldn't need to have two
rasterizers.
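As a sketch of the "convert it into spans" idea, here is a minimal non-antialiased scanline conversion of a polygon into horizontal spans using the even-odd rule, sampling at pixel centers. It only shows the kind of output such a call might produce; a real rasterizer would also compute antialiased coverage, and `polygon_to_spans()`, `span_t` and `point_t` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double x, y; } point_t;
typedef struct { int y, x0, x1; } span_t;   /* covers pixels x0 .. x1-1 of row y */

static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Smallest integer >= v, without needing libm. */
static int iceil(double v)
{
    int i = (int)v;                 /* truncates toward zero */
    return v > i ? i + 1 : i;
}

/* Even-odd rule, one sample per pixel center, no antialiasing.
 * Writes at most max_out spans and returns how many were written
 * (or -1 on allocation failure); spans beyond max_out are dropped. */
int polygon_to_spans(const point_t *pts, int n, int height,
                     span_t *out, int max_out)
{
    double *xs;
    int count = 0;

    assert(pts != NULL && n >= 3);
    xs = malloc(sizeof(double) * (size_t)n);
    if (!xs)
        return -1;
    for (int y = 0; y < height; y++) {
        double sy = y + 0.5;        /* sample at the row's center */
        int k = 0;
        for (int i = 0; i < n; i++) {
            point_t a = pts[i], b = pts[(i + 1) % n];
            /* Half-open test counts shared vertices once, skips horizontals. */
            if ((a.y <= sy && b.y > sy) || (b.y <= sy && a.y > sy))
                xs[k++] = a.x + (sy - a.y) * (b.x - a.x) / (b.y - a.y);
        }
        qsort(xs, (size_t)k, sizeof(double), cmp_double);
        for (int j = 0; j + 1 < k; j += 2) {
            /* Pixel x is inside when its center x + 0.5 is in [xs[j], xs[j+1]). */
            int x0 = iceil(xs[j] - 0.5), x1 = iceil(xs[j + 1] - 0.5);
            if (x1 > x0 && count < max_out) {
                out[count].y = y;
                out[count].x0 = x0;
                out[count].x1 = x1;
                count++;
            }
        }
    }
    free(xs);
    return count;
}
```

For the rectangle with corners (1,1) and (4,3) on a 5-row target, it produces two spans, on rows 1 and 2, each covering pixels 1 through 3.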
Thanks,
Soren