[cairo] Pixman glyph performance, and beyond!

Fri Oct 23 01:49:40 PDT 2009

Excerpts from Soeren Sandmann's message of Fri Oct 23 02:37:39 +0100 2009:
> The latest incarnation of that work is the 'flags' branch here:
> 
>     http://cgit.freedesktop.org/~sandmann/pixman/log/?h=flags
> 
> which contains several optimizations in this area.

[snip]

> It might be worthwhile rerunning the benchmark against that branch,
> though I suspect there is still some overhead. Almost anything will
> show up when the images are as small as glyphs are.

Very effective, Søren, it eliminated the get_fast_path() overhead entirely:

  32.84%  [.] sse2_composite_add_n_8888_8888_ca
  17.13%  [.] sse2_composite_over_n_8888_8888_ca
  15.98%  [.] pixman_image_composite
   5.78%  [.] pixman_blt_sse2
   5.40%  [.] _pixman_image_validate
   3.98%  [.] pixman_compute_composite_region32
   2.12%  [.] pixman_fill_sse2

It looks like it's been absorbed into pixman_image_composite(), but the
runtime improved by over 10%, which indicates that the lookup overhead was
eliminated. There is still around 25% to be recovered, though.

> I really think the fast paths need to be kept an implementation
> detail, because exposing them would constrain what information about
> the images you could rely on to compute the fast path. 
> 
> For example, right now pixman does not rely on the alignment of the
> image data when it selects the fast path. This means someone could
> look up a fast path, then go on to use with several
> differently-aligned images, which would mean pixman couldn't later on
> add alignment optimizations.

I think you've effectively demonstrated that the overhead from selecting
the fast path should be negligible. So we should move on to the question
of how to push large batches of work to pixman efficiently.
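
To make "large batches" concrete, here is a rough sketch of the shape such
an entry point might take. None of it is existing pixman API:
batch_rect_t, a8_image_t, lookup_span_func() and composite_batch() are
hypothetical names, and the a8-only saturating ADD is just a stand-in for a
real operator. The only point is that the per-call setup (the lookup, which
is counted here) happens once per batch rather than once per glyph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one glyph-sized rectangle to composite. */
typedef struct {
    int dst_x, dst_y;     /* top-left corner in the destination */
    int width, height;    /* rectangle size in pixels */
    const uint8_t *mask;  /* width*height a8 coverage values, tightly packed */
} batch_rect_t;

/* Hypothetical a8 destination image. */
typedef struct {
    uint8_t *data;
    int width, height, stride;
} a8_image_t;

typedef void (*span_func_t)(uint8_t *dst, const uint8_t *mask,
                            int n, uint8_t alpha);

/* Saturating ADD of (mask * alpha) onto an a8 destination row. */
static void add_span(uint8_t *dst, const uint8_t *mask, int n, uint8_t alpha)
{
    for (int i = 0; i < n; i++) {
        unsigned m = (mask[i] * alpha + 127) / 255;
        unsigned s = dst[i] + m;
        dst[i] = s > 255 ? 255 : (uint8_t)s;
    }
}

/* Stand-in for the per-call fast path selection; counts invocations. */
static int lookups;

static span_func_t lookup_span_func(void)
{
    lookups++;
    return add_span;
}

/* Composite every rectangle in the batch with a single lookup. */
static void composite_batch(a8_image_t *dst, uint8_t alpha,
                            const batch_rect_t *rects, size_t n_rects)
{
    span_func_t func = lookup_span_func();  /* once per batch, not per glyph */

    for (size_t i = 0; i < n_rects; i++) {
        const batch_rect_t *r = &rects[i];

        /* The sketch does no clipping; rectangles must lie inside dst. */
        assert(r->dst_x >= 0 && r->dst_y >= 0);
        assert(r->dst_x + r->width <= dst->width);
        assert(r->dst_y + r->height <= dst->height);

        for (int y = 0; y < r->height; y++)
            func(dst->data + (r->dst_y + y) * dst->stride + r->dst_x,
                 r->mask + y * r->width, r->width, alpha);
    }
}
```

A caller would accumulate the rectangles for a run of glyphs and hand them
over in one composite_batch() call; validation and region computation could
presumably be amortised the same way.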

> However, I do agree that glyph compositing needs to become much faster
> in both X and cairo, but I think that a better way would be to move
> the Render glyph management code into pixman and expose a new
> 
>         pixman_glyph_set_t
> 
> along with something like a pixman_composite_glyphs() similar to how
> Render works. This would allow both cairo and X to become
> substantially faster, while sharing glyph caching code.
> 
> For spans, I still think that a polygon image type in pixman is the
> way to go, since again this would benefit both X and cairo. There
> could certainly be a call to convert it into spans if that is useful
> to other cairo backends, so that we wouldn't need to have two
> rasterizers.

I'm actually not so convinced that this is the direction that pixman should
be going in. From my perspective cairo requires specific path -> backend
geometry converters, and a polygon rasteriser with a span line interface
has quickly become the default method for pushing masks around. Whereas
traps have been relegated to mostly handling boxes, aside from when the
most efficient wire request we have available is CompositeTraps. (Has
anyone else noticed that the RLE mask for curved geometry is often an
order of magnitude smaller than the equivalent set of trapezoids,
almost as small as the original path?) Similarly, I'd rather not add the
overhead of an independent layer of glyph management. With that bias,
I'd prefer that pixman retained its focus on pixel manipulation routines
and we improve the interfaces for performing large sets of similar
operations.
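
As an illustration of what I mean by an RLE mask, here is a minimal sketch
that turns one row of a8 coverage into spans. The span_t layout and
rle_encode_row() are made up for this example and are not cairo's or
pixman's actual span interface; runs of zero coverage are simply skipped.

```c
#include <stdint.h>

/* Hypothetical span: 'len' pixels starting at 'x' share one coverage. */
typedef struct {
    int x;
    int len;
    uint8_t coverage;
} span_t;

/*
 * Run-length encode one row of a8 coverage, skipping empty runs.
 * 'out' must have room for up to 'width' spans.
 * Returns the number of spans written.
 */
static int rle_encode_row(const uint8_t *row, int width, span_t *out)
{
    int n = 0;
    int x = 0;

    while (x < width) {
        int start = x;
        uint8_t c = row[x];

        /* Extend the run while the coverage stays the same. */
        while (x < width && row[x] == c)
            x++;

        if (c != 0) {
            out[n].x = start;
            out[n].len = x - start;
            out[n].coverage = c;
            n++;
        }
    }

    return n;
}
```

For a row that is mostly fully covered or fully empty, only the antialiased
edges produce short spans, which is why the encoded mask stays compact.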

One issue that we will encounter very soon is the pain caused by forcing
the user to emit cairo_show_glyphs() early for each change in font. This
can be fixed up in the backends that batch requests and use a
consolidated glyph atlas (i.e. there is no level state change and so the
geometry is just accumulated onto the previous operation). [There is
still substantial overhead from cairo doing the analysis on the extra
operations.] Similarly, we can move away from an immediate-mode,
direct-access pixman and treat it more like a GPU, if that is performant.
-ickle
-- 
Chris Wilson, Intel Open Source Technology Centre