[cairo] [PATCH] cairo-gl: Make VBO size run-time settable

Fri Aug 30 02:11:12 PDT 2013

On Thu, Aug 29, 2013 at 06:56:21PM -0400, Behdad Esfahbod wrote:
> On 13-08-29 01:55 PM, Bryce W. Harrington wrote:
> > Chris, btw, sort of an aside question...
> > 
> > As I've been running various performance tests for each of the GL
> > compositors, I am noticing that spans and traps basically have identical
> > performance (any differences are in the noise).  I'm aware of the
> > implementational differences between the two, and I've expected to see
> > spans perform better than traps on at least a few of these tests, but
> > nothing so far.  I'm guessing the tests simply aren't exercising spans'
> > talents, or I'm not running the right tests.

This is what I measured on one of my systems:

old: gl-traps
new: gl-spans
Speedups
========
   gl           firefox-fishtank: 55.16x speedup
   gl             grads-heat-map: 16.03x speedup
   gl             firefox-canvas: 12.33x speedup
   gl         swfdec-giant-steps: 11.99x speedup
   gl       firefox-canvas-alpha: 11.55x speedup
   gl         firefox-chalkboard:  8.96x speedup
   gl          firefox-paintball:  7.02x speedup
   gl               firefox-tron:  6.93x speedup
   gl       gnome-system-monitor:  6.10x speedup
   gl          firefox-particles:  5.63x speedup
   gl           firefox-fishbowl:  5.43x speedup
   gl          firefox-talos-svg:  5.41x speedup
etc.

> Speaking of which, Chris, can you explain to those of us not following cairo
> closely these days how all the various new compositors work?

The difference between the compositors of cairo-1.12 and the single
trapezoid compositor of cairo-1.0 is that there are more of them! The
surface backends have to plug directly into the high-level surface API
(the old low-level compositor API has been removed) and explicitly
decide how they want to render each individual operation. We have a few
common strategies: the trapezoid compositor (based on the original
Xrender approach), the spans scanline compositor (efficient for
image-based software rendering), and a "mask" compositor (where the
backend can render the various channels separately, and the horrible
logic of combining the mask with the clip with the source onto the
destination is handled by the compositor).

For example, cairo-gl will first use its msaa compositor, falling back
to the spans compositor, and then to a mask compositor (with a stage
for glyphs to use the code from the traps compositor), with a final
fallback to the CPU.

Since each compositor receives the high-level state at each stage, we
are not restricted by any of the decisions made at an earlier point.
(This was an issue with the previous low-level approach, which couldn't
use spans for rendering an image fallback after it hit the trapezoid
paths, etc.) The compositor pipeline does a few computations up front
(computing extents and pattern reductions, which were at the time
common to all compositors but are now bypassed for msaa) and then walks
a delegate chain of compositor function tables, calling the operation
on each until it is claimed. (So, like the pixman approach, we also
suffer from a static ordering in which the best technique is not always
tried first.) Each compositor function in that chain inspects the state
(is the path rectangular? do we have a complex clip? can I handle the
operator?) and, if it finds the operation acceptable, begins to process
it. The common compositors break the operation down into various
low-level callbacks that the backend provides (which often then use
another library function with further callbacks to implement it, e.g.
span rendering). The stacks build very quickly, with obvious overhead
for trivial operations, so typically the backend checks up front for
the simplest operations and does them directly.
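
As a rough illustration of the delegate chain and the up-front fast
path described above, here is a minimal C sketch. It is not cairo's
actual code: every type, field and function name below is hypothetical,
and the acceptance rules for each compositor are invented purely so the
fallthrough is visible. Only the ordering (msaa, then spans, then mask,
then CPU) follows the cairo-gl example given earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; cairo's real ones are named differently. */
typedef enum { STATUS_SUCCESS, STATUS_UNSUPPORTED } status_t;

/* Hypothetical tag recording which stage claimed the operation. */
typedef enum {
    RENDER_NONE, RENDER_DIRECT, RENDER_MSAA,
    RENDER_SPANS, RENDER_MASK, RENDER_CPU
} renderer_t;

/* Hypothetical high-level state, passed unchanged to every stage. */
typedef struct {
    int is_rectangle;        /* is the path rectangular? */
    int complex_clip;        /* do we have a complex clip? */
    int operator_ok;         /* can the GPU handle the operator? */
    int multisample;         /* is a multisample target available? */
} draw_state_t;

typedef struct compositor compositor_t;
struct compositor {
    renderer_t id;
    const compositor_t *delegate;             /* next compositor to try */
    status_t (*fill) (const draw_state_t *);  /* one function-table entry */
};

/* Invented acceptance rules: each stage inspects the state and either
 * claims the operation or reports that it cannot handle it. */
static status_t msaa_fill (const draw_state_t *s)
{ return s->multisample && !s->complex_clip ? STATUS_SUCCESS : STATUS_UNSUPPORTED; }

static status_t spans_fill (const draw_state_t *s)
{ return !s->complex_clip && s->operator_ok ? STATUS_SUCCESS : STATUS_UNSUPPORTED; }

static status_t mask_fill (const draw_state_t *s)
{ return s->operator_ok ? STATUS_SUCCESS : STATUS_UNSUPPORTED; }

static status_t cpu_fill (const draw_state_t *s)
{ (void) s; return STATUS_SUCCESS; }          /* final fallback always works */

/* The chain is fixed at build time: msaa -> spans -> mask -> CPU. */
static const compositor_t cpu_compositor   = { RENDER_CPU,   NULL,              cpu_fill };
static const compositor_t mask_compositor  = { RENDER_MASK,  &cpu_compositor,   mask_fill };
static const compositor_t spans_compositor = { RENDER_SPANS, &mask_compositor,  spans_fill };
static const compositor_t msaa_compositor  = { RENDER_MSAA,  &spans_compositor, msaa_fill };

/* Walk the delegate chain, calling the operation on each compositor
 * until one claims it. */
static renderer_t
compositor_fill (const compositor_t *chain, const draw_state_t *state)
{
    assert (chain != NULL && state != NULL);
    for (const compositor_t *c = chain; c != NULL; c = c->delegate) {
        if (c->fill (state) != STATUS_UNSUPPORTED)
            return c->id;
    }
    return RENDER_NONE;
}

/* Backend entry point: handle the simplest case directly to avoid the
 * callback stack, otherwise hand the untouched state to the chain. */
static renderer_t
backend_fill (const draw_state_t *state)
{
    if (state->is_rectangle && !state->complex_clip && state->operator_ok)
        return RENDER_DIRECT;
    return compositor_fill (&msaa_compositor, state);
}
```

With these made-up rules, a rectangle with a simple clip is handled
directly by backend_fill(), while a non-rectangular path with a complex
clip and an unsupported operator is rejected by msaa, spans and mask in
turn and ends up at the CPU fallback. Because every stage sees the same
draw_state_t, a later stage is never constrained by what an earlier one
tried.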

The basic idea is that each backend is free to build a pipeline of how
to render any operation, with the common stages being helper functions.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre