[cairo] Performance of the refactored Pixman

Mon Jun 15 07:59:13 PDT 2009

Jonathan Morton <jonathan.morton at movial.com> writes:

> I've had a unique opportunity to compare the performance of the
> refactored Pixman with an older version, using an identical set of
> blitters and other overhead improvements (some forward-ported, some
> back-ported).
> 
> Simply put, the refactored Pixman is consistently slower.

How much slower, and how did you measure it? Siarhei is right that we
need a correctness and performance test suite as part of pixman itself
so that the performance claims floating around can be quantified in a
reproducible way.
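
To make "quantified in a reproducible way" a bit more concrete, here is
a rough sketch of the kind of per-call timing helper such a suite could
be built around. Nothing here is existing pixman code: the names
sketch_op_t, sketch_time_per_call and sketch_count_op are made up for
illustration, and clock() is used only because it is standard C; a real
suite would probably want a finer-grained timer and repeated runs.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one call performs one operation under test. */
typedef void (*sketch_op_t) (void *closure);

/* Run op `iterations` times and return the average CPU time per call
 * in seconds, as measured by clock().  Returns 0.0 if iterations <= 0. */
double
sketch_time_per_call (sketch_op_t op, void *closure, long iterations)
{
    clock_t start, end;
    long i;

    if (iterations <= 0)
        return 0.0;

    start = clock ();
    for (i = 0; i < iterations; i++)
        op (closure);
    end = clock ();

    return (double) (end - start) / CLOCKS_PER_SEC / (double) iterations;
}

/* Trivial stand-in operation: counts how often it was called.  A real
 * test would issue one small composite request here instead. */
void
sketch_count_op (void *closure)
{
    long *counter = closure;

    (*counter)++;
}
```

Running the same small request through both pixman versions with a
helper like this would give per-call numbers that other people can
reproduce.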

> An extra parameter has been added to this standardised block, and
> several of the others have been doubled in size.  Because these
> parameters are on the stack, they have to be copied for each call.

What do you mean by doubled in size? Do the ARM calling conventions
not call for extending 16-bit parameters to 32 bits?

> The hurt is particularly bad on small requests.  Browsers can do a lot
> of one-pixel trapezoids and glyph strings, the latter requiring a pixman
> call for each individual glyph as well as for the whole string.  The
> extra overhead can therefore remove up to 40% of the performance,
> compared to an un-refactored version with the same mallocectomies and
> blitters.

Where does 40% come from? And percent of what, specifically?

I agree that having the ability to composite multiple glyphs in one go
may be worthwhile -- I have certainly seen overhead from the X call
chain show up on profiles.

> My big suggestion is to collapse these huge parameter blocks into a
> structure, which can then be passed by-reference up the chain.  This
> would reduce the call overhead to two parameters, which will fit in
> registers and therefore do not necessarily have to be copied.

If you can demonstrate a performance benefit, I'd probably take a
patch that replaced the parameter block with

        const pixman_composite_args_t *args

or something like that.
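
Just to illustrate the shape of such a change, here is a minimal
sketch. It assumes 32 bpp images, does no clipping, and only does a
plain copy; the struct layout and the names sketch_composite_args_t,
sketch_blit_src and sketch_composite are hypothetical, not what pixman
currently has.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical argument block.  Field names and widths are made up
 * for illustration; strides are in uint32_t units. */
typedef struct
{
    const uint32_t *src;
    uint32_t       *dest;
    int             src_stride;
    int             dest_stride;
    int32_t         src_x, src_y;
    int32_t         dest_x, dest_y;
    int32_t         width, height;
} sketch_composite_args_t;

/* Innermost blitter: reads everything it needs through the pointer. */
void
sketch_blit_src (const sketch_composite_args_t *args)
{
    int32_t y;

    for (y = 0; y < args->height; y++)
    {
        const uint32_t *s =
            args->src + (args->src_y + y) * args->src_stride + args->src_x;
        uint32_t *d =
            args->dest + (args->dest_y + y) * args->dest_stride + args->dest_x;

        memcpy (d, s, (size_t) args->width * sizeof (uint32_t));
    }
}

/* One level up the call chain.  Only two pointer-sized parameters are
 * passed on, so the block itself is never copied.  `imp` stands in
 * for whatever implementation pointer the real chain would carry. */
void
sketch_composite (void *imp, const sketch_composite_args_t *args)
{
    (void) imp;

    if (args->width <= 0 || args->height <= 0)
        return;

    sketch_blit_src (args);
}
```

The caller fills in the block once and every layer below just forwards
the pointer. Whether that is actually faster is exactly what would
need to be measured.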

> Along related but distinct lines, I'm greatly in favour of a dedicated
> "overlappable, unscaled copy" function in Pixman for scrolling support.
> The call chain overhead is utterly killing performance for XCopyArea at
> the moment.  Failing that, dedicated single-scanline get/put functions
> would probably be an improvement, internally as well as externally.

Which call chain specifically? XCopyArea() sometimes ends up in
pixman_blt(), but never in pixman_image_composite(). Scrolling zoomed
pages with Firefox involves a lot of compositing with scaled/nearest
images; if that's what you are seeing, Siarhei's patches may help.

As I have said before, moving the XCopyArea() implementation along
with the rest of the fb code from X into pixman would make sense for a
number of reasons, so I'd encourage work on that.
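
For what it's worth, the core of an "overlappable, unscaled copy" is
small. The sketch below is not the fb code and not a proposed pixman
API: sketch_copy_area is a made-up name, and it assumes a single
32 bpp buffer, a stride given in uint32_t units, and a rectangle that
has already been clipped. The only real point is the scanline order,
which is picked so that no source row is overwritten before it has
been read.

```c
#include <stdint.h>
#include <string.h>

/* Copy a width x height rectangle from (src_x, src_y) to
 * (dest_x, dest_y) within the same buffer.  The rectangles may
 * overlap.  Hypothetical helper; no clipping is done here. */
void
sketch_copy_area (uint32_t *bits, int stride,
                  int src_x, int src_y,
                  int dest_x, int dest_y,
                  int width, int height)
{
    size_t row_bytes;
    int y;

    if (width <= 0 || height <= 0)
        return;

    row_bytes = (size_t) width * sizeof (uint32_t);

    if (dest_y > src_y)
    {
        /* Moving down: go bottom-up so that source rows are read
         * before the copy overwrites them. */
        for (y = height - 1; y >= 0; y--)
        {
            memmove (bits + (dest_y + y) * stride + dest_x,
                     bits + (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
    else
    {
        /* Moving up or staying on the same rows: go top-down.
         * memmove takes care of overlap within a single row. */
        for (y = 0; y < height; y++)
        {
            memmove (bits + (dest_y + y) * stride + dest_x,
                     bits + (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
}
```

Scrolling a 4x4 buffer down by one row would then be
sketch_copy_area (bits, 4, 0, 0, 0, 1, 4, 3).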

Soren