[cairo] Pixman refactoring, ARM and Altivec implementations needed

Sat May 30 08:54:28 PDT 2009

On Friday 29 May 2009 13:18:36 ext Jonathan Morton wrote:
> > And having something like cairo's 'make perf' with utils to diff the
> > results would be nice as well, I have enough beagleboards with compilers
> > installed to do nightly builds and performance tests. If the suite can
> > run pretty much unattended I could dedicate one beagleboard to it and
> > put the results online.
>
> We're using our own mx11mark tool to evaluate performance, but this
> relies on running an X11 server that itself depends on pixman.  For
> project-internal tests it would make more sense to call pixman handlers
> directly.
>
> Observation: blitters can perform very differently on framebuffer
> devices (which are usually uncached or at best write-through cached) and
> pixmaps (which are in cached main memory). 

The framebuffer typically has (or should have) the write-combining buffer
enabled. Older ARM devices did not have a write-allocate cache, so even
massive writes to normal memory typically bypassed the cache. Unless one
wants to read from the framebuffer, performance is quite good even with only
the write-combining buffer and no cache.

For ARM9E and ARM11 it was very important to arrange writes to memory so that
each 4-register store multiple instruction wrote exactly one 16-byte aligned
memory area. That ensured the use of burst writes, and memory write
throughput increased significantly (roughly doubled).
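
To illustrate the alignment rule above, here is a minimal sketch in portable
C. The function name fill32_aligned16 is made up for this example and is not
part of pixman. It only shows the head/body/tail split that lets the body
stores land on 16-byte aligned blocks; whether the compiler actually emits a
4-register store multiple for the body is not guaranteed, and real code of
that era would use hand-written assembly for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a pixman function): fill `count` 32-bit pixels
 * starting at `dst` with `value`, keeping the bulk of the stores on
 * 16-byte aligned blocks. */
void fill32_aligned16(uint32_t *dst, uint32_t value, size_t count)
{
    /* Pixels are assumed to be at least word aligned. */
    assert(((uintptr_t)dst & 3) == 0);

    /* Head: single-word stores until dst reaches a 16-byte boundary. */
    while (count > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst++ = value;
        count--;
    }

    /* Body: four words per iteration, each group covering exactly one
     * aligned 16-byte block.  This is the part that would map to a
     * 4-register store multiple in hand-written ARM assembly. */
    while (count >= 4) {
        dst[0] = value;
        dst[1] = value;
        dst[2] = value;
        dst[3] = value;
        dst += 4;
        count -= 4;
    }

    /* Tail: fewer than four words remain. */
    while (count > 0) {
        *dst++ = value;
        count--;
    }
}
```

Calling fill32_aligned16(row + 1, 0xff000000, 13) on a 16-byte aligned row
would write three head words, two aligned 4-word blocks and two tail words.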

I'm not quite sure yet how to get the best memory performance on Cortex-A8.
Something seems to be a bit fishy with it (is its L2 write-allocate
behavior spoiling the fun?).

> Any test suite that includes 
> performance evaluations should attempt to check both cases.

Agreed.

> Note that Xorg uses a shadowed framebuffer by default, which attempts to
> move the performance into cached memory, but has some nasty difficulties
> when the damage region becomes complex (this is something that could be
> fixed by preventing it from becoming complex at the expense of strict
> blit efficiency), and also consumes too much RAM to be worthwhile for
> most ARM devices (which would be harder to fix).

Pixman performance tests can use the framebuffer directly for allocating some
of the images. That will eliminate the need for an X11 server when running
basic tests. Of course the screen contents will get trashed, but that is not
always critical.
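
A rough sketch of how a test could get at framebuffer memory, assuming a
Linux fbdev device such as /dev/fb0. The struct fb_mapping and the function
fb_map are names invented for this example; the ioctls and structures come
from <linux/fb.h>. The returned pointer and stride could then be handed to
pixman_image_create_bits() to wrap the framebuffer in a pixman image, with a
format chosen to match the reported bits per pixel.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical description of a mapped framebuffer (not a pixman type). */
struct fb_mapping {
    uint32_t *bits;    /* start of the mapped framebuffer memory */
    size_t    length;  /* size of the mapping in bytes */
    int       width;   /* visible width in pixels */
    int       height;  /* visible height in pixels */
    int       stride;  /* bytes per scanline */
    int       bpp;     /* bits per pixel */
};

/* Map the fbdev device at `path` (for example "/dev/fb0").
 * Returns 0 on success and -1 on any failure.  Panning offsets
 * (xoffset/yoffset) are ignored in this sketch. */
int fb_map(const char *path, struct fb_mapping *out)
{
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;
    void *mem;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0) {
        close(fd);
        return -1;
    }

    mem = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (mem == MAP_FAILED)
        return -1;

    out->bits   = mem;
    out->length = fix.smem_len;
    out->width  = (int)var.xres;
    out->height = (int)var.yres;
    out->stride = (int)fix.line_length;
    out->bpp    = (int)var.bits_per_pixel;
    return 0;
}
```

A benchmark could then run the same operation twice, once with the
destination in this mapping and once in malloc'ed memory, to cover both the
framebuffer and the cached pixmap case.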

To begin with, even just running cairo's 'make perf' regularly would help a
lot.

More interesting tests would include some typical pixman usage patterns.
Adding some hooks to pixman to collect these statistics is not so hard.
Focusing on real use cases will improve overall system performance.
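
As a sketch of what such a hook might look like (this is not existing pixman
code; stats_record, stats_dump and the table layout are invented for the
example), a small table keyed by operator and source/destination format
could count calls and pixels. The hook would be called from the composite
entry point, and the op and format values would be whatever codes pixman
passes there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATS_SLOTS 64  /* arbitrary limit for this sketch */

/* One row per distinct (op, source format, destination format). */
struct composite_stat {
    int                op;
    uint32_t           src_format;
    uint32_t           dst_format;
    unsigned long      calls;
    unsigned long long pixels;
};

static struct composite_stat stats_table[STATS_SLOTS];
static size_t stats_used;

/* Record one composite call.  Returns the updated row, or NULL if the
 * table is full and the combination has not been seen before. */
struct composite_stat *stats_record(int op, uint32_t src_format,
                                    uint32_t dst_format,
                                    int width, int height)
{
    struct composite_stat *s = NULL;
    size_t i;

    assert(width >= 0 && height >= 0);

    /* Linear search is fine for a handful of combinations. */
    for (i = 0; i < stats_used; i++) {
        if (stats_table[i].op == op &&
            stats_table[i].src_format == src_format &&
            stats_table[i].dst_format == dst_format) {
            s = &stats_table[i];
            break;
        }
    }

    if (s == NULL) {
        if (stats_used == STATS_SLOTS)
            return NULL;
        s = &stats_table[stats_used++];
        s->op = op;
        s->src_format = src_format;
        s->dst_format = dst_format;
        s->calls = 0;
        s->pixels = 0;
    }

    s->calls++;
    s->pixels += (unsigned long long)width * (unsigned long long)height;
    return s;
}

/* Print one line per recorded combination. */
void stats_dump(FILE *out)
{
    size_t i;

    for (i = 0; i < stats_used; i++)
        fprintf(out, "op=%d src=0x%08x dst=0x%08x calls=%lu pixels=%llu\n",
                stats_table[i].op,
                (unsigned)stats_table[i].src_format,
                (unsigned)stats_table[i].dst_format,
                stats_table[i].calls,
                stats_table[i].pixels);
}
```

Dumping the table at exit after running a real application shows which
combinations are hit most often and at what average size (pixels divided by
calls), which is a reasonable hint about which fast paths are worth writing.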

As a random example, the XFCE4 terminal (using my favorite 'terminus' font)
is extremely slow, and the culprit is pixman. Pixman is simply missing an
optimized fast path for the ADD 1000x1000 operation. There are lots of
similar examples.

Some operations must be specifically optimized for small image sizes (those
which typically work with font glyphs). The number of function calls in the
chain when selecting a fast path and the level of unrolling in the
implementation itself make a big difference in such cases too.

-- 
Best regards,
Siarhei Siamashka