[cairo] pixman: New ARM NEON optimizations

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Jan 14 16:56:40 PST 2010

On Monday 07 December 2009, Koen Kooi wrote:
> On 07-12-09 11:06, Siarhei Siamashka wrote:
> > On Wednesday 02 December 2009, Soeren Sandmann wrote:
> >> Siarhei Siamashka<siarhei.siamashka at gmail.com>  writes:
> >>> As you noticed earlier, the software RENDER extension implementation
> >>> in xserver suffers from creating and destroying temporary
> >>> pixman_image_t structures for each operation in the fbComposite
> >>> function (PicturePtr and pixman_image_t are practically duplicates of
> >>> each other). But this is not a good excuse for pixman to waste CPU
> >>> cycles too. If anything can be simplified and optimized even a bit
> >>> with relatively little effort, it probably should be. Or is it better
> >>> to fix xserver first and then look at pixman performance again?
> >>
> >> As long as the X server is creating and destroying images all the
> >> time, I don't think it makes a lot of sense to optimize pixman for
> >> tiny images.
> >
> > The X server is an important pixman user, but it is not the only one.
> > Cairo with the image backend is one example.
> >
> > Removing delegates just:
> > 1. makes code smaller
> > 2. makes it a bit faster
> >
> > Here is a branch for delegate removal (only for pixman_blt so far):
> > http://cgit.freedesktop.org/~siamashka/pixman/log/?h=no-delegates
> > This can also easily be done for pixman_fill and the combiners.
> >
> > This issue is definitely taking much more time than it deserves (it's
> > not critical, just a kind of low-hanging fruit). If it's a no-go and
> > delegates are going to stay, then I'm done with it and will stop
> > spamming here.
> I'm still not convinced that runtime detection of CPU features gains us
> a lot on ARM. I would settle for a compile-time option that collapses
> the delegate tree if turned on.

Collapsing the delegate chain is possible with or without dropping runtime
detection of CPU features; that's what the 'no-delegates' branch is doing.
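To make the distinction concrete, here is a minimal C sketch of a delegate chain whose top is still chosen by runtime CPU detection, walked either on every call or collapsed once at initialization. Everything in it is hypothetical: the `impl_t` struct, the `*_sketch` functions and the `cpu_has_neon` flag (a stand-in for real feature detection) are illustrations, not pixman's actual data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blit entry point; returns an id of the code path that ran. */
typedef int (*blt_fn)(int width, int height);

/* Hypothetical implementation level; blt == NULL means "just forward". */
typedef struct impl {
    blt_fn blt;
    const struct impl *delegate;
} impl_t;

enum { PATH_GENERAL = 1, PATH_NEON = 2 };

static int general_blt_sketch(int width, int height)
{
    (void)width; (void)height;   /* actual pixel copying omitted */
    return PATH_GENERAL;
}

static int neon_blt_sketch(int width, int height)
{
    (void)width; (void)height;   /* actual NEON copying omitted */
    return PATH_NEON;
}

/* Hypothetical chain: neon -> simd (forwards only) -> general. */
static const impl_t general_impl = { general_blt_sketch, NULL };
static const impl_t simd_impl    = { NULL, &general_impl };
static const impl_t neon_impl    = { neon_blt_sketch, &simd_impl };

/* Runtime detection still decides which level is the top of the chain. */
static const impl_t *choose_top(bool cpu_has_neon)
{
    return cpu_has_neon ? &neon_impl : &simd_impl;
}

/* Delegate walking on every call: forwarding-only levels add extra hops. */
static int blt_via_chain(const impl_t *imp, int width, int height)
{
    while (imp->blt == NULL)     /* the general level always provides blt */
        imp = imp->delegate;
    return imp->blt(width, height);
}

/* Collapsed: walk the chain once and keep only the final function pointer. */
static blt_fn collapse_blt(const impl_t *imp)
{
    while (imp != NULL && imp->blt == NULL)
        imp = imp->delegate;
    assert(imp != NULL);         /* chain must end in a level with blt */
    return imp->blt;
}
```

With runtime detection kept, `collapse_blt(choose_top(have_neon))` would be called once at startup and the result cached, so forwarding-only levels (like `simd_impl` here) cost nothing per call.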

What remains is a minor overhead from checking the 'imp' pointer on entry
to pixman_blt (and to pixman_fill and pixman_image_composite too), and the
'no-delegates' branch tries to take care of that. There is still one extra
hop, though: the required function ('pixman_blt_neon' or 'general_blt' on
ARM at the moment) is called via a pointer. This could be handled with
some ifdefs in the code, if eliminating one indirect call is worth the
extra cruft.
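A rough sketch of the ifdef approach, assuming a hypothetical `SKETCH_NEON_ONLY_BUILD` macro for builds that target NEON-capable CPUs only: such builds call the NEON routine directly, while the default build keeps one indirect call through a pointer chosen at init. The function and macro names are made up for illustration and are not pixman's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical code-path ids so the dispatch choice is observable. */
enum { PATH_GENERAL = 1, PATH_NEON = 2 };

static int general_blt_sketch(int width, int height)
{
    (void)width; (void)height;   /* actual pixel copying omitted */
    return PATH_GENERAL;
}

static int neon_blt_sketch(int width, int height)
{
    (void)width; (void)height;   /* actual NEON copying omitted */
    return PATH_NEON;
}

#ifdef SKETCH_NEON_ONLY_BUILD
/* Hypothetical NEON-only build: direct call, no pointer hop at all. */
static int do_blt(int width, int height)
{
    return neon_blt_sketch(width, height);
}
#else
/* Default build: runtime detection picks the target once at init. */
static int (*blt_impl)(int width, int height) = NULL;

static void init_blt(bool cpu_has_neon)
{
    blt_impl = cpu_has_neon ? neon_blt_sketch : general_blt_sketch;
}

/* One indirect call per operation through the selected pointer. */
static int do_blt(int width, int height)
{
    assert(blt_impl != NULL);    /* init_blt() must be called first */
    return blt_impl(width, height);
}
#endif
```

The price of the `#ifdef` is the extra cruft mentioned above: two build variants to maintain and test in exchange for saving one indirect call per operation.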

> People wanting one-binary-to-rule-them-all can turn that option off and have
> runtime delegates.

People who don't want such a build can also use the --disable-arm-simd
and --disable-arm-neon options and reduce the library size :)

Now it's mostly just a matter of selecting which configure options are
used by default. Surely the defaults can be changed so that people who
want one-binary-to-rule-them-all would have to explicitly use the
--enable-arm-simd and --enable-arm-neon options. But for this to happen,
we need a patch introducing such a change and a reasonable explanation of
why ARM should differ from the way this is done for x86 and ppc.

Best regards,
Siarhei Siamashka
