[cairo] pixman: New ARM NEON optimizations

Siarhei Siamashka siarhei.siamashka at gmail.com
Wed Nov 4 08:36:21 PST 2009

On Tuesday 27 October 2009, Soeren Sandmann wrote:
> > I would probably even go as far as removing old NEON optimizations
> > completely. They are available in 0.16.x versions of pixman and can be
> > taken back into action if needed. Feedback from the users of Windows
> > Mobile, Symbian and maybe some other systems running on ARM would be
> > welcome.
> I agree. It makes sense to remove them at least in the 0.17 series,
> and if people complain, we can pull them back in.


> > Yes, and even 24bpp can be supported, though it will add a lot more
> > of conditionally compiled parts and clutter the code a bit.
> >
> > 24bpp may be quite useful for accelerating some of the GDK stuff. I
> > am actually thinking about also reusing the NEON graphics optimizations
> > in other libraries like GDK, SDL and maybe something else in order to
> > improve the overall performance of software on Linux in general.
> At some point it makes sense to make the pixman API available for
> general consumption so that people can use it in cases like
> those. This may require an API break first though, since parts of the
> API are currently somewhat embarrassing (pixman_set_static_pointers(),
> and pixman_blt()/fill() come to mind, but the 16 bit regions should
> also eventually go away).

It might not need an API break, just an extension. A function which takes
information about the source, mask and destination image formats and some
other hints (like a solid mask or source) and just returns a pointer to
a simple, directly callable function would do this job. It can return
NULL if an optimized implementation is not available, or even generate
the function code using a JIT if it is able to do this.

In any case, simple functions which take only basic data types as arguments
(nothing like pixman_image_t) and do no checks for clipping or anything
else can be used efficiently from other libraries.
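
As a rough sketch of what I mean (all names here, such as lookup_composite,
op_t, fmt_t and the HINT_* flags, are made up for illustration and are not
existing pixman API; the only "fast path" is a plain C copy standing in for
an optimized routine):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operator, format and hint codes (not pixman's own). */
typedef enum { OP_SRC, OP_OVER } op_t;
typedef enum { FMT_NONE, FMT_A8, FMT_R5G6B5, FMT_A8R8G8B8 } fmt_t;
enum { HINT_SOLID_SRC = 1u << 0, HINT_SOLID_MASK = 1u << 1 };

/* Worker signature: basic types only, strides in bytes, no clipping and
 * no validation. The caller must pass a rectangle that is fully valid. */
typedef void (*composite_fn)(int width, int height,
                             void *dst, int dst_stride,
                             const void *src, int src_stride,
                             const void *mask, int mask_stride);

/* Stand-in for an optimized routine: SRC, 32bpp to 32bpp, no mask. */
static void src_8888_8888(int width, int height,
                          void *dst, int dst_stride,
                          const void *src, int src_stride,
                          const void *mask, int mask_stride)
{
    (void)mask;
    (void)mask_stride;
    for (int y = 0; y < height; y++) {
        memcpy((uint8_t *)dst + (size_t)y * (size_t)dst_stride,
               (const uint8_t *)src + (size_t)y * (size_t)src_stride,
               (size_t)width * 4);
    }
}

typedef struct {
    op_t op;
    fmt_t src, mask, dst;
    unsigned hints;
    composite_fn fn;
} fast_path_t;

static const fast_path_t fast_paths[] = {
    { OP_SRC, FMT_A8R8G8B8, FMT_NONE, FMT_A8R8G8B8, 0, src_8888_8888 },
};

/* Returns a directly callable function, or NULL if nothing optimized
 * exists for this combination (a JIT could be plugged in here instead). */
composite_fn lookup_composite(op_t op, fmt_t src, fmt_t mask, fmt_t dst,
                              unsigned hints)
{
    for (size_t i = 0; i < sizeof fast_paths / sizeof fast_paths[0]; i++) {
        const fast_path_t *p = &fast_paths[i];
        if (p->op == op && p->src == src && p->mask == mask &&
            p->dst == dst && p->hints == hints)
            return p->fn;
    }
    return NULL;
}
```

A client would do the lookup once, keep the pointer, and call it for every
rectangle it already knows to be valid, falling back to its own generic
code when the lookup returns NULL.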

> Some of those cases should probably just use cairo.

Yes, but not in all cases. If some library does not support image clipping and
expects the client application to provide valid data, doing this validation
just wastes CPU cycles.

> > > If possible, I think it would be useful to break down the
> > > composite_composite_function macro into smaller bits, that mirror the
> > > structure of the generated code, and then add a comment to the top of
> > > each sub-routine. For example, a sub-macro for the initialization of
> > > the registers plus setup, one for the left unaligned part of the
> > > scanline, one for the middle part, one for the right unaligned part,
> > > one for moving to the next line, and one for doing the small rectangle
> > > part.
> > >
> > > I think this would make it easier to grasp what is going on.
> >
> > I was considering adding more comments there, but was not sure whether it
> > makes much sense before all the features are implemented (like 24bpp
> > support).
> Breaking it down into submacros that mirror the structure of the
> generated code would be useful before pushing this, I think.

Done. At least I split out some parts of the code which are more or less
logically independent.
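
For readers who do not want to dig through the assembly macros, the
structure discussed above (leading unaligned part, aligned middle part,
trailing part, advance to the next line) can be illustrated in plain C.
This is only a hypothetical sketch of a 32bpp solid fill, not the actual
NEON code; the function names and the 16-byte alignment target are
assumptions, and the separate small-rectangle path is left out:

```c
#include <stddef.h>
#include <stdint.h>

/* Fill one scanline of 32bpp pixels with a solid color. */
void fill_scanline_8888(uint32_t *dst, int width, uint32_t color)
{
    /* Leading part: single pixels until dst is 16-byte aligned. */
    while (width > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst++ = color;
        width--;
    }

    /* Middle part: aligned blocks of 4 pixels (16 bytes) per iteration.
     * This is where a vectorized implementation would do its stores. */
    while (width >= 4) {
        dst[0] = color;
        dst[1] = color;
        dst[2] = color;
        dst[3] = color;
        dst += 4;
        width -= 4;
    }

    /* Trailing part: the 0..3 pixels left after the last full block. */
    while (width > 0) {
        *dst++ = color;
        width--;
    }
}

/* Process a rectangle line by line; stride is given in pixels here. */
void fill_rect_8888(uint32_t *dst, int stride_pixels,
                    int width, int height, uint32_t color)
{
    for (int y = 0; y < height; y++) {
        fill_scanline_8888(dst, width, color);
        dst += stride_pixels; /* move to the next line */
    }
}
```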

> > > * If prefetch_distance is 0, shouldn't this macro not generate
> > >   anything at all?
> >
> > Well, this behavior is undefined at the moment :)
> But in pixman_composite_src_n_8_asm_neon(), you do set it to 0?

This is also addressed now.

Here is an update for pixman NEON optimizations based on the comments:

Changes also include:
* support for 24bpp pixel formats
* choice between 'none', 'simple' and 'advanced' prefetch methods
* old NEON code dropped, which also eliminates the float ABI selection
controversy
* more comments

Also, the automated tests pass and the updated pixman works fine on a real
device without visible problems.

Things to do next:
* fine-tune performance for some of the functions
* add GDK pixbuf related fast path functions
* try to reduce dispatch overhead in pixman-arm-neon.c (maybe adding some
code, probably autogenerated, with an if/else/switch/case mess in
arm_neon_composite to decide which NEON fast path function can be used,
instead of calling _pixman_run_fast_path, can reduce the overhead a bit).
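
To give an idea of the last item, here is a hypothetical sketch of such a
switch-based dispatch. The operator and format codes, the KEY macro and
the path_id values are all invented for this example and do not match
pixman's real constants; the point is only that the operator and the
formats are packed into one integer and the compiler turns the switch
into a jump table or a comparison tree instead of a linear table scan:

```c
#include <stdint.h>

/* Hypothetical small codes for operators and pixel formats. */
enum { OPC_SRC = 1, OPC_OVER = 2 };
enum { PF_NONE = 0, PF_A8 = 1, PF_0565 = 2, PF_8888 = 3 };

/* Pack operator, source, mask and destination into one 32-bit key. */
#define KEY(op, s, m, d)                                     \
    (((uint32_t)(op) << 24) | ((uint32_t)(s) << 16) |        \
     ((uint32_t)(m) << 8) | (uint32_t)(d))

typedef enum {
    PATH_NONE,            /* no fast path, use the generic code */
    PATH_SRC_8888_8888,
    PATH_SRC_0565_0565,
    PATH_OVER_8888_0565
} path_id;

/* Decide which fast path applies with a single switch. */
path_id select_fast_path(int op, int src, int mask, int dst)
{
    switch (KEY(op, src, mask, dst)) {
    case KEY(OPC_SRC, PF_8888, PF_NONE, PF_8888):
        return PATH_SRC_8888_8888;
    case KEY(OPC_SRC, PF_0565, PF_NONE, PF_0565):
        return PATH_SRC_0565_0565;
    case KEY(OPC_OVER, PF_8888, PF_NONE, PF_0565):
        return PATH_OVER_8888_0565;
    default:
        return PATH_NONE;
    }
}
```

In real code each case would call the matching NEON function directly,
and the list of cases could be generated from the fast path table.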

Best regards,
Siarhei Siamashka