[cairo] pixman: New ARM NEON optimizations

Soeren Sandmann sandmann at daimi.au.dk
Tue Oct 27 04:57:30 PDT 2009


Hi,

> > I think it's perfectly fine even if it is Linux specific at first;
> > people interested in other operating systems can feel free to send
> > patches.
> 
> I was more worried about what to do with the old NEON code. How do we even
> know whether it is even used (or will be used) by anyone on any non-linux
> systems?
> 
> I would probably even go as far as removing old NEON optimizations completely.
> They are available in 0.16.x versions of pixman and can be taken back into
> action if needed. Feedback from the users of Windows Mobile, Symbian and
> maybe some other systems running on ARM would be welcome.

I agree. It makes sense to remove them at least in the 0.17 series,
and if people complain, we can pull them back in.

> Yes, and even 24bpp can be supported, though it will add a lot more
> of conditionally compiled parts and clutter the code a bit.
> 
> 24bpp may be quite useful for accelerating some of the GDK stuff. I actually
> think about the idea of also reusing NEON graphics optimizations in different
> libraries like GDK, SDL and maybe something else in order to improve
> overall performance of the software in linux in general.

At some point it makes sense to make the pixman API available for
general consumption so that people can use it in cases like
those. This may require an API break first though, since parts of the
API are currently somewhat embarrassing (pixman_set_static_pointers(),
and pixman_blt()/fill() come to mind, but the 16 bit regions should
also eventually go away).

Some of those cases should probably just use cairo.

> > If possible, I think it would be useful to break down the
> > composite_composite_function macro into smaller bits, that mirror the
> > structure of the generated code, and then add a comment to the top of
> > each sub-routine. For example, a sub-macro for the initialization of
> > the registers plus setup, one for the left unligned part of the
> > scanline, one for the middle part, one for the right unaligned part,
> > one for moving to the next line, and one for doing the small rectangle
> > part.
> >
> > I think this would make it easier to grasp what is going on.
> 
> I was considering to add more comments there, but was not sure whether it
> makes much sense before all the features are implemented (like 24bpp
> support).

Breaking it down into submacros that mirror the structure of the
generated code would be useful before pushing this, I think.

> > Also, in various places, more specific comments would be useful, in
> > addition to the (generally good) highlevel comments that are already
> > there.
> >
> > For example,
> >
> > * I don't fully understand what the abits argument to the load/store
> >   functions is supposed to be doing. Does it have to do with masking
> >   in 0xff as appropriate? Part of this may be that I don't know what
> >
> >      [ &mem_operand&, :&abits& ]!
> 
> That's an alignment specifier. Load/store instructions with strictly
> specified alignment (128-bit or more) are a bit faster. And surely,
> memory address needs to be properly aligned, otherwise we get an
> exception.

Ok, that sense. I had gotten it into my head that "a" stood for alpha,
so I couldn't make heads or tail of it.

> > * If prefetch_distance is 0, shouldn't this macro not generate
> >   anything at all?
> 
> Well, this behavior is undefined at the moment :)

But in pixman_composite_src_n_8_asm_neon(), you do set it to 0?



Soren


More information about the cairo mailing list