[cairo] [PATCH/RFC][pixman] More ARM NEON performance updates
siarhei.siamashka at gmail.com
Tue Feb 23 04:01:20 PST 2010
On Friday 19 February 2010, Soeren Sandmann wrote:
> > > This all seems way too complex to me and implies that there would be an
> > > extra overhead introduced on every image creation.
> > >
> > > The branch 'fetch-r5g6b5-arm-neon' has a much simpler solution, and the
> > > extra overhead happens just once at setup time. It is not like CPU
> > > features are going to change at runtime.
> > The main problem I have with it is that it causes ARM stuff to 'leak'
> > out outside of the ARM implementations. It introduces an undocumented
> > inter-dependency between implementations: they now have to be created
> > in a specific order, or they will overwrite each other's fetchers.
> Thinking some more about this, here is another proposal, which is less
> Add some new functions to the implementation struct. Something like
> fetch_scanline_t (* get_scanline_fetcher_32) (...)
> fetch_scanline_t (* get_scanline_fetcher_64) (...)
> store_scanline_t (* get_scanline_storer_32) (...)
> store_scanline_t (* get_scanline_storer_64) (...)
> By default these just delegate, and the general implementation will
> return the fetchers that are now being set up in the various
> property_changed() functions. The property_changed() functions are
> extended to also take an implementation argument, so that they can
> call the new functions to get the fetchers and storers.
> This allows CPU specific fetchers for both sampled images and
> gradients, while not being excessively complex.
> The downside is that some reorganisation of the code is required.
> The existing fetchers would now conceptually be part of the general
> implementation instead of being part of the images. Some fetchers
> would probably belong in the fast path implementation.
> This means that the image files (pixman-linear-gradient.c etc.) would
> have to be renamed to pixman-general-linear-gradient.c and their
> constructors moved to pixman-image.c.
> There is probably more reorganisation required, but hopefully it would
> be mostly a matter of shuffling code around.
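
As a rough illustration of the proposal quoted above, here is a minimal sketch of how a delegating get_scanline_fetcher_32() could be wired up. Every type and function name in it (format_t, image_t, cpu_fetch_r5g6b5, image_property_changed and so on) is a simplified, hypothetical stand-in and not pixman's actual API, and the "CPU specific" fetcher is plain C so that the sketch builds on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for pixman's types; not the real API. */
typedef enum { FORMAT_A8R8G8B8, FORMAT_R5G6B5 } format_t;
typedef void (*fetch_scanline_t) (const void *src, uint32_t *dst, int width);

typedef struct implementation implementation_t;
struct implementation
{
    implementation_t *delegate; /* next implementation in the chain, or NULL */
    fetch_scanline_t (*get_scanline_fetcher_32) (implementation_t *imp,
                                                 format_t          format);
};

typedef struct
{
    format_t         format;
    fetch_scanline_t fetch_scanline_32;
} image_t;

/* Expand one r5g6b5 pixel to a8r8g8b8 by replicating the high bits. */
static uint32_t
expand_r5g6b5 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

static void
general_fetch_a8r8g8b8 (const void *src, uint32_t *dst, int width)
{
    const uint32_t *s = src;
    for (int i = 0; i < width; i++)
        dst[i] = s[i];
}

static void
general_fetch_r5g6b5 (const void *src, uint32_t *dst, int width)
{
    const uint16_t *s = src;
    for (int i = 0; i < width; i++)
        dst[i] = expand_r5g6b5 (s[i]);
}

/* Placeholder for a CPU specific (e.g. NEON) fetcher. A real one would use
 * SIMD code; this one only forwards to the general fetcher. */
static void
cpu_fetch_r5g6b5 (const void *src, uint32_t *dst, int width)
{
    general_fetch_r5g6b5 (src, dst, width);
}

/* The general implementation ends the chain and knows every format. */
fetch_scanline_t
general_get_scanline_fetcher_32 (implementation_t *imp, format_t format)
{
    (void) imp;
    return format == FORMAT_R5G6B5 ? general_fetch_r5g6b5
                                   : general_fetch_a8r8g8b8;
}

/* Default behaviour: pass the request on to the next implementation. */
fetch_scanline_t
delegate_get_scanline_fetcher_32 (implementation_t *imp, format_t format)
{
    assert (imp->delegate != NULL);
    return imp->delegate->get_scanline_fetcher_32 (imp->delegate, format);
}

/* A CPU specific implementation overrides only the formats it accelerates. */
fetch_scanline_t
cpu_get_scanline_fetcher_32 (implementation_t *imp, format_t format)
{
    if (format == FORMAT_R5G6B5)
        return cpu_fetch_r5g6b5;
    return delegate_get_scanline_fetcher_32 (imp, format);
}

/* property_changed() takes the implementation and asks it for the fetcher. */
void
image_property_changed (image_t *image, implementation_t *imp)
{
    image->fetch_scanline_32 = imp->get_scanline_fetcher_32 (imp, image->format);
}
```

With a chain of cpu -> general, an r5g6b5 image would pick up cpu_fetch_r5g6b5 and every other format would fall through to the general fetchers. The lookup walks the chain on demand, so no implementation overwrites another's fetchers and the order in which they are created stops mattering.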
I'm afraid I can't provide much constructive feedback at the moment. No
matter which way of plugging CPU-specific fetcher optimizations into the
pixman code is chosen, the practical performance difference between the
approaches is going to be very small.
But CPU-specific optimizations for the fetchers are definitely needed.