[cairo] pixman: New ARM NEON optimizations

Thu Jan 14 17:27:52 PST 2010

On Thursday 10 December 2009, Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
[...]
> > Here is a branch for delegates removal (for pixman_blt so far)
> > http://cgit.freedesktop.org/~siamashka/pixman/log/?h=no-delegates
> > This can be also easily done for pixman_fill and combiners.
[...]
> I think I should probably have called them 'fallbacks' instead of
> delegates. 'Delegates' sounds like some over-abstracted Enterprise
> Design Pattern Disaster, which I don't think is the case here.

The thing I like least in the current code is the existence of the
'delegate_blt' function; it seems redundant to me. Any implementation
can just copy the function pointer from its fallback implementation at
setup time, eliminating an extra hop through 'delegate_blt' at runtime.
This is how it is done for 'pixman-vmx.c' in the 'no-delegates' branch,
for example.
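
To make the difference concrete, here is a minimal sketch of the two setup
styles. The types and names ('impl_t', 'blt_func_t', 'setup_with_delegate',
'setup_without_delegate') are simplified and hypothetical. They are not
pixman's real definitions, and the real blt takes many more arguments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types for illustration; not pixman's real ones. */
typedef struct impl impl_t;
typedef int (*blt_func_t) (impl_t *imp, const uint8_t *src,
                           uint8_t *dst, size_t n);

struct impl
{
    impl_t     *fallback;   /* more generic implementation, or NULL */
    blt_func_t  blt;
};

/* Generic C blt, reduced here to a plain byte copy. */
static int
generic_blt (impl_t *imp, const uint8_t *src, uint8_t *dst, size_t n)
{
    (void) imp;
    memcpy (dst, src, n);
    return 1;
}

/* Delegate style: a function whose only job is to forward the call. */
static int
delegate_blt (impl_t *imp, const uint8_t *src, uint8_t *dst, size_t n)
{
    return imp->fallback->blt (imp->fallback, src, dst, n);
}

/* Every blt call goes through delegate_blt first (one extra hop). */
static void
setup_with_delegate (impl_t *imp, impl_t *fallback)
{
    imp->fallback = fallback;
    imp->blt = delegate_blt;
}

/* The fallback's pointer is copied once at setup time (no extra hop). */
static void
setup_without_delegate (impl_t *imp, impl_t *fallback)
{
    imp->fallback = fallback;
    imp->blt = fallback->blt;
}
```

Both variants end up running 'generic_blt'; the second one just gets there
with a single indirect call. Note that with the copied pointer the callee
receives the caller's 'imp' rather than the fallback's, which only matters
if the fallback function actually uses that argument.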

> The whole point of the implementation/delegate mechanism is to allow
> falling back from more specific implementations to more generic
> ones. None of the current blt operations actually make use of this
> right now, but it is easy to imagine that you could write an 8 bit
> generic blt for the fast path implementation that you would then fall
> back to from the architecture specific ones.

A direct call to the generic blt will run faster and require fewer lines of
code. Because of the way CPUs are designed (backwards compatibility is usually
preserved), in the vast majority of cases the path through the various
fallbacks is static and can be constructed at compile time. The only case
where it may need to be handled dynamically is something like the following
fallback chain:

SSE -> 3DNOW -> MMX -> generic C

In this case 3DNOW may need to be omitted from the chain based on runtime CPU
feature detection. But fortunately(?) pixman does not use 3DNOW optimizations
at the moment, and it seems that all modern AMD processors support SSE
nowadays, making 3DNOW somewhat obsolete.
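
For illustration only, a chain like the one above could be assembled at
runtime roughly as follows. The feature bits, the 'impl_t' record and
'build_chain' are hypothetical names made up for this sketch; pixman has no
such function, and real CPU detection is left out entirely.

```c
#include <stddef.h>

/* Hypothetical CPU feature bits; illustration only. */
enum
{
    CPU_MMX   = 1 << 0,
    CPU_3DNOW = 1 << 1,
    CPU_SSE   = 1 << 2
};

/* Hypothetical implementation record; not pixman's real structure. */
typedef struct impl
{
    const char  *name;
    unsigned     required;   /* feature bits needed, 0 for generic C */
    struct impl *fallback;   /* next, more generic implementation */
} impl_t;

/*
 * Candidates are ordered from most specific to most generic.
 * Link together only those whose required features are all present
 * and return the head of the resulting chain (NULL if none qualify).
 */
static impl_t *
build_chain (impl_t *candidates, size_t count, unsigned cpu_features)
{
    impl_t *head = NULL;
    impl_t *tail = NULL;
    size_t i;

    for (i = 0; i < count; i++)
    {
        impl_t *c = &candidates[i];

        if ((c->required & cpu_features) != c->required)
            continue;           /* unsupported on this CPU: skip it */

        c->fallback = NULL;
        if (tail)
            tail->fallback = c;
        else
            head = c;
        tail = c;
    }

    return head;
}
```

On a CPU reporting SSE and MMX but not 3DNOW, the candidates SSE, 3DNOW, MMX
and generic C would be linked as SSE -> MMX -> generic C.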

Ironically, something like this can also theoretically happen for ARM
(regarding IWMMXT vs. NEON).

But a more dynamic way of handling callbacks can be introduced when such a
need arises. IMHO introducing something that may be theoretically useful in
the future is not a very good idea. That future may never actually happen.

> If you have a better way to deal with fallbacks than what we have now,
> I'm listening. But inlining a generic implementation in every
> architecture specific operation is not the answer, given that this
> really is not a huge performance issue.

The 'no-delegates' branch does not have such inlining. That is another change,
done for other reasons in a different branch :)

-- 
Best regards,
Siarhei Siamashka