[cairo] pixman: New ARM NEON optimizations
k.kooi at student.utwente.nl
Mon Oct 26 10:37:20 PDT 2009
On 26-10-09 17:14, Soeren Sandmann wrote:
>> Technically, there should be no problem catching up with SSE2,
>> especially if instruction scheduling perfection could be skipped at
>> the first stage. Right now only the existing NEON fast path
>> functions are reimplemented plus just a few more.
> Right, instruction scheduling is the main disadvantage to using the
> assembler as opposed to intrinsics. Cortex A9 is out-of-order I
> believe, so it will have different scheduling requirements than the
> A8, which in turn likely has different scheduling requirements than
> earlier in-order CPUs. Though, a reasonable approach might be to
> assume that the A9 will not be very sensitive to scheduling at all
> (due to it being out-of-order), and then simply optimize for the A8.
A9 has a big downside though: the NEON block was cut down to make room
for a full VFP block instead of the VFPlite block in the A8 cores, so
NEON code will run slower than on the A8. Hopefully out-of-order
execution will make up for it, but pure NEON code will take a hit.