[cairo] pixman: New ARM NEON optimizations

Tue Oct 27 07:23:09 PDT 2009

On 27-10-09 15:13, Jacob Bramley wrote:
>>> Right, instruction scheduling is the main disadvantage to using the
>>> assembler as opposed to intrinsics. Cortex A9 is out-of-order I
>>> believe, so it will have different scheduling requirements than the
>>> A8, which in turn likely has different scheduling requirements than
>>> earlier in-order CPUs. Though, a reasonable approach might be to
>>> assume that the A9 will not be very sensitive to scheduling at all
>>> (due to it being out-of-order), and then simply optimize for the A8.
>>
>> A9 has a big downside though: the NEON block was cut down to make room
>> for a full VFP block instead of the VFPlite block in the A8 cores, so
>> using NEON will be slower compared to A8. Hopefully the out-of-order
>> execution will make up for it, but pure NEON code will take a hit.
>
> Where did you hear this? A8 does indeed have VFPlite (as opposed to A9's full VFP
> implementation), but A9 certainly doesn't have a cut-down NEON.

I keep hearing it in various places on the internet, but apparently I'm
wrong. Laurent went on record and said "the A9 NEON unit is the exact
same as the A8 one".

> Something to be aware of, however, is that A9 can't issue a VFP instruction until the previous
> NEON instruction has cleared out of the pipeline (and vice-versa), so interleaving NEON and VFP
> really hurts on A9 but didn't really make a huge difference on A8.

I guess that's where the confusion stems from. Let's hope Google indexes
this thread so the truth will surface :)
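
For anyone landing here from a search: below is a rough sketch, in C, of
what "not interleaving NEON and VFP" could look like at the source level.
It is not pixman code. The function names (add_u8_sat, scale_f32,
process_split) are made up for illustration, and it assumes the compiler
turns the scalar float loop into VFP instructions, which depends on your
compiler and flags, so check the disassembly. The idea is simply to do
the NEON work as one pass and the scalar float work as another, instead
of alternating between them per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: saturating add of two 8-bit buffers.
 * Uses NEON intrinsics on ARM builds, plain C everywhere else. */
static void add_u8_sat(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    size_t i = 0;
    assert(dst && a && b);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 16 pixels per iteration in the NEON unit. */
    for (; i + 16 <= n; i += 16)
        vst1q_u8(dst + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
#endif
    /* Scalar integer tail (or the whole buffer on non-NEON builds). */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}

/* Hypothetical helper: scalar float work. Assumption: on ARMv7 this is
 * compiled to VFP instructions unless the compiler vectorizes it. */
static void scale_f32(float *dst, const float *src, float k, size_t n)
{
    size_t i;
    assert(dst && src);
    for (i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Run each kind of work as its own pass, so NEON and VFP instructions
 * are grouped rather than alternating inside one loop body. */
void process_split(uint8_t *pix_dst, const uint8_t *pix_a,
                   const uint8_t *pix_b, size_t npix,
                   float *f_dst, const float *f_src, float k, size_t nf)
{
    add_u8_sat(pix_dst, pix_a, pix_b, npix); /* NEON pass */
    scale_f32(f_dst, f_src, k, nf);          /* scalar float pass */
}
```

Whether splitting the passes actually helps on a given A9 is something to
measure; the extra pass over memory has its own cost.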

regards,

Koen