[cairo] [Pixman] pixman: New ARM NEON optimizations
jonathan.morton at movial.com
Tue Feb 16 13:36:53 PST 2010
> The biggest surprise here is the pathologically bad performance of 'memset'
> function in 'image' backend tests, especially for 'evolution' benchmark. My
> only guess is that glibc could have probably messed up with the caches somehow
> (maybe by improperly using nontemporal memory writes or something).
A quick check of glibc sources suggests that MOVNTIQ (the non-temporal
64-bit write) can indeed be used under at least some circumstances.
It's not immediately clear *which* circumstances, since there's a lot
of assembler in that file and I'm not used to x86 assembly.
It's all very well to avoid "cache pollution", but with a
general-purpose function like memset() it's not at all clear that
keeping a freshly zeroed buffer out of cache is a good idea. I
actually have an unrelated project where the opposite is desirable:
making sure the freshly written buffer ends up in cache. That matters
all the more with today's enormous L3 caches. Somebody please take a
cluestick to the GNU folks.
The plain-C implementation of memset looks sane though.
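To make the distinction above concrete, here is a minimal sketch in C of the two store strategies: ordinary stores, which leave the zeroed lines in cache, and non-temporal (streaming) stores, which are meant to bypass it. This is not glibc's code. The names zero_cached and zero_streaming are hypothetical, and the streaming path assumes an x86 target with SSE2 and uses the 128-bit _mm_stream_si128 intrinsic rather than the 64-bit instruction mentioned above. On other targets it simply falls back to memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: zero a buffer with ordinary stores, so the
 * written cache lines remain in cache for whoever touches them next. */
static void zero_cached(void *dst, size_t n)
{
    memset(dst, 0, n);
}

/* Hypothetical helper: zero a buffer with non-temporal stores where
 * SSE2 is available (an assumption about the build target). */
static void zero_streaming(void *dst, size_t n)
{
    unsigned char *p = dst;
#if defined(__SSE2__)
    /* Byte stores until p is 16-byte aligned, which
     * _mm_stream_si128 requires. */
    while (n > 0 && ((uintptr_t)p & 15) != 0) {
        *p++ = 0;
        n--;
    }
    const __m128i zero = _mm_setzero_si128();
    while (n >= 16) {
        _mm_stream_si128((__m128i *)p, zero);
        p += 16;
        n -= 16;
    }
    /* Order the weakly-ordered streaming stores before later stores. */
    _mm_sfence();
#endif
    /* Tail bytes, or the whole buffer when SSE2 is not available. */
    memset(p, 0, n);
}
```

Both functions leave the same bytes in memory; they differ only in where the data sits afterwards. If the caller reads or writes the buffer right after clearing it, as a compositing loop typically would, the streaming variant forces those accesses to go back to main memory, which is the kind of slowdown speculated about in the quoted text.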