[cairo] pixman: New ARM NEON optimizations

Siarhei Siamashka siarhei.siamashka at gmail.com
Tue Oct 13 15:13:17 PDT 2009


Hello,

This branch has new ARM NEON optimizations:
http://cgit.freedesktop.org/~siamashka/pixman/log/?h=arm-neon-update

It uses the GNU assembler and its macro preprocessor to generate fast path
functions from a common template, so only a small part of the inner loop
code has to be written for each function, which saves a lot of work.
For example, the core of the x8r8g8b8->r5g6b5 conversion function needs
only this much code:
http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=arm-neon-update&id=b17297cf15122e5b38c082c9fe6f1ff708b7efa4
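
For readers who want to see what that conversion computes, here is a minimal
scalar sketch in C. It is not the NEON code from the commit above; it assumes
simple truncation of each channel (no rounding or dithering), and the function
names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one x8r8g8b8 pixel to r5g6b5 by keeping the
 * top bits of each channel. The unused "x" byte (bits 31..24) is dropped. */
static inline uint16_t convert_8888_to_0565(uint32_t s)
{
    return (uint16_t)(((s >> 8) & 0xF800) |  /* red:   bits 23..19 -> 15..11 */
                      ((s >> 5) & 0x07E0) |  /* green: bits 15..10 -> 10..5  */
                      ((s >> 3) & 0x001F));  /* blue:  bits  7..3  ->  4..0  */
}

/* Hypothetical helper: convert one scanline of `width` pixels. */
void convert_scanline_8888_to_0565(uint16_t *dst, const uint32_t *src,
                                   size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = convert_8888_to_0565(src[i]);
}
```

A NEON version does the same shifts and masks on several pixels per
instruction, which is the part each fast path has to supply.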

The template supports any combination of source, destination and mask
images with 8bpp, 16bpp and 32bpp color formats (24bpp support is a bit more
tricky, but can be done too). The code has comments that should explain how
to implement new fast path functions. If the comments are insufficient or
unclear, I will try to update them.
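
To illustrate the template idea without any assembly, here is a rough analogy
using C preprocessor macros instead of GNU assembler macros. It only handles
a source and a destination (no mask), strides are assumed to be counted in
pixels, and all of the names are hypothetical; the real template in the branch
is structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared template: the scanline loop and stride handling are written once.
 * Each generated function only supplies its per-pixel operation. */
#define GENERATE_FAST_PATH(name, src_t, dst_t, PIXEL_OP)                \
    void name(dst_t *dst, int dst_stride,                               \
              const src_t *src, int src_stride,                         \
              int width, int height)                                    \
    {                                                                   \
        for (int y = 0; y < height; y++) {                              \
            for (int x = 0; x < width; x++)                             \
                dst[x] = PIXEL_OP(src[x]);                              \
            dst += dst_stride;  /* strides in pixels (assumption) */    \
            src += src_stride;                                          \
        }                                                               \
    }

/* Per-function part: plain copy. */
#define OP_COPY(s) (s)

/* Per-function part: x8r8g8b8 -> r5g6b5 by truncation. */
#define OP_8888_TO_0565(s)                                              \
    ((uint16_t)((((s) >> 8) & 0xF800) |                                 \
                (((s) >> 5) & 0x07E0) |                                 \
                (((s) >> 3) & 0x001F)))

/* Two functions generated from the same template. */
GENERATE_FAST_PATH(fast_copy_8888_8888, uint32_t, uint32_t, OP_COPY)
GENERATE_FAST_PATH(fast_convert_8888_0565, uint32_t, uint16_t, OP_8888_TO_0565)
```

Adding another format pair then means writing one more per-pixel macro and one
more GENERATE_FAST_PATH line, which is the kind of saving described above.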

The reasons to use the GNU assembler are:
1. Full control over register allocation (there are not many registers,
considering that up to 3 images are supported along with their strides,
pointers and prefetch bookkeeping). I ran out of registers when using inline
assembly and compiling with a frame pointer.
2. It allows the use of a reasonably capable macro preprocessor, which makes
everything easier. A somewhat more flexible option would be JIT code
generation (this is actually something to consider later).

Technically, there should be no problem catching up with SSE2, especially if
perfect instruction scheduling is not a goal at the first stage. Right
now only the existing NEON fast path functions are reimplemented, plus just a
few more.

The remaining question is how to handle systems other than Linux. There is
a potential problem with ABI compatibility: the functions must fully
comply with the calling conventions, etc. For now I'm only sure that
they are compatible with the Linux EABI. Most likely the other systems will
be fine too, or will be fine with a few tweaks.

Benchmarks show that this new code generally has the same or substantially
better performance than the current pixman NEON fast path functions. The log
from cairo-perf is attached.
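
As a reading aid for the attached log, the sketch below shows how the factor
at the end of each line relates to the times on that line. It assumes the
first number on each side of "->" is the time being compared (old on the left,
new on the right); the function name is made up for illustration.

```c
/* Hypothetical helper: ratio between the slower and the faster time.
 * The log labels it "speedup" when the new time is lower and "slowdown"
 * when the new time is higher. */
double change_factor(double old_time, double new_time)
{
    return (old_time >= new_time) ? old_time / new_time
                                  : new_time / old_time;
}
```

For the first line of the log, 11.14 / 5.55 gives roughly 2.01. Since the
times are printed with two decimals, a factor recomputed from the printed
values can differ slightly from the one in the log, most visibly for very
short tests.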

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
Speedups
========
image-rgba-???  paint-with-alpha_similar_rgba_over-512-0     11.14 (11.23 0.50%) ->   5.55 (5.74 1.46%):  2.01x speedup
image-rgb-???  one-rectangle_image_rgba_over-512-0     10.80 (10.83 0.49%) ->   5.55 (5.62 0.78%):  1.95x speedup
image-rgb-???  paint-with-alpha_image_rgba_over-512-0     11.14 (11.26 2.48%) ->   5.74 (5.92 1.01%):  1.94x speedup
image-rgba-???   paint_image_rgba_over-512-0     10.80 (10.86 0.21%) ->   5.58 (5.68 0.81%):  1.93x speedup
image-rgba-???  paint_similar_rgba_over-512-0     10.74 (10.77 0.51%) ->   5.62 (5.83 1.43%):  1.91x speedup
image-rgb-???  paint_similar_rgba_over-512-0     10.77 (10.83 0.42%) ->   5.71 (5.80 0.71%):  1.89x speedup
image-rgb-???   paint_image_rgba_over-512-0     10.77 (10.83 0.50%) ->   5.74 (5.74 0.61%):  1.88x speedup
image-rgba-???  one-rectangle_image_rgba_over-512-0     10.80 (10.86 0.35%) ->   5.80 (5.89 0.81%):  1.86x speedup
image-rgba-???  paint-with-alpha_image_rgba_over-256-0      2.59 (2.62 1.88%) ->   1.43 (1.47 0.96%):  1.81x speedup
image-rgba-???   paint_image_rgba_over-256-0      2.65 (2.69 2.32%) ->   1.53 (1.59 3.08%):  1.74x speedup
image-rgba-???  paint_similar_rgba_over-256-0      2.35 (2.41 1.79%) ->   1.37 (1.37 0.03%):  1.71x speedup
image-rgba-???  paint-with-alpha_similar_rgba_over-256-0      2.38 (2.44 2.43%) ->   1.40 (1.40 0.04%):  1.70x speedup
image-rgb-???   paint_image_rgba_over-256-0      2.26 (2.35 2.66%) ->   1.37 (1.43 1.49%):  1.65x speedup
image-rgb-???  paint-with-alpha_similar_rgba_over-256-0      2.44 (2.47 2.26%) ->   1.53 (1.56 2.57%):  1.60x speedup
image-rgb-???  paint-with-alpha_image_rgba_over-256-0      2.41 (2.50 2.43%) ->   1.53 (1.56 0.95%):  1.58x speedup
image-rgb-???  paint_similar_rgba_over-256-0      2.26 (2.35 2.03%) ->   1.43 (1.44 0.03%):  1.57x speedup
image-rgba-???  one-rounded-rectangle_solid_rgba_over-512-0     11.44 (11.54 0.40%) ->   8.09 (8.18 0.47%):  1.42x speedup
image-rgba-???  one-rounded-rectangle_solid_rgb_over-512-0     11.41 (11.57 0.56%) ->   8.18 (8.18 0.01%):  1.40x speedup
image-rgb-???  one-rectangle_similar_rgb_source-512-0      3.69 (3.72 1.06%) ->   3.27 (3.45 2.30%):  1.13x speedup
image-rgb-???  paint_similar_rgba_source-256-0      0.89 (0.89 1.40%) ->   0.79 (0.79 4.80%):  1.12x speedup
image-rgba-???  rectangles_similar_rgba_over-512-0     90.09 (90.76 0.70%) ->  81.18 (83.56 1.15%):  1.11x speedup
image-rgb-???  rectangles_similar_rgba_over-512-0     89.91 (91.43 0.69%) ->  81.60 (82.89 0.81%):  1.10x speedup
image-rgb-???  rectangles_similar_rgba_source-512-0     69.76 (70.98 0.99%) ->  64.64 (65.22 0.40%):  1.08x speedup
image-rgb-???  rectangles_image_rgba_source-512-0     69.55 (70.89 0.95%) ->  65.06 (65.55 0.46%):  1.07x speedup
image-rgba-???     fill_solid_rgb_over-256-0      3.66 (3.75 1.64%) ->   3.45 (3.54 1.56%):  1.06x speedup
image-rgba-???    fill_solid_rgba_over-256-0      3.66 (3.69 1.16%) ->   3.48 (3.48 1.12%):  1.05x speedup
Slowdowns
=========
image-rgb-???  paint_solid_rgba_source-256-0      0.28 (0.28 0.00%) ->   0.30 (0.30 4.63%):  1.11x slowdown

