[cairo] [PATCH] SSE2 support for pixman (v2)

Rodrigo Kumpera kumpera at gmail.com
Mon Mar 17 07:00:33 PDT 2008


Hi André,

Did you find out why there are some big performance regressions between
perf-mmx-base-run4 and perf-sse2-run4?

With cairo-perf-diff there are a few cases that are quite serious:

Slowdowns
=========
image-rgba  paint_image_rgba_mag_over-256            8.07 1.84% ->  12.86 0.25%:  1.59x slowdown
image-rgba  paint_solid_rgb_source-512               0.79 5.00% ->   1.04 2.54%:  1.46x slowdown
image-rgb   paint_similar_rgb_source-256             0.08 2.35% ->   0.11 0.64%:  1.35x slowdown
image-rgb   paint_image_rgb_over-256                 0.09 3.44% ->   0.11 2.02%:  1.30x slowdown
image-rgb   paint-with-alpha_solid_rgb_source-256    4.83 0.03% ->   6.30 0.56%:  1.29x slowdown
image-rgb   paint-with-alpha_solid_rgba_source-256   4.84 0.02% ->   6.23 0.71%:  1.28x slowdown
image-rgb   paint_similar_rgb_over-256               0.09 2.14% ->   0.11 1.24%:  1.27x slowdown
image-rgba  paint_solid_rgba_source-512              0.79 5.12% ->   1.04 5.08%:  1.25x slowdown
image-rgba  paint_solid_rgb_over-512                 0.80 4.64% ->   1.02 5.34%:  1.22x slowdown
image-rgba  paint_similar_rgb_over-256               0.21 0.69% ->   0.25 0.32%:  1.17x slowdown


I have a few observations about your patch:

@@ -1627,7 +1641,7 @@ get_fast_path (const FastPathInfo *fast_paths,

     if (!valid_mask)
         continue;
-
+

Introducing whitespace noise is not very desirable.

static FASTCALL void
coreCombineOverUsse2_8888x8888 (uint32_t* dst, const uint32_t* src, int
width, int reverse)

This function uses 8 __m128i variables; with my gcc 4.2.1 it generated some
pretty bad code with stack spills. It's possible to get down to 6 and avoid
going to the stack. Spilling is a really bad thing here, as gcc is dumb
enough not to align stack frames for SSE spills, causing spurious unaligned
faults.
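
To illustrate the point about keeping few vector values live, here is a
minimal sketch of an OVER combiner for premultiplied ARGB32 on a
little-endian x86 target. It is not the code from the patch: the names
over_scalar, mul_inv_alpha and over_sse2_sketch are hypothetical, the
`reverse` parameter is left out, and unaligned loads and stores are used so
that alignment is not an issue in the example. Each unpacked half is
consumed as soon as it is produced, so only a handful of __m128i values are
live at once; whether a given compiler then avoids spills is something to
check in the generated assembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: dst = src + dst * (255 - src_alpha) / 255, per channel. */
static uint32_t over_scalar(uint32_t src, uint32_t dst)
{
    uint32_t ia = 255 - (src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((dst >> shift) & 0xff) * ia + 128;
        t = (t + (t >> 8)) >> 8;          /* rounded division by 255 */
        t += (src >> shift) & 0xff;
        if (t > 255)
            t = 255;                      /* saturate like _mm_adds_epu8 */
        out |= t << shift;
    }
    return out;
}

/* s16/d16 hold two pixels unpacked to 16 bits per channel.
 * Returns d16 * (255 - alpha(s16)) / 255 for each channel. */
static inline __m128i mul_inv_alpha(__m128i s16, __m128i d16)
{
    /* Broadcast each pixel's alpha (16-bit lane 3 and lane 7). */
    s16 = _mm_shufflelo_epi16(s16, _MM_SHUFFLE(3, 3, 3, 3));
    s16 = _mm_shufflehi_epi16(s16, _MM_SHUFFLE(3, 3, 3, 3));
    s16 = _mm_xor_si128(s16, _mm_set1_epi16(0x00ff));   /* 255 - alpha */
    d16 = _mm_mullo_epi16(d16, s16);
    d16 = _mm_add_epi16(d16, _mm_set1_epi16(0x0080));
    d16 = _mm_add_epi16(d16, _mm_srli_epi16(d16, 8));
    return _mm_srli_epi16(d16, 8);
}

static void over_sse2_sketch(uint32_t *dst, const uint32_t *src, int width)
{
    const __m128i zero = _mm_setzero_si128();
    int i = 0;

    assert(width >= 0);

    /* Four pixels per iteration. */
    for (; i + 4 <= width; i += 4) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i lo = mul_inv_alpha(_mm_unpacklo_epi8(s, zero),
                                   _mm_unpacklo_epi8(d, zero));
        __m128i hi = mul_inv_alpha(_mm_unpackhi_epi8(s, zero),
                                   _mm_unpackhi_epi8(d, zero));
        d = _mm_adds_epu8(s, _mm_packus_epi16(lo, hi));
        _mm_storeu_si128((__m128i *)(dst + i), d);
    }

    /* Leftover pixels go through the scalar path. */
    for (; i < width; i++)
        dst[i] = over_scalar(src[i], dst[i]);
}
```

Reusing s16 and d16 as scratch inside the helper is the design choice that
matters here: the alpha mask and the product never need their own named
variables.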

Overall, I found that SSE is not much of a help on a Core 2 CPU, which can
sustain the same memory bandwidth with MMX code. The same cannot be said
for other models such as the P4, which gets a pretty good speedup.


Keep up the good work,
Rodrigo


On Sat, Mar 15, 2008 at 3:07 PM, André Tupinambá <andrelrt at gmail.com> wrote:

> Hi,
>
> This is a new version of SSE2 support for pixman.
>
> I changed save128WriteCombining to save128Aligned and implemented a
> lot of other functions.
>
> The tar file contains some perf results from runs on a P4-3.2GHz machine;
> the results may vary on more powerful CPUs.
>
> Best Regards
>
> André Tupinambá
>
> _______________________________________________
> cairo mailing list
> cairo at cairographics.org
> http://lists.cairographics.org/mailman/listinfo/cairo
>