[cairo] [PATCH] SSE2 support for pixman (v2)
kumpera at gmail.com
Mon Mar 17 07:00:33 PDT 2008
Did you see why there are some big performance regressions between
perf-mmx-base-run4 and perf-sse2-run4?
With cairo-perf-diff there are a few cases that are quite serious:
image-rgba paint_image_rgba_mag_over-256 8.07 1.84% -> 12.86 0.25%:
image-rgba paint_solid_rgb_source-512 0.79 5.00% -> 1.04 2.54%:
image-rgb paint_similar_rgb_source-256 0.08 2.35% -> 0.11 0.64%:
image-rgb paint_image_rgb_over-256 0.09 3.44% -> 0.11 2.02%:
image-rgb paint-with-alpha_solid_rgb_source-256 4.83 0.03% -> 6.30 0.56%: 1.29x slowdown
image-rgb paint-with-alpha_solid_rgba_source-256 4.84 0.02% -> 6.23 0.71%: 1.28x slowdown
image-rgb paint_similar_rgb_over-256 0.09 2.14% -> 0.11 1.24%:
image-rgba paint_solid_rgba_source-512 0.79 5.12% -> 1.04 5.08%:
image-rgba paint_solid_rgb_over-512 0.80 4.64% -> 1.02 5.34%:
image-rgba paint_similar_rgb_over-256 0.21 0.69% -> 0.25 0.32%:
I have a few observations about your patch:
@@ -1627,7 +1641,7 @@ get_fast_path (const FastPathInfo *fast_paths,
Introducing whitespace noise is not very desirable.
static FASTCALL void
coreCombineOverUsse2_8888x8888 (uint32_t* dst, const uint32_t* src, int width, int reverse)
In this function you are using 8 __m128i variables; with my gcc 4.2.1 this
generated some pretty bad code with stack spills. It's possible to get down
to 6 and avoid going to the stack. Spilling is a really bad thing, as gcc
is dumb enough to not align stack frames for SSE spills, causing spurious
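To make the register-pressure point concrete, here is a minimal sketch of an OVER combine on four premultiplied ARGB32 pixels that keeps only a handful of __m128i values live at once. It is not the code from the patch: the names over_2px, over_4px and over_scalar are hypothetical, the pixel layout and the rounding formula are assumptions, and whether the compiler really avoids spills depends on the compiler version and flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference (hypothetical helper): per channel,
 * result = s + d * (255 - alpha(s)) / 255, with rounding, saturated to 255. */
static uint32_t over_scalar(uint32_t s, uint32_t d)
{
    uint32_t ia = 255 - (s >> 24), out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((d >> shift) & 0xFF) * ia + 0x80;
        t = (t + (t >> 8)) >> 8;            /* rounded division by 255 */
        t += (s >> shift) & 0xFF;
        if (t > 255)
            t = 255;
        out |= t << shift;
    }
    return out;
}

/* OVER on two pixels already widened to 16 bits per channel.
 * Live values: s, d, and one temporary for alpha / inverse alpha. */
static inline __m128i over_2px(__m128i s, __m128i d)
{
    /* Broadcast word 3 (alpha) across each pixel's four words; 0xFF = (3,3,3,3). */
    __m128i a = _mm_shufflehi_epi16(_mm_shufflelo_epi16(s, 0xFF), 0xFF);
    a = _mm_xor_si128(a, _mm_set1_epi16(0x00FF));          /* 255 - alpha */
    d = _mm_add_epi16(_mm_mullo_epi16(d, a), _mm_set1_epi16(0x0080));
    d = _mm_srli_epi16(_mm_add_epi16(d, _mm_srli_epi16(d, 8)), 8);
    return _mm_add_epi16(s, d);             /* saturation happens at pack time */
}

/* OVER on four pixels; unaligned loads and stores keep the sketch simple. */
static void over_4px(uint32_t *dst, const uint32_t *src)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i s = _mm_loadu_si128((const __m128i *)src);
    __m128i d = _mm_loadu_si128((const __m128i *)dst);
    __m128i lo = over_2px(_mm_unpacklo_epi8(s, zero), _mm_unpacklo_epi8(d, zero));
    __m128i hi = over_2px(_mm_unpackhi_epi8(s, zero), _mm_unpackhi_epi8(d, zero));
    _mm_storeu_si128((__m128i *)dst, _mm_packus_epi16(lo, hi));
}
```

Processing the low and high halves one after the other, rather than expanding both up front, is what keeps the number of simultaneously live vectors small in this sketch.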
Overall, I found that SSE is not that much of a help on a Core 2 CPU, which
can sustain the same memory bandwidth with MMX code. The same cannot be said
for other models such as the P4, which gets a pretty good speedup.
Keep up the good work,
On Sat, Mar 15, 2008 at 3:07 PM, André Tupinambá <andrelrt at gmail.com> wrote:
> This is a new version of SSE2 support for pixman.
> I changed save128WriteCombining to save128Aligned and implemented a
> lot of other functions.
> The tar file contains some perf results that I ran on a P4-3.2GHz machine; the
> results may vary on more powerful CPUs.
> Best Regards
> André Tupinambá
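For readers unfamiliar with the store variants named in the quoted message, the following is a rough sketch of the difference, assuming (from the names only) that save128Aligned wraps a plain aligned store and save128WriteCombining wraps a non-temporal one. The helper fill_row and its use_streaming flag are hypothetical and are not taken from the patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence */
#include <stdint.h>

/* Fill `width` 32-bit pixels with `color`.
 * Both store variants require a 16-byte aligned destination, so a scalar
 * head loop runs until dst is aligned and a scalar tail handles leftovers. */
static void fill_row(uint32_t *dst, int width, uint32_t color, int use_streaming)
{
    while (width > 0 && ((uintptr_t)dst & 15)) {
        *dst++ = color;
        width--;
    }

    __m128i v = _mm_set1_epi32((int)color);
    while (width >= 4) {
        if (use_streaming)
            _mm_stream_si128((__m128i *)dst, v);  /* non-temporal store */
        else
            _mm_store_si128((__m128i *)dst, v);   /* regular aligned store */
        dst += 4;
        width -= 4;
    }
    if (use_streaming)
        _mm_sfence();  /* order the streaming stores before later stores */

    while (width > 0) {
        *dst++ = color;
        width--;
    }
}
```

Which variant is faster depends on whether the destination is read again soon afterwards, so it is worth measuring both on the target CPU.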