Hi André,

Did you look into why there are some big performance regressions between perf-mmx-base-run4 and perf-sse2-run4? According to cairo-perf-diff, a few cases are quite serious:

Slowdowns
=========
image-rgba          paint_image_rgba_mag_over-256   8.07 1.84% ->  12.86 0.25%:  1.59x slowdown
image-rgba             paint_solid_rgb_source-512   0.79 5.00% ->   1.04 2.54%:  1.46x slowdown
image-rgb            paint_similar_rgb_source-256   0.08 2.35% ->   0.11 0.64%:  1.35x slowdown
image-rgb                paint_image_rgb_over-256   0.09 3.44% ->   0.11 2.02%:  1.30x slowdown
image-rgb   paint-with-alpha_solid_rgb_source-256   4.83 0.03% ->   6.30 0.56%:  1.29x slowdown
image-rgb  paint-with-alpha_solid_rgba_source-256   4.84 0.02% ->   6.23 0.71%:  1.28x slowdown
image-rgb              paint_similar_rgb_over-256   0.09 2.14% ->   0.11 1.24%:  1.27x slowdown
image-rgba            paint_solid_rgba_source-512   0.79 5.12% ->   1.04 5.08%:  1.25x slowdown
image-rgba               paint_solid_rgb_over-512   0.80 4.64% ->   1.02 5.34%:  1.22x slowdown
image-rgba             paint_similar_rgb_over-256   0.21 0.69% ->   0.25 0.32%:  1.17x slowdown

I have a few observations about your patch:

@@ -1627,7 +1641,7 @@ get_fast_path (const FastPathInfo *fast_paths,
        if (!valid_mask)
          continue;
-    
+

Introducing whitespace noise like this is not very desirable.

static FASTCALL void
coreCombineOverUsse2_8888x8888 (uint32_t* dst, const uint32_t* src, int width, int reverse)

In this function you are using eight __m128i variables, and my gcc 4.2.1 generated pretty bad code for it, with stack spills. It's possible to get that down to six and avoid going to the stack. Spilling is a really bad thing here, because gcc is dumb enough not to align stack frames for SSE spills, which causes spurious unaligned faults.

Overall, I found that SSE2 is not much of a help on a Core 2 CPU, which can sustain the same memory bandwidth with MMX code. The same cannot be said of other models such as the P4, which gets a pretty good speedup.
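To make the register-pressure point a bit more concrete, here is a minimal sketch of an SSE2 OVER combiner for premultiplied a8r8g8b8 pixels. It is not the code from your patch, and every function name in it (over_1x32, mul_un8x8, over_4x32, combine_over_u_sse2_sketch) is hypothetical, not pixman API. It assumes x86 little-endian byte order (alpha in the top byte of each uint32_t) and the usual rounded divide by 255, and it uses a scalar head loop so the 4-pixel body can use aligned loads and stores on dst, in the spirit of save128Aligned. Intermediates are overwritten in place to keep few values live at once, but whether a given gcc actually avoids spills still has to be confirmed by looking at the generated assembly (e.g. gcc -O2 -S).

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helpers, not pixman API.  Pixels are premultiplied
 * a8r8g8b8 in native x86 (little-endian) order, alpha in the top byte. */

/* Scalar OVER for one pixel: dst = src + dst * (255 - alpha(src)) / 255,
 * per channel, with a rounded divide by 255 and a saturating add. */
static uint32_t
over_1x32 (uint32_t src, uint32_t dst)
{
    uint32_t ia = 255 - (src >> 24);
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t t = ((dst >> shift) & 0xff) * ia + 0x80;
        uint32_t c = ((src >> shift) & 0xff) + ((t + (t >> 8)) >> 8);

        result |= (c > 255 ? 255 : c) << shift;
    }
    return result;
}

/* x * a / 255 (rounded) on eight 16-bit lanes: (x * a + 0x80) * 0x101 >> 16 */
static __m128i
mul_un8x8 (__m128i x, __m128i a)
{
    __m128i t = _mm_add_epi16 (_mm_mullo_epi16 (x, a), _mm_set1_epi16 (0x0080));

    return _mm_mulhi_epu16 (t, _mm_set1_epi16 (0x0101));
}

/* OVER for four pixels.  Intermediates overwrite ia_lo/ia_hi in place to
 * keep few values live; the compiler still decides what actually spills. */
static __m128i
over_4x32 (__m128i src, __m128i dst)
{
    const __m128i zero = _mm_setzero_si128 ();
    __m128i ia_lo = _mm_unpacklo_epi8 (src, zero);   /* pixels 0-1, 16 bits/channel */
    __m128i ia_hi = _mm_unpackhi_epi8 (src, zero);   /* pixels 2-3 */

    /* Broadcast each pixel's alpha (16-bit lane 3 of each 64-bit half). */
    ia_lo = _mm_shufflehi_epi16 (_mm_shufflelo_epi16 (ia_lo, _MM_SHUFFLE (3, 3, 3, 3)),
                                 _MM_SHUFFLE (3, 3, 3, 3));
    ia_hi = _mm_shufflehi_epi16 (_mm_shufflelo_epi16 (ia_hi, _MM_SHUFFLE (3, 3, 3, 3)),
                                 _MM_SHUFFLE (3, 3, 3, 3));

    /* 255 - a == a ^ 0xff for 0 <= a <= 255 */
    ia_lo = _mm_xor_si128 (ia_lo, _mm_set1_epi16 (0x00ff));
    ia_hi = _mm_xor_si128 (ia_hi, _mm_set1_epi16 (0x00ff));

    ia_lo = mul_un8x8 (_mm_unpacklo_epi8 (dst, zero), ia_lo);
    ia_hi = mul_un8x8 (_mm_unpackhi_epi8 (dst, zero), ia_hi);

    /* Repack to 8 bits and add src with unsigned saturation. */
    return _mm_adds_epu8 (src, _mm_packus_epi16 (ia_lo, ia_hi));
}

/* Combiner loop: scalar head until dst is 16-byte aligned, aligned
 * loads/stores on dst in the body, scalar tail for the remainder. */
static void
combine_over_u_sse2_sketch (uint32_t *dst, const uint32_t *src, int width)
{
    assert (((uintptr_t) dst & 3) == 0);   /* head loop steps 4 bytes at a time */

    while (width > 0 && ((uintptr_t) dst & 15))
    {
        *dst = over_1x32 (*src, *dst);
        dst++;
        src++;
        width--;
    }

    while (width >= 4)
    {
        __m128i s = _mm_loadu_si128 ((const __m128i *) src);  /* src may be unaligned */
        __m128i d = _mm_load_si128 ((const __m128i *) dst);

        _mm_store_si128 ((__m128i *) dst, over_4x32 (s, d));
        dst += 4;
        src += 4;
        width -= 4;
    }

    while (width > 0)
    {
        *dst = over_1x32 (*src, *dst);
        dst++;
        src++;
        width--;
    }
}
```

Since (t * 0x101) >> 16 equals (t + (t >> 8)) >> 8 for every t the multiply can produce, the SIMD body should be bit-identical to over_1x32, which makes the scalar version handy as a reference when checking a register-lean rewrite against the current one.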
Keep up the good work,
Rodrigo

On Sat, Mar 15, 2008 at 3:07 PM, André Tupinambá <andrelrt@gmail.com> wrote:
> Hi,
>
> This is a new version of SSE2 support for pixman. I changed save128WriteCombining to save128Aligned and implemented a lot of other functions.
>
> The tar file contains some perf results that I ran on a P4 3.2 GHz machine; the results may vary on more powerful CPUs.
>
> Best Regards
> André Tupinambá
>
> _______________________________________________
> cairo mailing list
> cairo@cairographics.org
> http://lists.cairographics.org/mailman/listinfo/cairo