Hi André,<br><br>Have you looked into why there are some big performance regressions between perf-mmx-base-run4 and perf-sse2-run4?<br><br>cairo-perf-diff shows a few cases that are quite serious:<br><br>Slowdowns<br>=========<br>
image-rgba paint_image_rgba_mag_over-256 8.07 1.84% -> 12.86 0.25%: 1.59x slowdown<br>image-rgba paint_solid_rgb_source-512 0.79 5.00% -> 1.04 2.54%: 1.46x slowdown<br>image-rgb paint_similar_rgb_source-256 0.08 2.35% -> 0.11 0.64%: 1.35x slowdown<br>
image-rgb paint_image_rgb_over-256 0.09 3.44% -> 0.11 2.02%: 1.30x slowdown<br>image-rgb paint-with-alpha_solid_rgb_source-256 4.83 0.03% -> 6.30 0.56%: 1.29x slowdown<br>image-rgb paint-with-alpha_solid_rgba_source-256 4.84 0.02% -> 6.23 0.71%: 1.28x slowdown<br>
image-rgb paint_similar_rgb_over-256 0.09 2.14% -> 0.11 1.24%: 1.27x slowdown<br>image-rgba paint_solid_rgba_source-512 0.79 5.12% -> 1.04 5.08%: 1.25x slowdown<br>image-rgba paint_solid_rgb_over-512 0.80 4.64% -> 1.02 5.34%: 1.22x slowdown<br>
image-rgba paint_similar_rgb_over-256 0.21 0.69% -> 0.25 0.32%: 1.17x slowdown<br><br><br>I have a few observations about your patch:<br><br>@@ -1627,7 +1641,7 @@ get_fast_path (const FastPathInfo *fast_paths,<br>
<br> if (!valid_mask)<br> continue;<br>- <br>+<br><br>Unrelated whitespace changes like this one add noise to the patch, so please leave them out.<br><br>static FASTCALL void<br>coreCombineOverUsse2_8888x8888 (uint32_t* dst, const uint32_t* src, int width, int reverse)<br>
<br>In this function you are using 8 __m128i variables; with my gcc 4.2.1 this generated some pretty bad code with stack spills. It's possible to get down to 6 and avoid going to the stack. Avoiding the stack really matters here, because gcc is dumb enough not to align stack frames for SSE spills, which causes spurious unaligned-access faults.<br>
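To illustrate the register-pressure point, here is a minimal sketch of an SSE2 OVER combiner for premultiplied a8r8g8b8 (dst = src + dst * (255 - src_alpha) / 255 per channel) that splits each 4-pixel block into independent low and high pairs, so each half only needs a few 128-bit temporaries. The names over_2x16, over_1px and combine_over_sse2_sketch are hypothetical, not pixman's; the rounding uses the common (t + (t >> 8)) >> 8 trick, computed in SIMD as (t * 0x101) >> 16. Whether gcc actually avoids spills still depends on the compiler version and flags.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: OVER on two pixels unpacked to one channel per
 * 16-bit lane (B, G, R, A, B, G, R, A). Returns src + dst * (255 - src_alpha) / 255
 * per channel, still unpacked (sums can reach 510 before packing). */
static inline __m128i
over_2x16 (__m128i s, __m128i d)
{
    /* Broadcast each pixel's alpha (lanes 3 and 7) across its four lanes. */
    __m128i a = _mm_shufflelo_epi16 (s, _MM_SHUFFLE (3, 3, 3, 3));
    a = _mm_shufflehi_epi16 (a, _MM_SHUFFLE (3, 3, 3, 3));

    /* 255 - alpha: lanes hold 0..255, so XOR with 0x00ff is enough. */
    a = _mm_xor_si128 (a, _mm_set1_epi16 (0x00ff));

    /* t = d * (255 - alpha) + 0x80, then (t * 0x101) >> 16,
     * which equals (t + (t >> 8)) >> 8. */
    d = _mm_mullo_epi16 (d, a);
    d = _mm_adds_epu16 (d, _mm_set1_epi16 (0x0080));
    d = _mm_mulhi_epu16 (d, _mm_set1_epi16 (0x0101));

    return _mm_adds_epu16 (s, d);
}

/* Hypothetical scalar OVER for one pixel with the same rounding,
 * used for the tail. */
static uint32_t
over_1px (uint32_t s, uint32_t d)
{
    uint32_t ia = 255 - (s >> 24);
    uint32_t r = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t t = ((d >> shift) & 0xff) * ia + 0x80;
        uint32_t c = ((s >> shift) & 0xff) + (((t >> 8) + t) >> 8);
        r |= (c > 255 ? 255 : c) << shift;
    }
    return r;
}

/* Hypothetical combiner: 4 pixels per iteration with unaligned
 * loads/stores, then a scalar tail. */
static void
combine_over_sse2_sketch (uint32_t *dst, const uint32_t *src, int width)
{
    const __m128i zero = _mm_setzero_si128 ();
    int i = 0;

    for (; i + 4 <= width; i += 4)
    {
        __m128i s = _mm_loadu_si128 ((const __m128i *) (src + i));
        __m128i d = _mm_loadu_si128 ((const __m128i *) (dst + i));

        /* Low and high pixel pairs are independent halves, so each needs
         * only its own s/d/alpha temporaries; the final register
         * allocation is still up to the compiler. */
        __m128i lo = over_2x16 (_mm_unpacklo_epi8 (s, zero),
                                _mm_unpacklo_epi8 (d, zero));
        __m128i hi = over_2x16 (_mm_unpackhi_epi8 (s, zero),
                                _mm_unpackhi_epi8 (d, zero));

        /* Pack back to 8 bits; packus saturates the 0..510 sums to 255. */
        _mm_storeu_si128 ((__m128i *) (dst + i), _mm_packus_epi16 (lo, hi));
    }
    for (; i < width; i++)
        dst[i] = over_1px (src[i], dst[i]);
}
```

Compiling with gcc -O2 -S and looking for stack-relative movdqa/movaps in the output is one way to check whether a given version spills.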
<br>Overall, I found that SSE2 doesn't help much on a Core 2 CPU, which can sustain the same memory bandwidth with MMX code. The same cannot be said for other models such as the P4, which gets a pretty good speedup.<br>
<br><br>Keep up the good work,<br>Rodrigo<br><br><br><div class="gmail_quote">On Sat, Mar 15, 2008 at 3:07 PM, André Tupinambá <<a href="mailto:andrelrt@gmail.com">andrelrt@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi,<br>
<br>
This is a new version of SSE2 support for pixman.<br>
<br>
I changed save128WriteCombining to save128Aligned and implemented a<br>
lot of other functions.<br>
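The bodies of save128WriteCombining and save128Aligned aren't shown here, so the following is only a minimal sketch assuming the usual SSE2 pairing: a non-temporal _mm_stream_si128 store (MOVNTDQ) for the write-combining variant and a plain cached _mm_store_si128 (typically MOVDQA) for the aligned one. The function names are hypothetical stand-ins; both intrinsics require a 16-byte-aligned destination.

```c
#include <emmintrin.h>

/* Hypothetical stand-in for save128WriteCombining: non-temporal store
 * that bypasses the cache. dst must be 16-byte aligned. */
static inline void
save128_write_combining_sketch (__m128i *dst, __m128i data)
{
    _mm_stream_si128 (dst, data);
}

/* Hypothetical stand-in for save128Aligned: ordinary cached store.
 * dst must be 16-byte aligned as well. */
static inline void
save128_aligned_sketch (__m128i *dst, __m128i data)
{
    _mm_store_si128 (dst, data);
}
```

Streaming stores avoid pulling the destination into the cache, which can help when it won't be read again soon and can hurt when the composited result is read back right away; an _mm_sfence is usually issued after a run of them.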
<br>
The tar file contains some perf results that I ran on a 3.2 GHz P4 machine; the<br>
results may vary on more powerful CPUs.<br>
<br>
Best Regards<br>
<font color="#888888"><br>
André Tupinambá<br>
</font><br>_______________________________________________<br>
cairo mailing list<br>
<a href="mailto:cairo@cairographics.org">cairo@cairographics.org</a><br>
<a href="http://lists.cairographics.org/mailman/listinfo/cairo" target="_blank">http://lists.cairographics.org/mailman/listinfo/cairo</a><br></blockquote></div><br>