[cairo] perf gains using manual inlining of critical MMX intrinsics
frederic.plourde at polymtl.ca
Thu Mar 6 18:44:49 PST 2008
Previously, Vlad wrote:
"I'm really worried about the huge slowdown with MMX enabled and alpha
-- sounds like there's just something really bad going on in that case,"
Here is some news:
Using Intel VTune, I've been profiling the MMX code paths in
pixman-mmx.c with the "over" and "src" operators, and it seems our
performance drop comes from poor inlining of crucial functions such as
expand8888(...), expand_alpha(...), over(...) and pix_multiply(...) on
the win32 platform.
For example, in the profiling call graph of "cairo-perf.exe" for the
"paint" source, "image" backend and "over" operator, more than 65% of
the time was spent inside the over(...) and expand8888(...) functions,
with expand8888 called more than 15 million times! expand8888 contains
a series of if-then-else statements that effectively kill the MMX perf
gains. I optimized these away, removing the conditional statements
after some reasoning.
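To illustrate the kind of branch removal described above, here is a
minimal scalar sketch. It is not the actual pixman-mmx.c code and it uses
plain 64-bit integers instead of MMX registers. The names widen_pixel,
expand_select, expand_lo and expand_hi are hypothetical. The assumption is
that the selector is a constant at every call site, so the run-time test
can be replaced by two branch-free helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the four 8-bit channels of one 8888 pixel into four 16-bit lanes. */
static inline uint64_t widen_pixel(uint32_t p)
{
    uint64_t x = p;
    x = (x | (x << 16)) & 0x0000ffff0000ffffULL;
    x = (x | (x << 8))  & 0x00ff00ff00ff00ffULL;
    return x;
}

/* Branching form: the selector is tested at run time on every call
 * unless the compiler inlines the function and folds the constant. */
static uint64_t expand_select(uint64_t two_pixels, int pos)
{
    if (pos == 0)
        return widen_pixel((uint32_t)two_pixels);
    else
        return widen_pixel((uint32_t)(two_pixels >> 32));
}

/* Branch-free form: the caller picks the helper, so no test remains. */
static inline uint64_t expand_lo(uint64_t two_pixels)
{
    return widen_pixel((uint32_t)two_pixels);
}

static inline uint64_t expand_hi(uint64_t two_pixels)
{
    return widen_pixel((uint32_t)(two_pixels >> 32));
}

/* Usage: both forms must agree for each selector value. */
void demo_expand(void)
{
    uint64_t two = 0x11223344AABBCCDDULL;
    assert(expand_lo(two) == expand_select(two, 0));
    assert(expand_hi(two) == expand_select(two, 1));
}
```

If the compiler does inline expand_select with a constant pos, it can
drop the test on its own; the split helpers only matter when inlining
does not happen.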
I've also manually inlined those functions to measure the perf gain.
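An alternative to inlining by hand is to ask the compiler to always
inline the small helpers. The sketch below assumes the compiler-specific
keywords __forceinline (MSVC) and __attribute__((always_inline)) (GCC);
the FORCE_INLINE macro name and the scalar helpers mul_un8, over_pixel and
over_span are hypothetical stand-ins for the MMX routines, written only to
show the pattern on premultiplied ARGB pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: expands to the compiler's "always inline" request. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

/* Scalar stand-in for a per-channel multiply: round(a * b / 255). */
static FORCE_INLINE uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scalar stand-in for "over" on one premultiplied ARGB pixel:
 * result = src + dst * (255 - src_alpha) / 255, saturated per channel. */
static FORCE_INLINE uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_a = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)((dst >> shift) & 0xff), inv_a);
        uint32_t sum = s + d;
        if (sum > 255)
            sum = 255;
        result |= sum << shift;
    }
    return result;
}

/* Hot loop: the helpers above are expanded here instead of being called. */
void over_span(uint32_t *dst, const uint32_t *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = over_pixel(src[i], dst[i]);
}

/* Usage: an opaque source replaces dst, a transparent one leaves it alone. */
void demo_over(void)
{
    uint32_t src[2] = { 0xff112233u, 0x00000000u };
    uint32_t dst[2] = { 0x80445566u, 0x80445566u };
    over_span(dst, src, 2);
    assert(dst[0] == 0xff112233u);
    assert(dst[1] == 0x80445566u);
}
```

Whether this matches the gain from manual inlining depends on the
compiler honoring the request, so it is worth confirming in the profiler's
call graph.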
Here are some preliminary (and very encouraging) results, with an
average speedup of 3x: http://pastebin.mozilla.org/357480
I'm done for tonight.
More news on this issue coming tomorrow, most likely with some
related patches :-)