[cairo] perf gains using manual inlining of critical MMX intrinsics

Frédéric Plourde frederic.plourde at polymtl.ca
Thu Mar 6 18:44:49 PST 2008

Previously, Vlad wrote:
"I'm really worried about the huge slowdown with MMX enabled and alpha 
-- sounds like there's just something really bad going on in that case,"

Here is some news:
Using Intel VTune, I've been profiling the MMX code paths in 
pixman-mmx.c with the "over" and "src" operators, and it seems our 
performance drop was caused by insufficient inlining of crucial 
functions such as expand8888(...), expand_alpha(...), over(...) and 
pix_multiply(...) on the win32 platform.

For example, in the profiling call graph of "cairo-perf.exe" for the 
"paint" source, "image" backend and "over" operator, more than 65% of 
the time was spent inside the over(...) and expand8888(...) functions, 
with expand8888 called more than 15 million times! expand8888 contains 
a series of if-then-else statements that actually kill the MMX perf 
gains. After some reasoning, I was able to optimize the function by 
discarding those conditional statements.

I've also manually inlined those functions to measure the perf gain.
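
To illustrate the two changes described above (dropping the run-time 
conditional and forcing the helpers inline), here is a minimal sketch. 
It is not the actual pixman-mmx.c code, which is not shown in this 
message. It assumes an expand-style helper that widens the four 8-bit 
channels of one pixel of a packed pair into four 16-bit lanes, and it 
emulates that in portable scalar C rather than with MMX intrinsics. The 
names FORCE_INLINE, spread_bytes, expand8888_branchy, expand8888_lo and 
expand8888_hi are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical macro: ask the compiler to inline regardless of its own
 * heuristics. __forceinline is the MSVC keyword, always_inline the
 * GCC/Clang attribute; anything else falls back to plain inline. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

/* Widen the four bytes of one 32-bit pixel into four 16-bit lanes,
 * e.g. 0xAABBCCDD becomes 0x00AA00BB00CC00DD. */
static FORCE_INLINE uint64_t spread_bytes(uint32_t p)
{
    uint64_t x = p;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x <<  8)) & 0x00FF00FF00FF00FFULL;
    return x;
}

/* Branchy form: if the call is not inlined, pos is tested at run time
 * on every one of the millions of calls. */
static uint64_t expand8888_branchy(uint64_t in, int pos)
{
    if (pos == 0)
        return spread_bytes((uint32_t)in);          /* low pixel  */
    else
        return spread_bytes((uint32_t)(in >> 32));  /* high pixel */
}

/* Branch-free form: the caller already knows which pixel it wants, so
 * the choice is made at compile time by calling the right helper. */
static FORCE_INLINE uint64_t expand8888_lo(uint64_t in)
{
    return spread_bytes((uint32_t)in);
}

static FORCE_INLINE uint64_t expand8888_hi(uint64_t in)
{
    return spread_bytes((uint32_t)(in >> 32));
}
```

A call such as expand8888_branchy(pair, 0) would then be written as 
expand8888_lo(pair). Whether this matches the real change depends on 
how the conditionals in pixman-mmx.c are actually used, so treat it as 
an illustration of the idea only.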

Here are some preliminary (and very encouraging) results, with an 
average speedup of 3x: http://pastebin.mozilla.org/357480

I'm done for tonight.
More news coming tomorrow on this issue, most likely with some related 
patches :-)
