[cairo] M64 parts of the SSE instruction set

Frédéric Plourde frederic.plourde at polymtl.ca
Thu Mar 20 09:15:26 PDT 2008


Hi Soeren!

   I recently hit a snag with a cool optimization Jeff Muizelaar 
proposed on cairographics. It uses the pmulhuw instruction (through the 
_mm_mulhi_pu16 intrinsic). He initially proposed it for the SSE2 code 
paths, but I intended to implement it in the pure MMX code paths as well 
(especially for the pix_multiply and pix_add_mul functions).
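
To make the idea concrete, here is a minimal sketch of how a multiply 
built on _mm_mulhi_pu16 could look. It assumes the optimization is the 
usual "divide by 255" trick (add 0x0080, then take the high half of a 
multiply by 0x0101); those constants are not spelled out in this message. 
The helper name mul_div_255 and the HAVE_MULHI_PU16 macro are 
hypothetical, and this is not pixman's actual pix_multiply. When the 
compiler is not targeting SSE, the sketch falls back to a scalar model of 
the same three steps.

```c
#include <stdint.h>
#include <string.h>

/* Use the real intrinsic when the compiler targets SSE; otherwise fall
 * back to a scalar model of the same three steps. */
#if defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* declares _mm_mulhi_pu16; also pulls in mmintrin.h */
#define HAVE_MULHI_PU16 1  /* hypothetical macro, for this sketch only */
#endif

/* Hypothetical helper (not pixman's actual pix_multiply): for four
 * 16-bit lanes that each hold an 8-bit channel value, compute the
 * rounded per-lane result of (a * b) / 255. */
static void mul_div_255(const uint16_t a[4], const uint16_t b[4],
                        uint16_t out[4])
{
#ifdef HAVE_MULHI_PU16
    /* _mm_set_pi16 takes the highest lane first, the lowest lane last. */
    __m64 va = _mm_set_pi16((short)a[3], (short)a[2], (short)a[1], (short)a[0]);
    __m64 vb = _mm_set_pi16((short)b[3], (short)b[2], (short)b[1], (short)b[0]);

    __m64 t = _mm_mullo_pi16(va, vb);              /* a * b, fits in 16 bits  */
    t = _mm_adds_pu16(t, _mm_set1_pi16(0x0080));   /* + 128 for rounding      */
    t = _mm_mulhi_pu16(t, _mm_set1_pi16(0x0101));  /* pmulhuw: (t * 257) >> 16 */

    memcpy(out, &t, sizeof t);                     /* store the four lanes    */
    _mm_empty();                                   /* leave MMX state clean   */
#else
    for (int i = 0; i < 4; i++) {
        uint32_t t = ((uint32_t)a[i] * b[i]) & 0xFFFFu;  /* low 16 bits */
        t += 0x0080u;
        if (t > 0xFFFFu)
            t = 0xFFFFu;                           /* unsigned saturation */
        out[i] = (uint16_t)((t * 0x0101u) >> 16);  /* high half of t * 257 */
    }
#endif
}
```

For inputs in the 0..255 range, each output lane should match the 
rounded integer quotient (a * b + 127) / 255.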

The problem is: _mm_mulhi_pu16 is NOT declared in mmintrin.h but 
rather in xmmintrin.h, meaning it's not one of the pure MMX 
intrinsics but part of the "Integer Intrinsics Using Streaming 
SIMD Extensions", even though it still operates on __m64 registers.

Besides, I checked Intel's instruction set and pmulhuw is clearly 
available as a pure MMX instruction... weird... why didn't the intrinsic 
make it into Visual C++'s mmintrin.h then?

So, of course, I tried inline assembly inside pix_multiply, but the 
overhead of the automatically generated function prolog and epilog 
really hurts performance (because pix_multiply is called very 
heavily!).

So my question (at last ;-) is:

I would rather go back to the _mm_mulhi_pu16 intrinsic because it 
makes everything run smoothly (no prolog/epilog overhead, nice 
assembly), but it would require me to include xmmintrin.h inside the 
USE_MMX blocks... is this acceptable to your pixman-mmx.c philosophy?

Also... I've got some concerns about other MMX-only architectures. 
I double-checked the OLPC and everything's fine there: the OLPC 
implements the Geode instruction set, which has all the MMX 
instructions, plus the __m64-wise SSE instructions like pmulhuw. But 
I am concerned about other MMX-only architectures that you guys might 
be supporting and that I'm not aware of.
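
One way to handle MMX-only machines would be to check at run time 
before taking the pmulhuw path. The sketch below assumes a GCC-compatible 
compiler on x86 and uses its __builtin_cpu_supports built-in; the helper 
name cpu_has_sse is hypothetical. Note that it only tests the SSE feature 
flag: a CPU like the Geode, which offers pmulhuw through extended MMX 
without full SSE, would need a separate CPUID query that is not shown 
here.

```c
/* Hypothetical helper: returns 1 if the running CPU reports SSE,
 * 0 otherwise (including when the check is not available). */
static int cpu_has_sse(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                        /* safe to call repeatedly */
    return __builtin_cpu_supports("sse") != 0;   /* normalize to 0 or 1 */
#else
    return 0;   /* unknown compiler or architecture: assume unsupported */
#endif
}
```

A caller could use the result to choose between the intrinsic-based 
routine and a plain MMX or C fallback.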

OK... that's about it.
Thanks for your help!

-fred-

