[cairo] M64 parts of the SSE instruction set
frederic.plourde at polymtl.ca
Thu Mar 20 09:15:26 PDT 2008
Hi Soeren!
I've recently hit a snag with a cool optimization Jeff Muizelaar
proposed on cairographics. It uses the pmulhuw instruction (through the
_mm_mulhi_pu16 intrinsic). He initially proposed it for the SSE2 code
paths, but I intend to implement it in the pure MMX code paths as well
(especially in the pix_multiply and pix_add_mul functions).
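To make the arithmetic concrete, here is a scalar C model of the rounding "multiply two 8-bit channels and divide by 255" step that pix_multiply performs. It is only a sketch: the function names are hypothetical, it is not pixman's code, and I'm assuming this high-half multiply by 0x0101 is what the pmulhuw optimization amounts to. It shows that the usual shift-and-add divide by 255 gives the same result as a single unsigned high-half multiply, which is what pmulhuw computes in each 16-bit lane.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhuw / _mm_mulhi_pu16:
   the high 16 bits of the unsigned 32-bit product. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Shift-and-add form: rounded a*b/255 computed as (t + (t >> 8)) >> 8,
   with t = a*b + 0x80. */
uint8_t mul_un8_shift(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* High-multiply form: (t * 0x0101) >> 16 == (t + (t >> 8)) >> 8,
   so one pmulhuw replaces the shift, add and second shift. */
uint8_t mul_un8_mulhi(uint8_t a, uint8_t b)
{
    uint16_t t = (uint16_t)(a * b + 0x80); /* at most 65153, fits in 16 bits */
    return (uint8_t)mulhi_u16(t, 0x0101);
}
```

Since t * 257 = 256t + t, taking the high 16 bits is the same as adding t >> 8 to t and shifting right by 8, which is why the two forms agree for every pair of 8-bit inputs.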
The problem is that _mm_mulhi_pu16 is NOT declared in mmintrin.h but
in xmmintrin.h, meaning it isn't one of the pure MMX intrinsics but one of
the "Integer Intrinsics Using Streaming SIMD Extensions", even though it
still operates on __m64 registers.
Besides, I checked Intel's instruction set reference, and pmulhuw clearly
has a form that operates on plain MMX registers. Weird. Why didn't the
intrinsic make it into Visual C++'s mmintrin.h, then?
So, of course, I tried inline assembly inside pix_multiply, but the
function prologue and epilogue that the compiler automatically generates
really hurt performance, because pix_multiply is called so intensively!
So here's my question (at last ;-):
I would rather go back to the _mm_mulhi_pu16 intrinsic, because it
makes everything run smoothly (no prologue/epilogue overhead, nice
assembly), but it would require including xmmintrin.h inside the USE_MMX
blocks. Is that acceptable under your pixman-mmx.c philosophy?
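For reference, here is roughly what the intrinsic version of the per-channel multiply looks like. It is a sketch under assumptions: pix_multiply_sketch and mul_pixel are hypothetical names, not pixman's actual functions, and in pixman-mmx.c the two #include lines would presumably sit inside the USE_MMX block. The preprocessor guard and scalar fallback exist only so the snippet also builds where the SSE intrinsics are unavailable.

```c
#include <stdint.h>

#if defined(__SSE__) || defined(_M_IX86)
#include <mmintrin.h>   /* pure MMX intrinsics */
#include <xmmintrin.h>  /* _mm_mulhi_pu16 is declared here, not in mmintrin.h */

/* a and b hold four 8-bit channels zero-extended to 16-bit lanes.
   Returns round(a * b / 255) per lane (hypothetical helper). */
static __m64 pix_multiply_sketch(__m64 a, __m64 b)
{
    __m64 t = _mm_mullo_pi16(a, b);                   /* a*b, low 16 bits are exact */
    t = _mm_adds_pu16(t, _mm_set1_pi16(0x0080));      /* + 0x80 for rounding */
    return _mm_mulhi_pu16(t, _mm_set1_pi16(0x0101));  /* (t * 257) >> 16 */
}

/* Multiplies two packed 8888 pixels channel by channel (hypothetical wrapper). */
uint32_t mul_pixel(uint32_t x, uint32_t y)
{
    __m64 zero = _mm_setzero_si64();
    __m64 a = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)x), zero);
    __m64 b = _mm_unpacklo_pi8(_mm_cvtsi32_si64((int)y), zero);
    __m64 r = pix_multiply_sketch(a, b);
    uint32_t out = (uint32_t)_mm_cvtsi64_si32(_mm_packs_pu16(r, zero));
    _mm_empty(); /* leave MMX state clean for any later x87 code */
    return out;
}
#else
/* Scalar fallback with the same arithmetic, for targets without SSE intrinsics. */
uint32_t mul_pixel(uint32_t x, uint32_t y)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((x >> shift) & 0xff) * ((y >> shift) & 0xff) + 0x80;
        out |= (((t * 0x101) >> 16) & 0xff) << shift;
    }
    return out;
}
#endif
```

Because everything stays in intrinsics, the compiler can inline pix_multiply_sketch into its callers, which is what avoids the prologue/epilogue cost of a separate inline-assembly function.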
Also, I have some concerns about other MMX-only architectures. I
double-checked the OLPC and everything's fine there: the OLPC implements
the Geode instruction set, which has all the MMX instructions plus the
__m64 forms of SSE instructions like pmulhuw. But I'm concerned about
other MMX-only architectures that you guys might be supporting and that
I'm not aware of.
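If run-time detection turned out to be the answer for those other chips, a check could look something like the sketch below. It assumes GCC or clang and their <cpuid.h> header; have_mmx_pmulhuw is a hypothetical name, and I don't know how pixman's own CPU detection is organized. The MMX-register form of pmulhuw is usable when the CPU reports either SSE (CPUID leaf 1, EDX bit 25) or AMD's MMX extensions (CPUID leaf 0x80000001, EDX bit 22).

```c
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* True if pmulhuw on MMX registers should be available (hypothetical helper). */
bool have_mmx_pmulhuw(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 25)))
        return true;  /* SSE */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && (edx & (1u << 22)))
        return true;  /* AMD MMX extensions */
    return false;
}
#else
/* No CPUID probing on other compilers or targets in this sketch. */
bool have_mmx_pmulhuw(void)
{
    return false;
}
#endif
```

A caller would then pick the pmulhuw path only when this returns true and keep the plain MMX shift-and-add path otherwise.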
OK, that's about it.
Thanks for your help!