[cairo] M64 parts of the SSE instruction set

Nicholas Miell nmiell at comcast.net
Thu Mar 20 14:11:38 PDT 2008

On Thu, 2008-03-20 at 11:15 -0500, Frédéric Plourde wrote:
> Hi Soeren !
>    I've hit a snag recently about a cool optimization Jeff Muizelaar 
> proposed on cairographics. It uses the pmulhuw instruction (through the 
> _mm_mulhi_pu16 intrinsic). Initially, he proposed it for SSE2 parts, but 
> I intended to implement it in pure MMX codepaths as well (especially for 
> the pix_multiply and pix_add_mul functions)
> The problem is: _mm_mulhi_pu16 is NOT declared in mmintrin.h but
> rather in the xmmintrin.h file...meaning it's not part of pure MMX 
> intrinsics, but rather part of the "Integer Intrinsics Using Streaming 
> SIMD Extensions"... still using __m64 registers though.
> Besides, I checked Intel's instruction set and pmulhuw is clearly
> available as a pure MMX instruction... weird... why didn't the intrinsic 
> make it through Visual C++'s mmintrin.h file then ?

Because only Intel processors that support SSE or AMD processors that
support either SSE or "MMX Extensions" have that MMX instruction.

(There's a handy table in the back of the AMD64 Architecture
Programmer’s Manual Volume 3: General-Purpose and System Instructions
that lists what instructions are available with which CPUID feature flags.)
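
As an illustration of that rule, here is a minimal sketch of a runtime check. It assumes GCC or Clang (for the __get_cpuid helper in <cpuid.h>), and the function name cpu_has_mmx_pmulhuw is hypothetical, not pixman's pixman_have_mmx(). It reads the standard leaf 1 flags (EDX bit 23 = MMX, bit 25 = SSE) and the AMD extended leaf 0x80000001 (EDX bit 22 = AMD MMX Extensions), and deliberately skips any vendor string check, since Intel reports the extended bit as reserved (zero).

```c
#include <cpuid.h>    /* GCC/Clang: __get_cpuid */
#include <stdbool.h>

/* Hypothetical helper: true if the CPU can execute PMULHUW (and PSHUFW)
 * on MMX (__m64) registers, i.e. MMX plus either SSE or AMD MMX Extensions. */
static bool
cpu_has_mmx_pmulhuw (void)
{
    unsigned int eax, ebx, ecx, edx;
    bool mmx = false, sse = false, amd_mmx_ext = false;

    /* Standard leaf 1, EDX: bit 23 = MMX, bit 25 = SSE. */
    if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)) {
        mmx = (edx >> 23) & 1;
        sse = (edx >> 25) & 1;
    }

    /* Extended leaf 0x80000001, EDX bit 22 = AMD MMX Extensions.
     * __get_cpuid returns 0 if this leaf is not implemented. */
    if (__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
        amd_mmx_ext = (edx >> 22) & 1;

    return mmx && (sse || amd_mmx_ext);
}
```

Every x86-64 CPU reports both MMX and SSE, so on 64-bit builds this check can only fail on the 32-bit parts the thread is asking about.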

> So, of course, I tried inline assembly inside pix_multiply, but the 
> overhead that is brought by the automatically-generated function's 
> epilog and prolog is really hitting performance (because pix_multiply is 
> called intensively !).
> So my question is (at last ;-) ::
> I would rather go back and use the _mm_mulhi_pu16 intrinsic because it 
> makes everything run smoothly (no epilog overhead, nice assembly) but it 
> would require me to include the xmmintrin.h file inside USE_MMX 
> blocks... is this acceptable to your pixman-mmx.c philosophy ?
> Also... I've got some concerns about other MMX-only architectures... 
> Though I double-checked the OLPC... everything's fine there... the OLPC 
> implements the Geode instruction set, which has all the MMX 
> instructions, plus the __m64-wise SSE instructions like pmulhuw... but
> I was concerned about other mmx-only architectures that you guys might 
> be supporting and that I wouldn't be aware of.
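
For reference, a minimal sketch of the pmulhuw variant of pix_multiply discussed above. It assumes each pixel has already been unpacked into four 16-bit lanes holding 8-bit channel values, and the names pix_multiply_pmulhuw and multiply_lanes are hypothetical. The trick: with t = a*b + 0x80, the high half of t * 0x0101 (what PMULHUW returns) equals (t + (t >> 8)) >> 8, the usual exact rounding of a*b/255, so one multiply replaces a shift-and-add sequence. With GCC on 32-bit x86 the xmmintrin.h intrinsics need SSE enabled (e.g. -msse); on x86-64 it is on by default.

```c
#include <mmintrin.h>   /* MMX: _mm_mullo_pi16, _mm_adds_pu16, _mm_set1_pi16 */
#include <xmmintrin.h>  /* SSE integer op on __m64: _mm_mulhi_pu16 */
#include <stdint.h>
#include <string.h>

/* Hypothetical name: per 16-bit lane, compute round(a * b / 255)
 * for channel values in 0..255. */
static inline __m64
pix_multiply_pmulhuw (__m64 a, __m64 b)
{
    __m64 res = _mm_mullo_pi16 (a, b);                  /* t = a * b (<= 65025) */
    res = _mm_adds_pu16 (res, _mm_set1_pi16 (0x0080));  /* t += 0x80 */
    /* (t * 0x0101) >> 16 == (t + (t >> 8)) >> 8, i.e. a * b / 255 rounded */
    return _mm_mulhi_pu16 (res, _mm_set1_pi16 (0x0101));
}

/* Hypothetical test helper: multiply four channel pairs, copy lanes out. */
static void
multiply_lanes (const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
    __m64 va, vb, vr;

    memcpy (&va, a, sizeof va);
    memcpy (&vb, b, sizeof vb);
    vr = pix_multiply_pmulhuw (va, vb);
    memcpy (out, &vr, sizeof vr);
    _mm_empty ();  /* clear MMX state before any x87 floating point */
}
```

For example, lanes {255, 255, 128, 0} times {255, 128, 128, 200} give {255, 128, 64, 0}.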

Well, right now pixman_have_mmx() only returns true if the processor
supports MMX and the AMD MMX Extensions, so by definition a) the MMX
paths only ever run on AMD CPUs and b) all AMD CPUs that it runs on have
PMULHUW (and also PSHUFW, which means the existing USE_SSE hackery for
expand_alpha/expand_alpha_rev/invert_colors is unnecessary).
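
To make the PSHUFW point concrete, a sketch comparing an MMX-only alpha broadcast with the single-instruction PSHUFW form. It assumes an unpacked pixel in four 16-bit lanes with alpha in lane 3 and treats invert_colors as a swap of lanes 0 and 2. The function names are hypothetical, and the shuffle masks illustrate what those helpers would need rather than being copied from pixman-mmx.c.

```c
#include <mmintrin.h>   /* MMX: _mm_srli_si64, _mm_slli_si64, _mm_or_si64 */
#include <xmmintrin.h>  /* PSHUFW: _mm_shuffle_pi16, _MM_SHUFFLE */

/* MMX-only: copy lane 3 (alpha) into all four lanes with shifts and ORs. */
static inline __m64
expand_alpha_mmx (__m64 pixel)
{
    __m64 t = _mm_srli_si64 (pixel, 48);          /* alpha -> lane 0 */
    t = _mm_or_si64 (t, _mm_slli_si64 (t, 16));   /* lanes 0-1 */
    t = _mm_or_si64 (t, _mm_slli_si64 (t, 32));   /* lanes 0-3 */
    return t;
}

/* PSHUFW: the same broadcast in one instruction. */
static inline __m64
expand_alpha_pshufw (__m64 pixel)
{
    return _mm_shuffle_pi16 (pixel, _MM_SHUFFLE (3, 3, 3, 3));
}

/* PSHUFW: broadcast lane 0 instead. */
static inline __m64
expand_alpha_rev_pshufw (__m64 pixel)
{
    return _mm_shuffle_pi16 (pixel, _MM_SHUFFLE (0, 0, 0, 0));
}

/* PSHUFW: swap lanes 0 and 2 (e.g. the B and R channels), keep 1 and 3. */
static inline __m64
invert_colors_pshufw (__m64 pixel)
{
    return _mm_shuffle_pi16 (pixel, _MM_SHUFFLE (3, 0, 1, 2));
}
```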

Which raises the question: is pixman's MMX support intended for down-rev
CPUs in general (i.e. stuff from the Pentium 2 or Pentium MMX era) or is
it intended only for the OLPC?

Nicholas Miell <nmiell at comcast.net>
