[cairo] M64 parts of the SSE instruction set

Thu Mar 20 20:32:31 PDT 2008

Hi,

This issue is confusing for a number of reasons:

* As Nicholas said, pmulhuw is not part of the original MMX
  instruction set: it is only available on processors that support
  SSE or AMD's MMX Extensions. This is not mentioned in the Intel
  documentation for the instruction. It is mentioned elsewhere
  though.

* The 'SSE' flag in pixman-pict.c is 0x6 which includes the
  'MMX_Extensions' flag (0x2), so currently the MMX code paths get run
  on machines that support either the AMD extensions or SSE.

* The code guarded by USE_SSE is only making use of the MMX
  extensions, not the actual SSE floating point instructions.

* GCC requires the -msse flag to make use of MMX extensions, but this
  flag also causes it to generate SSE instructions (movss) that are
  not part of the MMX_Extension set. See

        http://bugs.gentoo.org/show_bug.cgi?id=192048
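
To make the flag overlap in the second point concrete, here is a rough
sketch. It is not the actual pixman-pict.c code: the names are made up
and the value 0x1 for plain MMX is an assumption. Only 0x2 and 0x6 come
from the above.

```c
/* Hypothetical feature bits; only 0x2 and 0x6 are taken from the message. */
enum cpu_feature {
    CPU_MMX            = 0x1,  /* assumed value for plain MMX */
    CPU_MMX_EXTENSIONS = 0x2,  /* AMD extensions to MMX */
    CPU_SSE            = 0x6   /* 0x4 plus the MMX_Extensions bit */
};

/* Sketch of a check that requires MMX and the MMX extensions. */
static int have_mmx_sketch(unsigned features)
{
    unsigned required = CPU_MMX | CPU_MMX_EXTENSIONS;
    return (features & required) == required;
}
```

Because setting CPU_SSE also sets the 0x2 bit, a check like this passes
both for CPU_MMX | CPU_MMX_EXTENSIONS and for CPU_MMX | CPU_SSE, but
not for CPU_MMX alone.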

So no, as things stand we can't use pmulhuw in pixman-mmx.c.
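
For reference, this is roughly what pmulhuw computes on each of the four
16-bit lanes of a 64-bit operand, written as portable C. It is only a
scalar sketch of the instruction's semantics with made-up function
names, not something taken from pixman.

```c
#include <stdint.h>

/* High 16 bits of the unsigned 16x16 -> 32 bit product of one lane. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    /* Widen first so the product is computed in unsigned 32-bit math. */
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Apply the same operation to four lanes, as the __m64 form would. */
static void mulhi_pu16_ref(const uint16_t a[4], const uint16_t b[4],
                           uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mulhi_u16(a[i], b[i]);
}
```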

The fact that we check for MMX_Extensions but never make any use of
it is silly, of course. We should just drop the MMX_Extensions check.
Basically, the USE_SSE code can only be safely used on x86-64.

Soren

Nicholas Miell <nmiell at comcast.net> writes:

> On Thu, 2008-03-20 at 11:15 -0500, Frédéric Plourde wrote:
> > Hi Soeren !
> > 
> >    I've hit a snag recently about a cool optimization Jeff Muizelaar 
> > proposed on cairographics. It uses the pmulhuw instruction (through the 
> > _mm_mulhi_pu16 intrinsic). Initially, he proposed it for SSE2 parts, but 
> > I intended to implement it in pure MMX codepaths as well (especially for 
> > the pix_multiply and pix_add_mul functions)
> > 
> > The problem is: _mm_mulhi_pu16 is NOT declared in mmintrin.h but 
> > rather in xmmintrin.h, meaning it's not part of the pure MMX 
> > intrinsics, but rather part of the "Integer Intrinsics Using Streaming 
> > SIMD Extensions"... still using __m64 registers though.
> > 
> > Besides, I checked Intel's instruction set and pmulhuw is clearly 
> > available as a pure MMX instruction... weird... why didn't the intrinsic 
> > make it through Visual C++'s mmintrin.h file then ?
> > 
> 
> Because only Intel processors that support SSE or AMD processors that
> support either SSE or "MMX Extensions" have that MMX instruction.
> 
> (There's a handy table in the back of the AMD64 Architecture
> Programmer’s Manual Volume 3: General-Purpose and System Instructions
> that lists what instructions are available with which CPUID feature
> bits.)
> 
> > So, of course, I tried inline assembly inside pix_multiply, but the 
> > overhead that is brought by the automatically-generated function's 
> > epilog and prolog is really hitting performance (because pix_multiply is 
> > called intensively !).
> > 
> > So my question is (at last ;-) ::
> > 
> > I would rather go back and use the _mm_mulhi_pu16 intrinsic because it 
> > makes everything run smoothly (no epilog overhead, nice assembly) but it 
> > would require me to include the xmmintrin.h file inside USE_MMX 
> > blocks... is this acceptable to your pixman-mmx.c philosophy ?
> > 
> > Also... I've got some concerns about other MMX-only architectures... 
> > Though I double-checked the OLPC... everything's fine there... the OLPC 
> > implements the Geode instruction set, which has all the MMX 
> > instructions, plus the _m64-wise SSE instructions like pmulhuw.....  but 
> > I was concerned about other mmx-only architectures that you guys might 
> > be supporting and that I wouldn't be aware of.
> > 
> 
> Well, right now pixman_have_mmx() only returns true if the processor
> supports MMX and the AMD MMX Extensions, so by definition a) the MMX
> paths only ever run on AMD CPUs and b) all AMD CPUs that it runs on have
> PMULHUW (and also PSHUFW, which means the existing USE_SSE hackery for
> expand_alpha/expand_alpha_rev/invert_colors is unnecessary).
> 
> Which raises the question: is pixman's MMX support intended for down-rev
> CPUs in general (i.e. stuff from the Pentium 2 or Pentium MMX era) or is
> it intended only for the OLPC?
> 
> -- 
> Nicholas Miell <nmiell at comcast.net>