[cairo] [PATCH] SSE2 patch for pixman

André Tupinambá andrelrt at gmail.com
Mon Apr 7 17:18:53 PDT 2008

On Sun, Apr 6, 2008 at 10:26 PM, Soeren Sandmann <sandmann at daimi.au.dk> wrote:
>  I tried out the patch and noticed a couple of problems:
>  First, after restarting the X server, all the icons were a bit messed
>  up. This indicates that there are problems with some of the
>  NEED_PIXBUF fast paths.

Hum... I will recheck the fbCompositeSrc_8888RevNPx8888sse2 and
fbCompositeSrc_8888RevNPx0565sse2 functions. How do you make this
check? Do Install the pixman in the /usr/lib and restart? Could I ran
this test with Xephyr too?

>  Second, I tried running rendercheck:
>     http://gitweb.freedesktop.org/?p=xorg/app/rendercheck.git;a=summary
>  and it failed on various tests.
>  A simple way to test this is to install the Xephyr X server and run
>  rendercheck against it.

I tried to ran the Xephyr and the rendercheck but I think they doesn't
use the new pixman library, I don't know much about this software.
Could you please send me a small how-to test using this way?

>  Finally, while I didn't do thoroug review of the patch, I did notice a
>  couple of things:
>  +static inline uint32_t
>  +packAlpha (__m128i x)
>  +{
>  +   return _mm_movemask_epi8 (_mm_cmpeq_epi8 (_mm_and_si128 (x, Maskff000000), Maskff000000));
>  +}
>  The name of this function suggests that it generates a representation
>  of the alpha values, and later on it is used in that way:

This code returns 0xffff if all Alpha are 0xff.

>  +        pa = packAlpha (xmmSrcHi);
>  +
>  +        if (pa == 0xffff)
>  +        {
>  +            save128Aligned ((__m128i*)pd, xmmSrcHi);
>  +        }
>  +        else if (pa)
>  +        {
>  but in fact, pa == 0 only means that all the alpha values were
>  *different from 0xFF*, not that they were 0. This may explain some of
>  the rendering problems. (Unfortunately, it may also explain some of
>  the speedups).

Oh... hum... oh my god, a bug!

In fact pa never will be 0 because I'm using the 8-bits comparison
(instead of 32-bits one). If Alpha was different from 0xFF the other
bytes will be always equal (0x## 00 00 00 cmp 0xff 00 00 00 == 0x## ff
ff ff) and pa will be 0x7777, in worst case. Really this can explain
the rendering problems, but the speedups are ok :)

>  Another thing I noticed:
>  I am wondering why you are using _MM_HINT_T0 here? I would have
>  expected _MM_HINT_NTA since the data is not going to be used again.

The original proof-of-concept code had _MM_HINT_NTA hint for SRC
pointers (read only) and _MM_HINT_T0 hint for DST pointers (read and
write), but when I ran some tests I didn't saw any performance
differences (in P4, I didn't ran this test again with Core2). So I
change to a single instruction to simplify the code.

I will test this again with Core2, but I guess the difference will be none too.

More information about the cairo mailing list