[cairo] [PATCH] SSE2 patch for pixman
André Tupinambá
andrelrt at gmail.com
Mon Apr 7 17:18:53 PDT 2008
On Sun, Apr 6, 2008 at 10:26 PM, Soeren Sandmann <sandmann at daimi.au.dk> wrote:
> I tried out the patch and noticed a couple of problems:
>
> First, after restarting the X server, all the icons were a bit messed
> up. This indicates that there are problems with some of the
> NEED_PIXBUF fast paths.
Hum... I will recheck the fbCompositeSrc_8888RevNPx8888sse2 and
fbCompositeSrc_8888RevNPx0565sse2 functions. How do you make this
check? Do Install the pixman in the /usr/lib and restart? Could I ran
this test with Xephyr too?
> Second, I tried running rendercheck:
>
> http://gitweb.freedesktop.org/?p=xorg/app/rendercheck.git;a=summary
>
> and it failed on various tests.
>
> A simple way to test this is to install the Xephyr X server and run
> rendercheck against it.
I tried to ran the Xephyr and the rendercheck but I think they doesn't
use the new pixman library, I don't know much about this software.
Could you please send me a small how-to test using this way?
> Finally, while I didn't do thoroug review of the patch, I did notice a
> couple of things:
>
> +static inline uint32_t
> +packAlpha (__m128i x)
> +{
> + return _mm_movemask_epi8 (_mm_cmpeq_epi8 (_mm_and_si128 (x, Maskff000000), Maskff000000));
> +}
>
> The name of this function suggests that it generates a representation
> of the alpha values, and later on it is used in that way:
This code returns 0xffff if all Alpha are 0xff.
>
> + pa = packAlpha (xmmSrcHi);
> +
> + if (pa == 0xffff)
> + {
> + save128Aligned ((__m128i*)pd, xmmSrcHi);
> + }
> + else if (pa)
> + {
>
> but in fact, pa == 0 only means that all the alpha values were
> *different from 0xFF*, not that they were 0. This may explain some of
> the rendering problems. (Unfortunately, it may also explain some of
> the speedups).
Oh... hum... oh my god, a bug!
In fact pa never will be 0 because I'm using the 8-bits comparison
(instead of 32-bits one). If Alpha was different from 0xFF the other
bytes will be always equal (0x## 00 00 00 cmp 0xff 00 00 00 == 0x## ff
ff ff) and pa will be 0x7777, in worst case. Really this can explain
the rendering problems, but the speedups are ok :)
> Another thing I noticed:
(...)
> I am wondering why you are using _MM_HINT_T0 here? I would have
> expected _MM_HINT_NTA since the data is not going to be used again.
The original proof-of-concept code had _MM_HINT_NTA hint for SRC
pointers (read only) and _MM_HINT_T0 hint for DST pointers (read and
write), but when I ran some tests I didn't saw any performance
differences (in P4, I didn't ran this test again with Core2). So I
change to a single instruction to simplify the code.
I will test this again with Core2, but I guess the difference will be none too.
More information about the cairo
mailing list