[cairo] [PATCH] SSE2 patch for pixman

Soeren Sandmann sandmann at daimi.au.dk
Tue Apr 22 22:26:26 PDT 2008

Hi André,

> I'm glad to say that's all rendercheck's (in 24bpp, 16bpp and 8bpp
> XServers) ran ok and I really believe that's this is the final patch.
> I ran some perf's too and the results was very good in my Core2 Q6600 machine.
> Thanks for everyone thats test the and check the results.

This is great work! I have committed the patch, but there are still a
couple of things that may be worth looking into:

** Commented-out uses of fbCompositeCopyAreasse2()

The comment says that these are not actually faster than the generic
code. Looking at the generated assembly,

        leal    64(%ecx), %edi
        subl    $64, %esi
        movdqu  (%ecx), %xmm0
        movdqu  16(%ecx), %xmm1
        movdqu  32(%ecx), %xmm2
        movdqu  48(%ecx), %xmm3
        prefetcht0      (%eax)
        movdqa  %xmm0, -128(%eax)
        movdqa  %xmm1, -112(%eax)
        movdqa  %xmm2, -96(%eax)
        movdqa  %xmm3, -80(%eax)
        addl    $64, %eax
        cmpl    $63, %esi
        prefetcht0      128(%ecx)
        movl    %edi, %ecx
        jg      .L35

I find that hard to believe. Which test did you use to determine this?
Note that the copy_subimage() tests copy really tiny images, so they
are not a good indicator of blit performance.

** Prefetch with _MM_HINT_T0 instead of _MM_HINT_NTA

You said previously that it did not actually make any difference which
one we used. Even if that is true for benchmarks, I think NTA is
probably a better choice in real applications where L2 cache pollution
is a much bigger deal.

** Commented out fbCompositeOver_x888x8x8888sse2()

The comment says that the MMX version is buggy and that the SSE2
version inherits that bug. But the MMX version is not buggy, it is
just not as fast as the generic code, so there is no point using it.

This may or may not apply to the SSE2 version, but if it doesn't, we
should turn it on.

** Not making use of pixmanFillsse2() and pixmanBltsse2()

These two functions could be used in pixman-utils.c in pixman_fill()
and pixman_blt().

None of the above affects the correctness of the patch though, so
since cairo-perf reports good speedups - more than 7 times in some
cases - I committed the patch and pushed it out.


More information about the cairo mailing list