[cairo] [PATCH] SSE2 patch for pixman
Soeren Sandmann
sandmann at daimi.au.dk
Tue Apr 22 22:26:26 PDT 2008
Hi André,
> I'm glad to say that's all rendercheck's (in 24bpp, 16bpp and 8bpp
> XServers) ran ok and I really believe that's this is the final patch.
>
> I ran some perf's too and the results was very good in my Core2 Q6600 machine.
>
> Thanks for everyone thats test the and check the results.
This is great work! I have committed the patch, but there are still a
couple of things that may be worth looking into:
** Commented-out uses of fbCompositeCopyAreasse2()
The comment says that these are not actually faster than the generic
code. Looking at the generated assembly,
.L35:
leal 64(%ecx), %edi
.L23:
subl $64, %esi
movdqu (%ecx), %xmm0
movdqu 16(%ecx), %xmm1
movdqu 32(%ecx), %xmm2
movdqu 48(%ecx), %xmm3
prefetcht0 (%eax)
movdqa %xmm0, -128(%eax)
movdqa %xmm1, -112(%eax)
movdqa %xmm2, -96(%eax)
movdqa %xmm3, -80(%eax)
addl $64, %eax
cmpl $63, %esi
prefetcht0 128(%ecx)
movl %edi, %ecx
jg .L35
I find that hard to believe. Which test did you use to determine this?
Note that the copy_subimage() tests copy really tiny images, so they
are not a good indicator of blit performance.
** Prefetch with _MM_HINT_T0 instead of _MM_HINT_NTA
You said previously that it did not actually make any difference which
one we used. Even if that is true for benchmarks, I think NTA is
probably a better choice in real applications where L2 cache pollution
is a much bigger deal.
** Commented out fbCompositeOver_x888x8x8888sse2()
The comment says that the MMX version is buggy and that the SSE2
version inherits that bug. But the MMX version is not buggy, it is
just not as fast as the generic code, so there is no point using it.
This may or may not apply to the SSE2 version, but if it doesn't, we
should turn it on.
** Not making use of pixmanFillsse2() and pixmanBltsse2()
These two functions could be used in pixman-utils.c in pixman_fill()
and pixman_blt().
None of the above affects the correctness of the patch though, so
since cairo-perf reports good speedups - more than 7 times in some
cases - I committed the patch and pushed it out.
Thanks,
Soren
More information about the cairo
mailing list