[cairo] ahhh ! I think I get it ! - the fbCompositeCopyAreammx case.
frederic.plourde at polymtl.ca
Fri Mar 28 07:45:44 PDT 2008
Soeren Sandmann wrote:
> Vladimir Vukicevic <vladimir at pobox.com> writes:
>> On Mar 27, 2008, at 11:51 AM, Frédéric Plourde wrote:
>>> Remember that issue you brought up about subimage_copy with 512X512
>>> which made things slower ?
>>> I've got some more news about it...
>>> Since Soeren questioned the precision of giving perf results with
>>> only a small number of iterations, I've used a minimum of 100
>>> iterations on my perf tests.
>>> Take a look at the newest results about "subimage_copy"...
>>> First group of results shows NO GAIN between pre-mmx and post-mmx
>>> except for the 512X512 case alone, which shows 58% speedup... that's
>>> The second group of results shows perf GAINS between : before
>>> applying my patch and after applying my "fbCompositeCopyArea" patch
>>> But these gains, as you noticed earlier, seem to fade out as the
>>> image scales up.
>>> I'll investigate some more about that, but for now, I think the
>>> patch "pixman_OPT_MMX_addFastPath_to_fbCompositeCopyAreammx.patch"
>>> is worth it.
>> Sounds good to me -- Soeren, adding an x8r8g8b8 -> a8r8g8b8 SOURCE
>> path being a Copy sounds sane, right?
> Sure, it's correct enough (provided you mean a8r8g8b8->x8r8g8b8), but
> it's just not going to make much of a performance difference.
> For the subimage_copy tests, even 100 iterations is way too low, so I
> tested with 100000 and cairo-perf-diff reports this:
> image-rgb subimage_copy-256 0.01 2.00% -> 0.01 1.95%: 1.06x speedup
> image-rgb subimage_copy-128 0.01 2.02% -> 0.01 1.98%: 1.05x speedup
> image-rgb subimage_copy-16 0.01 1.91% -> 0.01 1.85%: 1.05x speedup
> There is just nothing there.
> This is not surprising either, because the operation is memory
> bandwidth limited, so it does not matter whether you use memcpy() or
> I'm not actively opposed to the patch, but if you do add it, please
> also use fbCopyAreammx in the ABGR->XBGR case as well.
I got it! I know why we disagree on the fbCompositeCopyAreammx thing ;-)
First, Soeren, I understand that, given your results, which show no
perf gains at all, you were drawn to explanations such as a
memory-bandwidth limitation. But again, did you see my last perf report
with 100 iterations?
It shows good gains for the 32X32 to 512X512 cases, and the associated standard deviations are just fine. Still, I understand why you were tempted to think that 100 iterations weren't enough, so I boosted the number of iterations with -i 10000 and I still got the same perf gains! See this:
And I finally understood why you get consistent, no-gain results even as the image size goes up (it's a very silly reason, in fact): the "subimage_copy" perf test only copies a tiny fixed area (the 4 X 4 rectangle at offset (2, 2) set by its cairo_rectangle call), no matter how large the image is! A questionable choice if you ask me. But I shouldn't judge it so harshly; whoever set it up that way surely had good reasons. I'll bring this issue up with cairographics very soon, of course.
To get a sense of how our little fbCompositeCopyAreammx behaves as the image scales up, just change line #40 of subimage_copy.c from
cairo_rectangle (cr, 2, 2, 4, 4);
to
cairo_rectangle (cr, 0, 0, width, height);
... or apply the attached patch ;-)
and you might get results that look like mine.
I would be pleased if you tested it on your side as well to double-check.
My current task, as Vlad noticed, is to explain why the perf gains shrink as the image scales up above 2048 X 2048. I have almost completed this study and will be able to post some conclusions today ;-)
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 418 bytes
Desc: not available
Url : http://lists.cairographics.org/archives/cairo/attachments/20080328/cf779be9/attachment-0001.bin