[cairo] [PATCH] A set of NEON blitters for Pixman.

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Jun 4 04:00:46 PDT 2009


On Thursday 04 June 2009, Jonathan Morton wrote:
> On Thu, 2009-06-04 at 00:19 +0300, Siarhei Siamashka wrote:
> > On Wednesday 03 June 2009, Koen Kooi wrote:
> > > On 03-06-09 16:48, Jeff Muizelaar wrote:
> > > > On Tue, Jun 02, 2009 at 03:27:14PM +0300, Jonathan Morton wrote:
> > > >>> I've attached a revised version (directly editing the patch!) which
> > > >>> just includes the code, without actually enabling anything.
> > > >>
> > > >> Ignore the previous patch, it's broken because I edited it wrong.
> > > >> Attached is a complete patch series which successively enables the
> > > >> blitters as requested.
> > > >>
> > > >> More blitters to follow later.
> > > >
> > > > Pushed.
> > >
> > > I'm getting weird artifacts with fonts on omap3 when using git head:
> > >
> > > http://scap.linuxtogo.org/files/cfe19862afe80f1693324aa8ca645bdf.png
> >
> > Found and fixed at least one bug (just because it happened to be detected
> > by scaling-test program). A separate test which would cover all the neon
> > fastpath functions is needed to increase chances that there are no other
> > bugs left.
> >
> > Patch is attached.
>
> Ah, I see the real bug.  It's a one-liner - r5 is being sourced
> explicitly instead of the %[width] reference.  It's probably a
> copy-paste bug, not spotted because I didn't originally write it.

This small piece of code actually happened to have two bugs. In addition to
using r5 register incorrectly, it just does not advance the destination
pointer (r4 register).

This is the buggy instruction with pointer pre-increment by 0:
       "str            %[color], [r4]!\n"

And this is the correct post-increment by 4:
       "str            %[color], [r4], #4\n"

> BTW, the reason test/branch/store is used instead of
> test/conditional-store is that apparently some ARMv7 CPUs no longer
> handle conditional instructions efficiently unless they're branches, and
> trigger an unconditional full branch mispredict penalty.  (Sigh.)

Do you have some benchmark results which justify the use of a conditional
branch just over a single instruction here?

Some kind of test program is attached. My tests show that for the width set to
3, not having conditional branches is a clear performance winner (something
like 21 vs. 24 seconds). For the width set to 1, the performance is the same
(around 21 seconds). I'm also not quite convinced that mixing NEON and ARM
stores here is a good idea.

But reviewing all the optimizations and finetuning them for the best possible
performance can be really done later. Right now it's more important to have
correct code.

> So a one-line fix is probably better, and I've attached a patch that
> does exactly that.

The one-line fix is not enough (it can be easily verified by running
scaling-test).

Sorry for all the confusion caused by bundling 2 bugfixes and a
simplification/optimization in a single patch.

> I think this bug would cause the vertical bars on the right that Koen is
> seeing, because the glyph path in Xorg includes filling a temporary
> pixmap.  The reason I don't see it could be that I've locally changed
> Xorg's behaviour in this respect, accidentally making it more tolerant
> of a bad filler.

Yes, this might be the only bug, and it would be nice if Koen could verify
whether the problem that he was observing is now gone.

But realistically, neon optimizations have quite a number of alternative
code paths and branching logic. I would not be surprised if it has some other
bugs ;-) Seems like your tests are not good enough, so they could not detect
the problem in fill function. I'm repeating myself, but I still think that 
tests with good coverage for neon optimizations are needed, and they need to
end up in pixman git repository. So that any modifications to this code could
be tested at any time easily.

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bench-small-fill.c
Type: text/x-csrc
Size: 407 bytes
Desc: not available
Url : http://lists.cairographics.org/archives/cairo/attachments/20090604/dc826fc4/attachment.c 


More information about the cairo mailing list