[cairo] [PATCH] pixman: fast path for nearest neighbour scaled compositing operations.
siarhei.siamashka at nokia.com
Sun May 31 09:09:55 PDT 2009
On Sunday 31 May 2009 04:29:40 ext Soeren Sandmann wrote:
> First, thanks for looking at optimizing these functions. It is
> definitely a noticable improvement for Firefox on
> non-hardware-accelerated X servers. And I appreciate you putting this
> in a git repository; it makes looking at the patches a lot easier.
> > Additionally: 1. Scaling test utility patch got more comments about
> > its usage (btw, it would be nice to have all the pixman tests well
> > commented, just in order to know how to interpret their results).
> I've pushed this patch.
> The test programs are not really regression tests. They are just
> programs that happened to be useful at one point or other.
> A real regression test suite like the one cairo has would certainly be
> appreciated. Ideally, any time a bug is found, it would be simple to
> add it to the test suite to make sure it never appears again.
> The scaling test program is certainly very useful, but it's not easy
> to add new tests to it, and it can be difficult to track down exactly
> what is going on when it fails.
Sure it does not fix bugs automatically for you and does not make coffee :-)
But at least the test shows precise information about what kind of operation
has failed so that one can investigate it.
For example the scaling-test easily detected a bug in your patch:
+ if (!repeat (src_repeat, &y, src_height))
+ memset (dst, 0, sizeof (*dst) * width);
This should be:
if (op != PIXMAN_OP_OVER) memset (dst, 0, sizeof (*dst) * width);
> > Overall, I don't quite like code duplication (many functions sharing the
> > same code), but alternatives may look a bit more obfuscated. Alternatives
> > could use either C preprocessor or perl script for emitting scaler
> > functions from a common template. In any case, the most important thing
> > is that the code has an extensive correctless test, so it is easy to
> > transform code to any other representation without too much risk of
> > breaking it.
> The duplication is my main concern too. Even with testing, the code is
> still unwieldy and difficult to follow. With bilinear fast paths, we
> are looking at another doubling of the number of functions. With
> rotations, another. I don't think an exponential number of duplicated
> fast paths is necessarily the best way to go here.
The number of duplicated fast paths is far from exponential. The most used
operations are SRC and OVER. Scaling is also commonly used. Having these
operations properly optimized will not add too much lines of code or
complexity to pixman.
> So, how can we avoid some of the duplication? Here are some ideas:
> - Instead of duplicating all the code, simply generalize and optimize
> the existing fbSrcScaleNearest. I have attached a patch that does
> that. It doesn't deal with 0565 formats, but that could be added.
> It could be made faster by unrolling the loop a couple of times.
> One advantage of this approach is that it deals with all repeat
> types, including NONE horizontally. It does add some branches to the
> inner loop, but those are very predictable. I'd be interested in
> seeing what the performance of it is on renderbench.
Benchmarked it on Cortex-A8: http://pastebin.ca/1442396
Your code is ~25% to ~60% slower in various tests. But maybe on x86 it could
be a bit different.
Any kind of branches in the inner loop are a bad idea if they can be avoided,
no matter if they are predictable or not. Moreover, render_bench also does
alpha blending, having extra clipping checks for normal scaling is even less
desirable. The next step would be to introduce SIMD optimizations to the
scaler functions, which should provide a noticeable speed boost.
I'm still more interested to get some decent optimizations for the bilinear
scaling in a reasonable timeframe. It should have a good balance between
performance and image quality.
> The patch is also available here:
Actually I thought about adding support for OVER compositing (like you did)
and also rotation to fbCompositeSrcScaleNearest as the next step. That would
make it a nice fallback option. Rotation is relatively cheap and there is no
need to cripple it, considering that we already do clipping in this function.
In any case, I think that it would be nice if you could commit your changes
(with a bugfix mentioned above). That's definitely an improvement over the
> - Can we autogenerate the special cases?
> If we end up having to have all these fast paths for
> bilinear/nearest/rotation, maybe its better to just bite the bullet
> now and write a perl script to generate them.
I'm also in favour of this idea, but don't know perl myself :-/
I could do it easily in ruby, but understand that it is definitely not
desirable to have ruby as a build dependency for pixman.
I have an idea to get some kind of Finite-state machine to optimize code
further. Currently my code runs an empty loop first to check whether clipping
is needed or not. That's a bit of setup overhead. The code may be modified to
process the first line so that it does both pixels processing and a clipping
check. If no clipping happens on processing the first line, it can switch to a
different variant of loop which runs without clipping check...
More information about the cairo