[cairo] [PATCH] Image scaling regression test script and 'fbCompositeSrcScaleNearest' bugfixes

Wed Apr 1 02:20:45 PDT 2009

On Wednesday 01 April 2009 04:53:23 ext Soeren Sandmann wrote:
> Hi,
>
> > I'm currently working on porting performance optimizations from Xomap
> > (X server used in Nokia internet tablets N800 and N810) into the latest
> > version of pixman and now also have a permission to submit these patches
> > directly upstream :)
>
> Great! Upstream much appreciates this.
>
> > Xomap has optimizations which include fastpath functions for nearest
> > neighbour scaling with the support of SRC and OVER operations for
> > NONE and NORMAL repeat in the case when no rotation is involved and
> > only a single clipping rectangle is used. Also it has a set of ARMv6
> > assembly optimizations (they partially overlap with the ARM optimizations
> > that pixman also got recently, but still have some useful stuff).
>
> Jeff also has some ARM optimizations here:
>
>     http://cgit.freedesktop.org/~jrmuizel/pixman/log/?h=neon

Yes, I have seen and tried this patch. Are there any immediate plans for
merging it into the main branch? This way we can move on and improve it
further. For example, adding prefetch support increases NEON x8r8g8b8->r5g6b5
conversion performance enormously.

Efficient ARM NEON optimizations for all the commonly used pixman functions is
also something that I'm interested to have as the end result.

> Is ARMv6 the version that got neon? ARM names always confuse me; is
> there a cheat sheet somewhere that explains the difference between
> names like ARM8, ARMv8 and Cortex A8?

NEON is supported by newer ARM chips. ARMv6 is not considered bleeding edge
anymore, but as these optimizations already exist, little effort is needed to
get them in and they are useful for older devices.

> > An attempt at developing such script to provide a solid testing base is
> > attached. It's quite simple, but already helped to hunt down two bugs
> > in 'fbCompositeSrcScaleNearest' function (patch is also attached).
> >
> > +#!/usr/bin/env ruby
>
> I didn't really look at the code here, but I applied the patch
> anyway. If there are test programs that people find useful, I'm
> generally happy enough to add them to pixman/test. (I would be even
> happier to add a real regression test suite of course ...)

Actually I expected some feedback for this part :)

This code can be easily used for the automated regression testing. It's
behaviour is fully deterministic, so crc32 value at the end should be always
the same. If it is calculated once for the generic path, correctness of other
implementations can be always verified.

But it is of course not absolutely bulletproof, for example it can't see the
differences of having or not having 'pixman_fixed_e' correction applied. Also
in order to verify that the fastpath function is not selected for some
inappropriate cases, the test should be extended to also stress different
clipping settings, etc.

> > From bacba33cc4fe0e4dcd0dc5804b0f6e83367cfb4f Mon Sep 17 00:00:00 2001
> > From: Siarhei Siamashka <siarhei.siamashka at nokia.com>
> > Date: Tue, 31 Mar 2009 21:02:44 +0300
> > Subject: [PATCH] fbCompositeSrcScaleNearest bugfixes
> >
> > 'fbCompositeSrcScaleNearest' function had a problem with alpha channel
> > in the destination image. Also NORMAL repeat was broken (the optimized
> > function can handle repeat operation itself and can be screwed up if
> > 'pixman_walk_composite_region' tries to help it by splitting the work
> > into handling multiple separate areas).
>
> The srcRepeat = FALSE change looks correct and should give a good
> speedup, but does it actually fix a *bug* or just a performance
> problem?
>
> If fbCompositeSrcScaleNearest() actually renders incorrectly when it
> is given 1x1 rectangles, then that's a bug in its own right that we
> can't work around by setting srcRepeat to FALSE.

Yes, it's a rendering bug. Splitting work into handling different areas does
not work right for the transform case (and it is never used for generic path).
The point is that this splitting only has full pixel precision at the moment,
while correct blitting needs to preserve some fractional part in calculations
when moving from one "tile" to another.

This splitting actually might need to be improved later so that it could be
used for the transform case too. It may be needed for the performance
reasons (in order to have fast scaling functions which can handle clipping
efficiently).

> The alpha change I don't understand. Note this line:
>
>          && pSrc->bits.format == pDst->bits.format
>
> down in the if statement; fbCompositeSrcScaleNearest() only gets
> called when source and destination have the same format, so copying
> pixels directly between them should be correct, and there should be no
> need for masking in 0xff in the alpha channel.

This is a bit of gray area. This change is needed to provide exactly the same
rendering results as the generic path. Generic path always sets the (unneeded)
alpha bits to 0xFF if the source picture had them set to something different.
But if the values of these bits are actually supposed to be "undefined", then
the test program needs to be modified not to take them into account when
checking correctness of rendering.

Another somewhat related case is 16bpp rendering. Generic path works by
converting the data to 32bpp, doing the compositing operation and finally
converting data back to 16bpp. Fastpath functions need to do the same if
bitexact results when compared to generic path are desired. On the other hand,
doing compositing operation directly in 16bpp is both faster and provides more
precision. The drawback of doing this optimization is that testing for
correctness of the code is more difficult because it can't be directly
compared to the generic path.

-- 
Best regards,
Siarhei Siamashka