[cairo] [Pixman] Image resampling [PATCH 0/6]

Tue Nov 27 01:51:29 PST 2012

On Sat, 24 Nov 2012 04:57:22 +0100
sandmann at cs.au.dk (Søren Sandmann) wrote:

> Hi,
> 
> Reviewing the supersampling patch here:
> 
>    http://cgit.freedesktop.org/~ajohnson/pixman/log/?h=supersampling
> 
> I wasn't happy with either the performance and image quality, and I
> realized that the whole supersampling approach just isn't going to
> fly. Since I told people to do it that way, I apologize for that. The
> approach advocated by Bill Spitzak in the various downsampling threads
> of computing a convolution kernel up front, is the much better way to
> go. To make up for being misleading, the following patches implement
> comprehensive support for high-quality image scaling filters.
> 
> Pixman already has a convolution filter, but since it only allows one
> sample per pixel of the filter, it is limited in the quality that it can
> support, so the following patches (to be applied on top of the three
> rounding patches) add a new filter type
>     
>     PIXMAN_FILTER_SEPARABLE_CONVOLUTION
> 
> that supports multiple different convolution matrices that are chosen
> between based on the subpixel source location. The matrices are
> specified as tensor products of x/y vectors, which makes them
> separable by definition.

I like this approach. This is a clean design, which look simple enough
to optimize for better performance with SIMD.

> -=- Further work and examples
> 
> There is some additional work that could be done:
> 
> - Performance improvements. Low-hanging fruit includes adding new fast
>   path iterators that assume the source is a8r8g8b8 or r5g6b5. Higher
>   hanging fruit is SIMD optimziations and implementations that take
>   advantage of separability. It may also be interesting to speed up
>   pixman_filter_create_separable_convolution() by tabularizing some of
>   the trigonometric functions etc.

From what I see, the separable convolution filter shares a lot of
similarities with the existing pixman SIMD code for bilinear scaling,
which could be extended with relatively little effort.

Bilinear scaling uses weighted average of 2 pixels (in one direction),
with weights calculated on the go. Separable convolution uses weighted
average of N pixels, with weights obtained by table lookups. Both use
subpixel positions (7 phase bits or 128 phases for current bilinear
implementation) to lookup or calculate weights. Bilinear filter is
naturally a subset of separable convolution.

The biggest challenge for optimized bilinear scaling (compared to
nearest) had been properly implementing different types of repeat on
image edges due to sampling of some pixels outside of the image
boundaries. But "many" pixels from the separable convolution filter is
not much different from just "two" from the (bi)linear filter in this
respect, so this should already work fine also for separable
convolution with just minor tweaks.

Regarding SIMD optimized iterators (for both bilinear and separable
convolution), they are rather simple to implement as well. Just have a
look at my old OpenMP proof of concept patch:
    http://lists.freedesktop.org/archives/pixman/2012-June/002071.html
The OMP_BILINEAR_PARALLEL_FOR define lists all the variables which
fully represent the state of the iterator, needed to walk over the
source image and scale it (only the loop counter 'i' is different for
each scanline). This state can be calculated when creating an
iterator, and then we can simply pull one scaled scanline at a time.
The fact that the current bilinear code is not separable is a minor
implementation detail, we could as well have 2x1 sized filter instead
of 2x2 and then do vertical scaling separately with the help of a
temporary buffer in L1 cache like you tried before in the
"separable-bilinear" branch:
    http://lists.freedesktop.org/archives/pixman/2012-June/002140.html
And I tried a similar "two-passes in L1 cache" approach for YUV->RGB
scaling code prototype much earlier, with reasonably good results:
    https://bugzilla.mozilla.org/show_bug.cgi?id=634557#c53
So this is expected to provide good performance.

Affine transformations (still to be implemented with SIMD) are a bit
different and iterators are not a very good choice for them due to less
than perfect memory access pattern.

-- 
Best regards,
Siarhei Siamashka