[cairo] Concerns about using filters for downscaling

Søren Sandmann sandmann at cs.au.dk
Thu Mar 27 10:48:56 PDT 2014


Owen Taylor <otaylor at redhat.com> writes:

> Thoughts
> ========
> The good thing about this general approach is that it's
> straightforward to understand what is going on and what results it
> produces and isn't much code. And it's there in the repository, not
> just theory about what we *could* do. :-)
>
> Downsides with the general approach:
>
>  * The approach is not efficient for large downscales, especially
>    when transformed. Approaches involving:
>
>     - Sampling from a mipmap
>     - Sampling a sparse set of irregularly distributed points
>
>    will be more efficient.

I'm not sure what "Sampling a sparse set of irregularly distributed
points" means.

>  * On a hardware backend, running even a large convolution filter in
>    a shader is likely possible on current hardware, but it's not
>    making efficient use of how hardware is designed to sample images.
>
> Here are my suggestions with an eye to getting a release of 1.14
> out quickly
>
>  * The convolution code is restricted to the BEST filter, leaving
>    GOOD and BILINEAR untouched for now. The goal in the future
>    is we'd try to get similar quality for all backends, whether by:
>
>     A) Triggering a fallback
>     B) Implementing convolution with the same filter
>     C) Using an entirely different technique with similar
>        quality.
>
>  * For BEST a filter is used that is more like what is used for
>    GOOD in the current code - i.e. 10x slowdown from BILINEAR,
>    not 100x.
>  
>  * An attempt is made to address the bugs listed above.

Under the assumption that all we can do is change what the existing
cairo API does, my suggestions would be:

* BILINEAR and NEAREST should do what they are doing now

For GOOD and BEST, first compute scale factors xscale and yscale
based on the bounding box of a transformed destination
pixel. That is, consider the destination pixel as a square and
apply the transformation to that square. Then take the bounding
box of the resulting parallelogram and compute the scale factors
that would turn the destination pixel into that bounding box.
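To make the bounding-box step concrete, here is a sketch in Python rather than cairo's C. The `scale_factors` helper and its matrix convention (column-major linear part of the destination-to-source transform) are my own illustration, not cairo API:

```python
def scale_factors(xx, yx, xy, yy):
    """Per-axis scale factors from the bounding box of a transformed
    unit destination pixel. (xx, yx, xy, yy) is the linear part of the
    destination-to-source transform; translation is irrelevant here."""
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Transform each corner of the unit square.
    pts = [(xx * x + xy * y, yx * x + yy * y) for (x, y) in corners]
    xs = [px for (px, _) in pts]
    ys = [py for (_, py) in pts]
    # Width and height of the bounding box of the parallelogram.
    return max(xs) - min(xs), max(ys) - min(ys)
```

For a pure scale the factors are just the diagonal entries; for a rotation the bounding box is wider than the pixel, so the factors exceed 1 even though no scaling occurs.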

Then,

* GOOD should do:
  - If xscale and yscale are both > 1, then
    - Use PIXMAN_FILTER_BILINEAR
  - Otherwise, for each dimension:
    - If downscaling, use Box/Box/4
    - If upscaling, use Linear/Impulse/6

* BEST should do:
  - For each dimension:
    - If upscaling, use Cubic/Impulse/MAX(3, log of scale factors)
    - If downscaling, use Impulse/Lanczos3/4

Where the filters are given as Reconstruction/Sampling/Subsample-bits.
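The selection rules above can be sketched as follows. This is illustrative pseudocode in Python, not cairo/pixman source: I am assuming a convention where a scale factor above 1 means upscaling along that axis (so the bilinear fast path covers pure upscales), and assuming the log is base 2; the strings are the Reconstruction/Sampling/Subsample-bits triples from the text, not real pixman identifiers:

```python
import math

def good_filter(xscale, yscale):
    # Fast path: plain bilinear when both axes are upscaled.
    if xscale > 1 and yscale > 1:
        return "PIXMAN_FILTER_BILINEAR"
    # Otherwise choose per dimension.
    per_axis = lambda s: "Linear/Impulse/6" if s > 1 else "Box/Box/4"
    return (per_axis(xscale), per_axis(yscale))

def best_filter(xscale, yscale):
    def per_axis(s):
        if s > 1:  # upscaling: subsample bits grow with the scale
            return "Cubic/Impulse/%d" % max(3, int(math.ceil(math.log2(s))))
        return "Impulse/Lanczos3/4"
    return (per_axis(xscale), per_axis(yscale))
```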

The rationale here is that this would make GOOD at least equivalent to
GdkPixbuf in quality, and there is no reason it couldn't be equal or
better in performance. Since GdkPixbuf was the default image scaler for
GTK+ for many years without a lot of complaints about either performance
or quality, it makes sense to make this the default for cairo as well.

>  * In the future, the benchmark for GOOD is downscaling by
>    factors of two to the next biggest power of two, and sampling
>    from that with bilinear filtering. Pixel based backends
>    should do at least that well in *both* performance and
>    quality.
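For reference, the quoted power-of-two scheme can be sketched in 1-D (my own illustration, pure Python, even-length rows assumed for the halving pass; in 2-D the final pass would be bilinear rather than linear):

```python
def halve(row):
    """One 2x reduction pass: average adjacent pairs of samples."""
    return [(row[2 * i] + row[2 * i + 1]) / 2.0 for i in range(len(row) // 2)]

def downscale_pow2(row, dst_len):
    """Halve until less than a factor of two above the target size,
    then finish with linear interpolation (the 1-D analogue of
    bilinear sampling)."""
    while len(row) >= 2 * dst_len:
        row = halve(row)
    scale = len(row) / float(dst_len)
    out = []
    for j in range(dst_len):
        pos = (j + 0.5) * scale - 0.5      # center-aligned sample position
        i0 = int(pos)
        i1 = min(i0 + 1, len(row) - 1)
        t = pos - i0
        out.append(row[i0] * (1 - t) + row[i1] * t)
    return out
```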

For a one-off downscaling, I don't see how this is all that much
more efficient than simply GdkPixbuf-style box filtering (i.e.,
Box/Box/4).

In the power-of-two method, a source pixel will be touched
between one and two times for the downsampling, and then four
pixels will be touched per destination pixel. The GdkPixbuf style
Box filtering will touch each source pixel once, and then certain
pixels at positions that are a multiple of the filter size will
be touched once more. So regarding memory traffic, I'd expect the
Box/Box/4 filter to be competitive. It will likely also look
better.
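As a rough sanity check on that claim, here is my own back-of-envelope read-count model for a pure n x n downscale by an integer power-of-two factor s (illustrative only, not a measurement; the boundary-pixel estimate in reads_box is an assumption):

```python
import math

def reads_pow2(n, s):
    """Power-of-two method: each 2x halving pass reads the whole
    previous level (n^2 + n^2/4 + ..., i.e. between one and two
    touches per source pixel), then the final bilinear pass reads
    4 pixels per destination pixel."""
    k = int(math.log2(s))                       # number of 2x halvings
    reads = sum(n * n // 4 ** i for i in range(k))
    return reads + 4 * (n // s) ** 2

def reads_box(n, s):
    """Box/Box filtering: every source pixel read once; pixels at
    destination-pixel boundaries (roughly 2*n^2/s of them) are
    touched about once more."""
    return n * n + 2 * n * n // s
```

For example, with n = 1024 and s = 8 this model gives about 1.44M reads for the power-of-two method versus about 1.31M for box filtering, consistent with the two being competitive in memory traffic.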

The advantages of the power-of-two method are that it may do less and
simpler arithmetic, and that Pixman's bilinear filter has been optimized
a lot compared to the separable convolution code. I also agree that when
the transformation is not a pure scale, the power-of-two method is
likely much faster. (Though in this case there are big wins possible for
both methods by using a tiled access pattern to improve cache locality).

Regarding mipmaps, I think this is a somewhat orthogonal issue. They
likely are useful for applications that do interactive or animated
transformations, and so it would make sense to support them. But for
one-off scalings on CPUs, I just don't see how they would be faster or
higher quality than the convolutions.


Søren

