[cairo] [Pixman] Planar YUV support

Bill Spitzak spitzak at gmail.com
Fri Mar 4 13:47:58 PST 2011


Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

>>> The pipeline as it is now:
>>>
>>>    1 convert image sample to a8r8g8b8
>>>    2 extend sample grid in all directions according to repeat
>>>    3 interpolate between samples according to filter
>>>    4 transform
>>>    5 resample
>>>    6 combine
>>>    7 store
> 
>> What is the difference between "3 interpolate between samples according
>> to filter" and "5 resample"?
> 
> The output of stage 3 is an image that is defined on all of the real
> plane. There are no pixels any more, so there is no question about what
> stage 4, "transform", means. Stage 5 converts back to pixels by point
> sampling.

This is a very poor description of what should be happening. You cannot 
do Stage 5 as a point sample. That is exactly what the current bilinear 
interpolation is doing, and everybody should have realized by now that 
the output image is no good for scales of less than 0.5.

It is MUCH better to combine steps 3, 4, and 5. The goal is to produce 
a pixel in the output coordinate system. This is done by constructing a 
filter that varies with the transform and the output pixel; applying 
that filter to the source image yields the output pixel. It is 
absolutely impossible to do a "sample" step last that does not take the 
transform into account.

For affine transforms an output pixel maps to a parallelogram on the 
input image. This parallelogram can be much bigger or much smaller than 
a single pixel, and it has six degrees of freedom. It is obvious that 
two numbers cannot describe the parallelogram, and therefore you cannot 
use point sampling (no matter how fancy your bilinear interpolation is) 
to produce the output image.
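
As a rough illustration (a sketch, not pixman code: the image type, the 
clamp-to-edge fetch and the crude box filter are all assumptions made 
up for this example), a combined transform-plus-filter step maps each 
output pixel's footprint back through the affine matrix and averages 
the source samples it covers:

#include <math.h>
#include <stdint.h>

typedef struct {
    const uint8_t *pixels;          /* one 8-bit channel, for simplicity */
    int            width, height, stride;
} image_t;

/* Clamp-to-edge fetch; real code would honour the repeat mode. */
static uint8_t
fetch (const image_t *img, int x, int y)
{
    if (x < 0)            x = 0;
    if (y < 0)            y = 0;
    if (x >= img->width)  x = img->width - 1;
    if (y >= img->height) y = img->height - 1;
    return img->pixels[y * img->stride + x];
}

/* Affine matrix mapping output coordinates to source coordinates:
 *   xs = a*xo + b*yo + tx,   ys = c*xo + d*yo + ty */
typedef struct { double a, b, c, d, tx, ty; } affine_t;

static uint8_t
resample_pixel (const image_t *src, const affine_t *m, int xo, int yo)
{
    /* Centre of the output pixel, mapped into source space. */
    double cx = m->a * (xo + 0.5) + m->b * (yo + 0.5) + m->tx;
    double cy = m->c * (xo + 0.5) + m->d * (yo + 0.5) + m->ty;

    /* Half-extents of the bounding box of the parallelogram that the
     * output pixel's unit square maps to; never smaller than half a
     * source pixel. */
    double rx = fmax (0.5, 0.5 * (fabs (m->a) + fabs (m->b)));
    double ry = fmax (0.5, 0.5 * (fabs (m->c) + fabs (m->d)));

    double sum = 0.0;
    int    count = 0;

    for (int y = (int) floor (cy - ry); y <= (int) ceil (cy + ry); y++)
        for (int x = (int) floor (cx - rx); x <= (int) ceil (cx + rx); x++)
        {
            sum += fetch (src, x, y);
            count++;
        }

    return (uint8_t) (sum / count + 0.5);
}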

>>> To add support for potentially subsampled YUV, some additional stages
>>> have to be inserted before the first:
>>>
>>>   -2 interpolate subsampled components of YUV to get the same
>>>      resolution as the Y plane
>>>
>>>   -1 if the format is planar, stitch together components to form YUV
>>>      pixels
>>>
>>>    0 convert to sRGB
>>>
>>> Stage -2 is important because the filter used in that interpolation
>>> should probably be user-specifiable eventually, which has the
>>> implication that whatever simple support is added first, it needs to be
>>> clear what filter precisely is being used.

No, I do not think the filter for UV should be "user specified". You 
are adding meaningless complexity to the API and actually *preventing* 
the interpolation from being improved.

It is quite possible to merge step -2 with the transform. The 
parallelogram I described above would simply be half as big for the UV 
planes. It may also be shifted half a pixel between U and V (because 
most producers subsample the UV by averaging different pairs of 
pixels).
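
For example, with "centered" 4:2:0 siting (an assumption; the actual 
siting depends on how the producer subsampled), a coordinate on the Y 
sample grid maps to the chroma plane with a scale of one half and a 
quarter-sample offset, and that mapping can simply be folded into the 
per-plane transform:

/* Sketch, not pixman code: map a coordinate on the Y sample grid to
 * the corresponding coordinate in a half-resolution chroma plane.
 * The quarter-sample offset assumes "centered" siting, i.e. each
 * chroma sample is the average of two neighbouring luma positions;
 * co-sited chroma would drop the offset. */
static inline double
luma_to_chroma_coord (double luma_coord)
{
    return luma_coord * 0.5 - 0.25;
}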

It does mean you cannot do "extend" of "black" as an earlier step. 
However, I very strongly believe that the current cairo behavior is not 
what anybody wants and is inefficient on modern hardware. See below 
about this.

>>> Stage 0 is a color space conversion and needs to eventually be
>>> configurable too, which means it has to be specified which matrix is
>>> being used.

I believe some assumptions can be made about the color samples so that 
stage 0 can be moved to a later point.

All color spaces of interest have orthogonal channels which can be 
filtered independently. Thus the filtering can be done before conversion.

If a channel is non-linear, it technically will affect the filtering. 
See below for comments on why I think correcting for this may not be 
necessary. Even if it is necessary, all interesting non-linear channels 
are so close to a gamma of 2 that a single alternate filter, one that 
squares the input image, applies the same filter, and then takes the 
square root, will produce an answer that is accurate to 5 bits for the 
worst case of a white pixel next to black, and well over 12 bits for 
most photographic images.
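
A sketch of that alternate filter, shown here as the simplest two-tap 
case (the filter itself is just linear interpolation, chosen only to 
keep the example small); the point is the square / filter / square-root 
trick described above:

#include <math.h>
#include <stdint.h>

/* Square the samples (approximate linearization for a channel stored
 * with a gamma close to 2), apply the ordinary filter, then take the
 * square root to return to the stored encoding. */
static uint8_t
interpolate_gamma2 (uint8_t s0, uint8_t s1, double t)
{
    double l0 = (double) s0 * s0;
    double l1 = (double) s1 * s1;
    double l  = l0 + (l1 - l0) * t;      /* ordinary linear filter */

    return (uint8_t) (sqrt (l) + 0.5);   /* back to the stored encoding */
}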

> Note that if some day we add compositing in linear RGB, the alternative
> process breaks down because the initial interpolation will be taking
> place in non-linear color space, whereas with intermediates in linear
> RGB, you'd want to do the second interpolation (but not the first) in
> linear light.

I do not think there is a requirement that the transform filtering be 
done in linear space. It could be useful, but it will not completely 
break doing the rest of the composite in linear RGB.

The reason is that for low-contrast images the gamma curve between two 
adjacent pixels is extremely close to a straight line and thus the 
result is almost identical.

There are problems with doing transforms in linear space:

For very large scales of high-contrast images, users are unhappy with 
true linear filtering and prefer the gamma filtering. The reason is 
that once the pixels become visible, appearance becomes a perceptual 
rather than a physical matter and the image just looks "wrong". This 
will mostly affect "magnifier" applications that enlarge 
already-rendered text.

Linear filtering can also have very nasty side effects if the images 
are premultiplied. The premultiplied pixels have, in effect, been 
stored at much lower precision, and linearization can produce very 
bright colors that cause artifacts when blended with neighboring 
pixels.

If you do linear filtering, you may want to do it only for scales of 
less than 1. Also, there is no need to do it on channels where high 
contrast is already poorly supported, so there is no need to do it to 
the UV channels.

> There is also a question of what to do with YUV images with a
> non-premultiplied alpha channel. Interpolating the samples of such an
>> image directly is definitely wrong, but it may be that simply
> premultiplying first will work.

Filtering non-premultiplied data is a problem with all data formats, 
not just YUV!

The problem is that where the alpha is zero the color is often black. A 
filter that covers this area will bleed black into the object, making 
the resulting image look as though the object turns slightly darker at 
the edges. The only way to get "correct" results is to ignore the 
contribution of alpha-zero pixels to the filter for the color channels. 
Depending on the source you may have to ignore tiny alphas as well 
(some programs produce such pixels with black color due to internal 
filtering).
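
A sketch of such a filter for non-premultiplied RGBA (the types, the 
weight array, and the alpha cutoff are assumptions for illustration): 
the alpha channel uses every sample, while the color channels skip 
transparent samples and are renormalized by the weight that actually 
contributed:

#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Filter non-premultiplied samples while ignoring the colour of
 * (nearly) transparent ones.  The weights are assumed to be
 * non-negative and to sum to 1. */
static rgba_t
filter_unpremultiplied (const rgba_t *samples, const double *weights, int n)
{
    const double alpha_cutoff = 1.0;   /* also skip "tiny" alphas */
    double r = 0, g = 0, b = 0, a = 0;
    double color_weight = 0;
    rgba_t out = { 0, 0, 0, 0 };

    for (int i = 0; i < n; i++)
    {
        /* The alpha channel uses every sample. */
        a += weights[i] * samples[i].a;

        /* The colour channels ignore transparent samples, so their
         * (usually black) colour cannot bleed into the edges. */
        if (samples[i].a > alpha_cutoff)
        {
            r += weights[i] * samples[i].r;
            g += weights[i] * samples[i].g;
            b += weights[i] * samples[i].b;
            color_weight += weights[i];
        }
    }

    out.a = (uint8_t) (a + 0.5);
    if (color_weight > 0)
    {
        /* Renormalize by the weight that actually contributed colour. */
        out.r = (uint8_t) (r / color_weight + 0.5);
        out.g = (uint8_t) (g / color_weight + 0.5);
        out.b = (uint8_t) (b / color_weight + 0.5);
    }
    return out;
}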

You do have to watch out for "premultiplied" YUV where the UV channels 
go towards what is really the maximum negative value, rather than the 
neutral value, as the alpha goes to zero. You can easily correct these 
by adding (255-alpha)/2 to the UV channels.
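
A minimal sketch of that correction (the function name is made up; it 
assumes the chroma really was multiplied towards zero, in which case 
the sum cannot overflow, but it clamps anyway):

#include <stdint.h>

/* Re-centre a "premultiplied" U or V sample that was multiplied
 * towards zero instead of towards the neutral value (128), by adding
 * (255 - alpha) / 2 as described above. */
static inline uint8_t
recentre_premultiplied_chroma (uint8_t uv, uint8_t alpha)
{
    int v = uv + (255 - alpha) / 2;

    return (uint8_t) (v > 255 ? 255 : v);
}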

> The two-interpolation pipeline has the practical benefit that chroma
> reconstruction can be done in the fetchers, at least as long as the
> chroma filter is fixed, where as the one-step process means the general
> code for bilinear filtering would have to sample each component
> individually, then filter, and then do a color conversion. It would no
> longer be able to simply ask the underlying system to fetch an RGB
> pixel.

Here are the steps as I see them, with the parts that I believe CANNOT 
be separated merged into a single step (a code sketch of this ordering 
follows the list):

   1. Widen to 8 bit components
   2. Extend sample grid but use "repeat" for "black outside"
   3. Transform/filter to 1 sample per output pixel
   4. Convert to interleaved
   5. Convert to sRGB
   6. Do "black outside" by multiplying by an antialiased quad
   7. Composite into output buffer
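
As a sketch, that ordering expressed as code structure (every function 
and type named here is hypothetical, standing in only for the numbered 
step it is labeled with):

typedef struct image image_t;

extern image_t *widen_to_8bit        (image_t *src);                /* 1 */
extern void     extend_with_repeat   (image_t *img);                /* 2 */
extern image_t *transform_and_filter (image_t *src,
                                      const double matrix[6]);      /* 3 */
extern image_t *interleave_planes    (image_t *planar);             /* 4 */
extern void     convert_yuv_to_srgb  (image_t *img);                /* 5 */
extern void     multiply_by_aa_quad  (image_t *img);                /* 6 */
extern void     composite            (image_t *src, image_t *dest); /* 7 */

void
render (image_t *src, image_t *dest, const double matrix[6])
{
    src = widen_to_8bit (src);
    extend_with_repeat (src);
    src = transform_and_filter (src, matrix);
    src = interleave_planes (src);
    convert_yuv_to_srgb (src);
    multiply_by_aa_quad (src);
    composite (src, dest);
}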


