[cairo] The right approach to projective transformations

Fri Aug 20 10:34:09 PDT 2010

On 08/20/2010 03:03 AM, Maarten Bosmans wrote:
> The patch I sent earlier mainly resulted in discussion about how many
> dimensions cairo should use and how big the transformation matrix
> should be. To sidestep that I first would like to define the scope of
> the proposed feature.
>
> I want to enable projective transformations in Cairo using a linear
> transformation of the 2D homogeneous coordinates. This is the same
> method that is already implemented in Pixman, using a 3x3 matrix.
> So I specifically do not want to add a z-axis. The feature is meant to
> enable the mapping of a rectangle to an arbitrary (convex)
> quadrilateral instead of only the parallelograms that are currently
> possible.

I looked at this some more and I do think 3x3 will work. I was confused 
because I have always seen perspective done by putting a non-zero value 
in the z->w location of a 4x4 matrix. That location is ignored when 
mapping 2D coordinates to the 2D screen, but when the matrix is 
multiplied by others it can make the x->w and y->w locations non-zero, 
which is what produces perspective on the screen.

However, if you set the x->w and y->w locations directly (which you are 
doing), you can get all possible results without bothering with the z 
row and column.

In addition, the existing Cairo 2D transformations and this 
transformation will concatenate correctly. Multiplying these 3x3 
matrices produces the same result as multiplying two 4x4 matrices 
whose third row and column are all zero. This means that even nested 
transforms, such as setting up a projection and then setting up another 
projection inside it, will produce the desired result: it will look like 
the inner projection has been done onto a flat plane and that plane then 
projected onto the final output.
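To make the concatenation claim concrete, here is a small C check. It is only a sketch: `m3`, `m4`, `embed_xyw` and `concat_matches` are hypothetical helpers (not cairo or pixman API), and it assumes row-major storage acting on column vectors (x, y, w). It embeds each 3x3 into a 4x4 (x, y, z, w) matrix with the z row and column zero, and checks that embedding the 3x3 product gives the same matrix as multiplying the two embedded 4x4 matrices.

```c
/* Hypothetical helpers, not cairo or pixman API.  Matrices are row-major,
 * m[row][col], and act on column vectors (x, y, w) or (x, y, z, w). */
typedef struct { double m[3][3]; } m3;
typedef struct { double m[4][4]; } m4;

static m3 m3_mul(const m3 *a, const m3 *b)
{
    m3 r;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            r.m[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                r.m[i][j] += a->m[i][k] * b->m[k][j];
        }
    return r;
}

static m4 m4_mul(const m4 *a, const m4 *b)
{
    m4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            r.m[i][j] = 0.0;
            for (int k = 0; k < 4; k++)
                r.m[i][j] += a->m[i][k] * b->m[k][j];
        }
    return r;
}

/* Copy the (x, y, w) matrix into the x, y and w rows/columns of a 4x4,
 * leaving the z row and column (index 2) all zero. */
static m4 embed_xyw(const m3 *a)
{
    static const int idx[3] = {0, 1, 3};
    m4 r = {{{0}}};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            r.m[idx[i]][idx[j]] = a->m[i][j];
    return r;
}

/* Returns 1 if embed(a * b) equals embed(a) * embed(b) entry for entry. */
static int concat_matches(const m3 *a, const m3 *b)
{
    m3 p3 = m3_mul(a, b);
    m4 lhs = embed_xyw(&p3);
    m4 ea = embed_xyw(a), eb = embed_xyw(b);
    m4 rhs = m4_mul(&ea, &eb);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (lhs.m[i][j] != rhs.m[i][j])
                return 0;
    return 1;
}
```

Every extra term in the 4x4 product involves the zero z row or column, so it contributes nothing and the comparison can be exact rather than approximate.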

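However, I think you will need 9 numbers, not 8. It is easy to end up 
with a lower-right number of zero, and then normalization (dividing 
everything by that number so it becomes 1) does not work. Both matrices 
below have a lower-right 1, but their product has a 0 there: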

	| 1 0 0 |   | 1 0 1 |
	| 0 1 0 | x | 0 1 0 |
	|-1 0 1 |   | 0 0 1 |
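The example above can be checked with an ordinary 3x3 product. This is a minimal C sketch assuming a hypothetical `mat3` type, stored row-major and acting on column vectors (x, y, w); it is not the pixman or cairo representation.

```c
/* Hypothetical 3x3 projective matrix: all 9 numbers are stored, so the
 * lower-right element may legitimately become zero.  Row-major,
 * m[row][col], acting on column vectors (x, y, w). */
typedef struct {
    double m[3][3];
} mat3;

/* Return the ordinary matrix product a * b. */
static mat3 mat3_multiply(const mat3 *a, const mat3 *b)
{
    mat3 r;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double sum = 0.0;
            for (int k = 0; k < 3; k++)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    }
    return r;
}
```

For the two matrices shown, the lower-right entry of the product is (-1)(1) + (0)(0) + (1)(1) = 0, so it cannot be rescaled to 1, and an 8-number format that assumes the ninth number is 1 has no way to store it.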

I suspect that the lower-right number will often end up with very tiny 
values, and that normalizing the matrix will then make all the other 
values huge, resulting in overflow, even in floating point, after 
multiplying several matrices with a normalization step after each 
multiply.
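A small sketch of why normalization is fragile, using a hypothetical `pmat` type and `pmat_normalize` helper (not an existing cairo or pixman function). Dividing by a lower-right value of 1e-12 scales every other entry by 1e12, and an exact zero cannot be normalized at all.

```c
/* Hypothetical full 3x3 matrix, row-major, acting on (x, y, w). */
typedef struct {
    double m[3][3];
} pmat;

/* Divide every element by the lower-right one so that it becomes 1.
 * Returns 0 and leaves the matrix unchanged if that element is exactly
 * zero, because no scale factor can fix that case.  A tiny but non-zero
 * value "succeeds" but blows up every other element. */
static int pmat_normalize(pmat *a)
{
    double w = a->m[2][2];
    if (w == 0.0)
        return 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            a->m[i][j] /= w;
    return 1;
}
```

Keeping all 9 numbers and never normalizing avoids both failure modes; the divide by w then only happens when a point is actually transformed.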

> In the patch I proposed this was accomplished by adding two more
> elements to cairo_matrix_t. Krzysztof suggested that this was
> unacceptable because of the ABI break.
> So if we need a new matrix type, wouldn't it make the most sense to
> just use the Pixman floating point matrix? That could then be the
> matrix stored in the gstate of the context.
>
> Any comments on how to implement such a feature?

I also think the numbers should be in either row-major or column-major 
order. Adding three more numbers to the end of the existing 
cairo_matrix_t puts them in this order, which makes little sense:

	| 1 3 5 |
	| 2 4 6 |
	| 7 8 9 |

Column-major order would match OpenGL. However, if pixman uses the 
other order, I would do whatever you are already doing.
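For reference, cairo_matrix_t stores its six numbers as xx, yx, xy, yy, x0, y0, so the existing fields are already column-major for the top two rows; only the three appended numbers break the pattern. A minimal sketch of the two layouts, using a local `affine6` struct that mirrors those field names (so cairo.h is not needed) and hypothetical index helpers:

```c
/* Local copy of the cairo_matrix_t field order, so this sketch does not
 * need cairo.h.  It maps x' = xx*x + xy*y + x0, y' = yx*x + yy*y + y0. */
typedef struct {
    double xx, yx, xy, yy, x0, y0;
} affine6;

/* Hypothetical: where element (row r, col c) of the 3x3 lives in a
 * 9-double array, for two candidate layouts.  r and c are 0-based. */
static int column_major_index(int r, int c) { return c * 3 + r; }

/* Six cairo numbers followed by three appended ones: the top two rows
 * are column-major (1 3 5 / 2 4 6) but the new bottom row is row-major. */
static int appended_index(int r, int c) { return r < 2 ? c * 2 + r : 6 + c; }

/* Expand an affine matrix to a full 3x3 in column-major order.  The
 * bottom row of an affine matrix is (0, 0, 1). */
static void affine_to_column_major(const affine6 *a, double out[9])
{
    out[column_major_index(0, 0)] = a->xx;
    out[column_major_index(1, 0)] = a->yx;
    out[column_major_index(2, 0)] = 0.0;
    out[column_major_index(0, 1)] = a->xy;
    out[column_major_index(1, 1)] = a->yy;
    out[column_major_index(2, 1)] = 0.0;
    out[column_major_index(0, 2)] = a->x0;
    out[column_major_index(1, 2)] = a->y0;
    out[column_major_index(2, 2)] = 1.0;
}
```

In the column-major layout the existing six numbers keep their relative order and each bottom-row entry is simply interleaved after its column, which is why it fits cairo's current layout better than appending.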

> Also adding projective transformations means that translational
> invariance is lost. For things like cairo_rel_move_to and the like
> this can be worked around, but I'm not sure how much trouble this
> gives in other places.

This is why I think any such change must also be accompanied by "line 
width locking" and "font matrix locking" (as they were called in the 
roadmap). Then, if the font, line width and dash pattern are selected 
before the perspective is set up, they will always draw at the same size 
everywhere, allowing the font cache and the existing 2D stroking 
algorithm to be reused. (Specifying the font/line/dash after the 
perspective is set up can throw an "unsupported" error, so that case can 
be reserved for actual perspective rendering of fonts and lines.)
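The loss of translational invariance can be seen directly: with an affine matrix, the device-space displacement of a user-space step does not depend on where the step starts, but with a projective matrix it does. A sketch using a hypothetical `proj3` type (row-major, column vectors, divide by w); none of these names are cairo API.

```c
/* Hypothetical projective matrix, row-major, acting on (x, y, 1). */
typedef struct {
    double m[3][3];
} proj3;

/* Map a user-space point to device space, dividing by w.
 * Returns 0 if w is zero (the point maps to infinity). */
static int proj3_transform_point(const proj3 *p, double *x, double *y)
{
    double ux = *x, uy = *y;
    double tx = p->m[0][0] * ux + p->m[0][1] * uy + p->m[0][2];
    double ty = p->m[1][0] * ux + p->m[1][1] * uy + p->m[1][2];
    double w  = p->m[2][0] * ux + p->m[2][1] * uy + p->m[2][2];
    if (w == 0.0)
        return 0;
    *x = tx / w;
    *y = ty / w;
    return 1;
}

/* Device-space displacement of the user-space step (dx, dy) taken from
 * (x, y).  For an affine matrix the result is the same for every (x, y);
 * for a projective one it depends on the starting point. */
static int proj3_device_delta(const proj3 *p, double x, double y,
                              double dx, double dy,
                              double *ddx, double *ddy)
{
    double ax = x, ay = y, bx = x + dx, by = y + dy;
    if (!proj3_transform_point(p, &ax, &ay) ||
        !proj3_transform_point(p, &bx, &by))
        return 0;
    *ddx = bx - ax;
    *ddy = by - ay;
    return 1;
}
```

With w = 0.5x + 1, a one-unit step in x covers about 0.67 device units starting at x = 0 but only 0.2 starting at x = 2. So any code that converts a user-space distance to a device-space distance without knowing the starting point stops being valid under perspective, and fixed-size pens and glyphs are what let the existing code keep working.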

Font matrix locking means that the API for drawing glyphs needs to 
change: it should take an "anchor point", which is transformed by the 
CTM, and a set of xy offsets relative to the anchor point, which are 
transformed by the font matrix, to position each glyph.
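A minimal sketch of the proposed glyph positioning, assuming the font matrix has already been locked to device space when the font was selected. `ctm3`, `lin2`, `point2` and `place_glyphs` are hypothetical; nothing here is existing cairo API. Only the anchor goes through the projective CTM; the offsets go through the 2x2 font matrix, so glyph spacing and size do not change with position.

```c
/* Hypothetical types; none of this is existing cairo API. */
typedef struct { double m[3][3]; } ctm3;          /* projective CTM, (x, y, 1) */
typedef struct { double xx, yx, xy, yy; } lin2;   /* locked font matrix, 2x2   */
typedef struct { double x, y; } point2;

/* Position n glyphs: the anchor goes through the projective CTM (with the
 * divide by w), while each glyph's offset from the anchor goes only
 * through the font matrix.  Returns 0 if the anchor maps to infinity. */
static int place_glyphs(const ctm3 *ctm, const lin2 *font, point2 anchor,
                        const point2 *offsets, point2 *out, int n)
{
    double w = ctm->m[2][0] * anchor.x + ctm->m[2][1] * anchor.y + ctm->m[2][2];
    if (w == 0.0)
        return 0;
    double ax = (ctm->m[0][0] * anchor.x + ctm->m[0][1] * anchor.y + ctm->m[0][2]) / w;
    double ay = (ctm->m[1][0] * anchor.x + ctm->m[1][1] * anchor.y + ctm->m[1][2]) / w;
    for (int i = 0; i < n; i++) {
        out[i].x = ax + font->xx * offsets[i].x + font->xy * offsets[i].y;
        out[i].y = ay + font->yx * offsets[i].x + font->yy * offsets[i].y;
    }
    return 1;
}
```

Since every glyph is drawn through the same font matrix wherever its anchor lands, the rendered glyphs are identical at every position, which is what allows the font cache to be reused.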

Line width locking means that the "pen" in the current space needs 4 
numbers to describe it, so a new API to get/set the pen is needed. I 
also think it means the dashes must be specified in "pen space" (i.e. 
they will get longer as the line gets thicker, and the ends slant with 
the pen space). To emulate the current API I would also track a 
"thickness" value: it is set by the old line-width call, reset to 1 by 
the new pen API, and dash patterns are divided by this value.
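A sketch of the emulation described above, with hypothetical names throughout (`pen_state`, `ps_set_line_width`, `ps_set_pen`, `ps_set_dash`). It assumes the old line-width call sets a round pen whose 2x2 matrix is the width times the identity; the proposal does not spell that part out.

```c
/* Hypothetical state for "line width locking"; not existing cairo API. */
#define MAX_DASHES 8

typedef struct {
    double pen[4];             /* 2x2 pen matrix: xx, yx, xy, yy */
    double thickness;          /* set by the old line-width call, 1 after set_pen */
    double dashes[MAX_DASHES]; /* dash lengths in pen space */
    int num_dashes;
} pen_state;

/* Old API: assumed here to set a round pen, width times the identity. */
static void ps_set_line_width(pen_state *s, double width)
{
    s->pen[0] = width; s->pen[1] = 0.0;
    s->pen[2] = 0.0;   s->pen[3] = width;
    s->thickness = width;
}

/* New API: set all 4 pen numbers directly; thickness goes back to 1. */
static void ps_set_pen(pen_state *s, const double pen[4])
{
    for (int i = 0; i < 4; i++)
        s->pen[i] = pen[i];
    s->thickness = 1.0;
}

/* Dash lengths arrive in the old units and are divided by the thickness,
 * so that in pen space they scale with the line as it gets thicker. */
static int ps_set_dash(pen_state *s, const double *dashes, int n)
{
    if (n < 0 || n > MAX_DASHES || s->thickness == 0.0)
        return 0;
    for (int i = 0; i < n; i++)
        s->dashes[i] = dashes[i] / s->thickness;
    s->num_dashes = n;
    return 1;
}
```

With a thickness of 4, a dash of 8 old units is stored as 2 pen-space units; a stroker that scales dashes by the pen would presumably turn that back into 8, so existing callers of the old line-width and dash calls should see the same pattern.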