# [cairo] The right approach to projective transformations

Bill Spitzak spitzak at gmail.com
Fri Aug 20 10:34:09 PDT 2010

```
On 08/20/2010 03:03 AM, Maarten Bosmans wrote:
> The patch I sent earlier mainly resulted in discussion about how many
> dimensions cairo should use and how big the transformation matrix
> should be. To sidestep that I first would like to define the scope of
> the proposed feature.
>
> I want to enable projective transformations in Cairo using a linear
> transformation of the 2D homogeneous coordinates. This is the same
> method that is already implemented in Pixman, using a 3x3 matrix.
> So I specifically do not want to add a z-axis. The feature is meant to
> enable the mapping of a rectangle to an arbitrary (convex)
> quadrilateral.

I looked at this some more and I do think 3x3 will work. I was confused
because I have always seen perspective as putting a non-zero value in
the z->w location in a 4x4 matrix. This location is ignored when
translating 2D
coordinates to the 2D screen, but if the matrix is multiplied by others
then it can make the x->w and y->w locations non-zero to get perspective
on the screen.

However if you set these locations directly (which you are doing) then
you can get all possible results without bothering with these columns.

In addition, the existing Cairo 2D transformations and this
transformation will concatenate correctly. Multiplying these 3x3
matrices will produce the same result as multiplying two 4x4 matrices
with the third row and column all zero. This means that even transforms
such as setting up a projection, and setting up another projection
inside that, will produce the desired result: it will look like the
inner projection has been done onto a flat plane and that plane
projected onto the final output.

However I think you will need 9 numbers, not 8. It is pretty easy for a
product to end up with zero in the lower-right, so that normalization
(dividing everything by that number) will not work:

| 1 0 0 |   | 1 0 1 |
| 0 1 0 | x | 0 1 0 |
|-1 0 1 |   | 0 0 1 |

I suspect that the lower-right number will often end up with very tiny
values and that normalizing the matrix will make all the other values
huge, resulting in overflow of even floating point after multiplying
several with a normalization step after each multiply.

> In the patch I proposed this was accomplished by adding two more
> elements to cairo_matrix_t. Krzysztof suggested that this was
> unacceptable because of the ABI break.
> So if we need a new matrix type, wouldn't it make the most sense to
> just use the Pixman floating point matrix? That could then be the
> matrix stored in the gstate of the context.
>
> Any comments on how to implement such a feature?

I also think the numbers should be stored in either row-major or
column-major order. Adding more numbers to the end of the existing
cairo_matrix_t puts them in this order, which makes little sense:

| 1 3 5 |
| 2 4 6 |
| 7 8 9 |

Column-major order would match OpenGL. However if pixman has it the
other way I would do whatever you are doing already.

> Also adding projective transformations means that translational
> invariance is lost. For things like cairo_rel_move_to and the likes
> this can be worked around, but I'm not sure how much trouble this
> gives in other places.

This is why I think any such change must also be accompanied by "line
width locking" and "font matrix locking" (as they were called in the
roadmap). Then if the font and line and dash are selected before the
perspective is set up, they will always draw the same size everywhere,
allowing the font cache to be reused and allowing the existing 2D
stroking algorithm to be reused. (Specifying the font/line/dash after
perspective is set up can throw an "unsupported" error, so it can be
reserved for actual perspective of fonts and lines.)

Font matrix locking means that the api to draw glyphs needs to be
altered: it should take an "anchor point", which is transformed by the
CTM, and a set of xy offsets relative to the anchor point, which are
transformed by the font matrix to position each glyph.

Line width locking means that the "pen" in the current space needs 4
numbers to describe it, so a new api to get/set the pen is needed. I
also think it means the dashes must be specified in "pen space" (i.e.
they will get longer as the line gets thicker, and the ends slant with
the pen space). To emulate the current api I would also track a
"thickness" value: it is set by the old line-width call and reset to 1
by the new pen api, and dash patterns are divided by this value.

```
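The 9-versus-8 numbers point in the message can be checked numerically. The following is only a sketch: `mat3` and its helper functions are hypothetical names made up for illustration (they are not cairo or pixman API), and it assumes points are treated as column vectors (x, y, 1), so the bottom row holds the x->w and y->w terms.

```c
#include <stdbool.h>

/* Hypothetical 3x3 matrix type for illustration only;
 * this is not cairo_matrix_t or a pixman structure. */
typedef struct { double m[3][3]; } mat3;

/* Plain 3x3 product: out = a * b. */
static mat3 mat3_multiply(mat3 a, mat3 b)
{
    mat3 out;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            out.m[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                out.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    }
    return out;
}

/* Try to scale the matrix so the lower-right entry becomes 1
 * (the "8 numbers" representation). Returns false, leaving the
 * matrix untouched, when that entry is exactly zero. */
static bool mat3_normalize(mat3 *m)
{
    double s = m->m[2][2];
    if (s == 0.0)
        return false;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            m->m[i][j] /= s;
    return true;
}

/* Apply the projective transform to (x, y), treated as the column
 * vector (x, y, 1), then divide by w. Returns false when w is zero
 * (the point maps to infinity). */
static bool mat3_transform_point(const mat3 *m, double *x, double *y)
{
    double tx = m->m[0][0] * *x + m->m[0][1] * *y + m->m[0][2];
    double ty = m->m[1][0] * *x + m->m[1][1] * *y + m->m[1][2];
    double tw = m->m[2][0] * *x + m->m[2][1] * *y + m->m[2][2];
    if (tw == 0.0)
        return false;
    *x = tx / tw;
    *y = ty / tw;
    return true;
}

/* The product of the two example matrices from the message. */
static mat3 mat3_example_product(void)
{
    mat3 a = {{{ 1, 0, 0}, { 0, 1, 0}, {-1, 0, 1}}};
    mat3 b = {{{ 1, 0, 1}, { 0, 1, 0}, { 0, 0, 1}}};
    return mat3_multiply(a, b);
}
```

Under these assumptions, `mat3_example_product()` has -1 in the lower-left and 0 in the lower-right, so `mat3_normalize()` refuses it, yet `mat3_transform_point()` still maps (2, 3) to (-1.5, -1.5). The transform is well defined with nine stored numbers but has no representation with the lower-right entry fixed at 1.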