[cairo] Reduce number of floating point operations
Carl Worth
cworth at cworth.org
Fri Sep 22 23:41:05 PDT 2006
On Thu, 14 Sep 2006 22:30:06 +0000, Aivars Kalvans wrote:
> This is basicly the same patch I sent a while ago to
> performance-list at gnome.org
> It simplifies matrix operations for identity matrices which are very
> common. I also wrote a very simple performance test although
> improvements will be noticeable on platforms without FPU.
Thanks for sending this. And in particular, thanks for sending it
with a performance test case. This is really fabulous!
While we're on the subject of floating-point, I only just now happened
to notice that we've been doing floating-point intersection
computations in the (old) tessellator (*blush*). I knew that we had
written fixed-point intersection code ages ago, but I had forgotten
that we had disabled it "temporarily" when noticing a bug in the
combination of the old tessellator plus the fixed-point intersection
code, and then we never got around to enabling it again.
Anyway, that's one more advantage that the new tessellator has going
for it. And it's one way to reduce floating-point reliance that I
hadn't even remembered was hiding inside of cairo, (it was sneaky
because I knew we converted all path data from floating- to
fixed-point right after the initial transformation, but it was easy to
forget that we had this old code that converted _back_ to
floating-point down deep in the tessellator).
So, that gets me excited to see how the new tessellator performs on
arm. I'll see if I can't get my Nokia 770 in functional shape for
cairo testing early next week and see how it does, (which would also
help me feel more comfortable committing some of the fixes below---I
also like being able to personally justify code before personally
committing it---whether for bug fix or for performance fix).
> Some profiles
> Before: http://www.o-hand.com/~jorn/pango-benchmarks/210-softfloat/cairo.txt
> After:
> http://www.o-hand.com/~jorn/pango-benchmarks/210-softfloat/cairo-floating-point.txt
Could you provide some guidance, (or a pointer to documentation), on
how one should read those?
I got the feeling that Behdad gave some patch review that didn't make
it to the list, but I'd like to offer some as well. The biggest item
is that there are several independent things going on here, so they
should be implemented as separate patches. The things I see from a
quick glance might be broken up as follows:
1. Add cairo/perf/text test case
2. Make cairo_gstate_translate/scale use more direct matrix
manipulations than cairo_matrix_multiply
3. Optimize matrix transformations for _cairo_gstate_show_glyphs
4. Optimize _cairo_matrix_is_identity
I have a few comments of review as well, see below:
> + * Copyright Â© 2006 Red Hat, Inc.
Instead of copying Red Hat copyright statements from existing files,
you should instead put a correct copyright statement for your
contribution, (whether personal or the name of your employer or
whomever has copyright interest in your work).
> + * appear in supporting documentation, and that the name of
> + * Red Hat, Inc. not be used in advertising or publicity pertaining
I should really fix all of these statements in the test suite to
change all occurrences of a specific entity such as "Red Hat, Inc." to
instead read "the copyright holders". That would make it much more
obvious that it is appropriate for others to add copyright statements
above these notices as they edit the files.
> - cairo_matrix_init_translate (&tmp, tx, ty);
> - cairo_matrix_multiply (&gstate->ctm, &tmp, &gstate->ctm);
> -
> - cairo_matrix_init_translate (&tmp, -tx, -ty);
> - cairo_matrix_multiply (&gstate->ctm_inverse, &gstate->ctm_inverse, &tmp);
> + cairo_matrix_translate (&gstate->ctm, tx, ty);
> + _cairo_matrix_translate_inverse (&gstate->ctm_inverse, tx, ty);
I'm not entirely against this change if it means a big performance
improvement. But I think we need to find a cleaner way to do it. The
old code had the benefit of being quite readable in that the same
functions were used for both matrices, and just inverted values passed
to the matrix initialization and inverted order to the matrix
multiplies.
I don't think that the names of "translate" and "translate_inverse"
are enough to capture all of that going on. (I'll touch more on this
when I get to the new _cairo_matrix functions below).
What might actually help here is to introduce a new notion into cairo
(perhaps cairo_transform_t ?) that would hold both a forward and an
inverse matrix. That would be useful both here and in other places
where dual matrix objects are kept in sync together. And if we did
that, would it also make sense to cache some bits of state about the
transformation as well, (is_identity, has_translation, etc.)?
> +static void
> +_cairo_gstate_transform_glyphs (cairo_gstate_t *gstate,
> + cairo_glyph_t *glyphs,
> + cairo_glyph_t *transformed_glyphs,
> + int num_glyphs)
This seems like a reasonable enough function, but it isn't as
self-documenting as I would like. The old usage made it very clear
what was going on:
> - transformed_glyphs[i].x = glyphs[i].x + gstate->font_matrix.x0;
> - transformed_glyphs[i].y = glyphs[i].y + gstate->font_matrix.y0;
> - _cairo_gstate_user_to_backend (gstate,
Namely, adding the offset from the font_matrix, then transforming from
user space to backend space. So it would be good to get some of that
information into the name of this new function, or at least into a
comment next to its definition.
> + if (_cairo_matrix_is_identity (&gstate->ctm) &&
> + _cairo_matrix_is_identity (&gstate->target->device_transform))
> + {
> +
> + /* Translation is constant for all glyphs, calculate it once */
> + tx += gstate->ctm.x0 + gstate->target->device_transform.x0;
> + ty += gstate->ctm.y0 + gstate->target->device_transform.y0;
I'm confused here. These identity matrices have translation components
of (0,0) by definition. So why are we adding them in here?
> + cairo_matrix_t transformation;
> + cairo_matrix_multiply (&transformation,
> + &gstate->ctm,
> + &gstate->target->device_transform);
Again, as far as making the code more self-documenting, a much better
name than transformation here would be user_to_backend. And would we
want to similar merging of ctm and device_transform in more general
ways?
Or you might also want to special-case for the (common) identity
device_transform.
> + /* Pre-calculate translation, so we can use
> + * ..._transform_distance() instead of ..._transform_point()
> + * and save 2 add-s per iteration */
> + cairo_matrix_transform_point (&transformation, &tx, &ty);
> +
> + for (i = 0; i < num_glyphs; ++i)
> + {
> + transformed_glyphs[i] = glyphs[i];
> + cairo_matrix_transform_distance (&transformation,
> + &transformed_glyphs[i].x,
> + &transformed_glyphs[i].y);
> + transformed_glyphs[i].x += tx;
> + transformed_glyphs[i].y += ty;
> + }
> + }
Is this a common-enough pattern in the cairo implementation that we
should write _cairo_matrix_transform_points to allow this trick to be
shared? Even if this is the only use, that might still improve the
readability of the current code.
> void
> cairo_matrix_translate (cairo_matrix_t *matrix, double tx, double ty)
> {
> - cairo_matrix_t tmp;
> -
> - cairo_matrix_init_translate (&tmp, tx, ty);
> + matrix->x0 += tx * matrix->xx + ty * matrix->xy;
> + matrix->y0 += tx * matrix->yx + ty * matrix->yy;
> +}
>
> - cairo_matrix_multiply (matrix, &tmp, matrix);
> +void
> +_cairo_matrix_translate_inverse (cairo_matrix_t *matrix, double tx, double ty)
> +{
> + matrix->x0 -= tx;
> + matrix->y0 -= ty;
> }
OK. I've got a big problem with the above two functions, (and the same
problem with cairo_matrix_scale and _cairo_matrix_scale_inverse). The
problem is that each pair of functions is too-similarly named for
having such different operations, (different not only in inverting the
translation/scale values but also in reversing the effective order of
the two matrices being implicitly multiplied here.
So, it's really hard to look at the two implementations above and not
be convinced that there are bugs in that one uses xx, xy, yx, and yy
while the other one does not.
And as I said before, this also makes the calling code harder to read
as well. I don't yet have a cleaner version to offer, but some ideas
are:
1. Explore my cairo_transform_t idea to see if it helps
2. Try to get create more informative names to capture the various
functional distinctions here.
3. Add comments explaining what's going on. Because the code alone now
looks quite confusing.
And of course, this is one reason the various items here need to be
broken up into independent patches. This little one, (which I don't
think is the most significant of your performance gains), is
definitely the one I have the most problems with.
> cairo_bool_t
> +_cairo_matrix_has_transform (const cairo_matrix_t *matrix)
> +{
> + return (matrix->xx != 1.0 || matrix->yx != 0.0 ||
> + matrix->xy != 0.0 || matrix->yy != 1.0 ||
> + matrix->x0 != 0.0 || matrix->y0 != 0.0);
> +}
Why is this function needed in addition to cairo_matrix_is_identity?
> +
> +cairo_bool_t
> _cairo_matrix_is_identity (const cairo_matrix_t *matrix)
> {
> +#if defined (__arm__)
> + /* memcmp() should be faster than 4 soft float operations */
> + static const cairo_matrix_t identity = {
> + 1.0, 0.0,
> + 0.0, 1.0,
> + 0.0, 0.0
> + };
> + return (memcmp (matrix, &identity, 4 * sizeof(double)) == 0);
> +#else
I think Behdad mentioned before that this optimization shouldn't be
restricted to arm architectures alone. And it would cleaner to use
sizeof(cairo_matrix_t) rather than 4 * sizeof(double).
And if checking this is so important, it might make sense to just
cache an "is_identity" bit, which would also eliminate any un-ease
someone might have over using memcmp on doubles.
Oh, and if you do throw in memcmp for _is_identity, then you might
also give the static identity matrix file scope and use it with
memcpy for cairo_matrix_init_identity.
I hope I haven't come across as too picky in all that. I really do
appreciate your contribution, and I look forward to seeing more soon!
Thanks,
-Carl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/cairo/attachments/20060922/3626bb33/attachment.pgp
More information about the cairo
mailing list