[cairo] Where's the bottleneck in using glitz/cairo/librsvg?

Jamey Sharp jamey at minilop.net
Fri Jun 9 11:58:34 PDT 2006


On Fri, Jun 09, 2006 at 11:16:11AM +0200, David Turner wrote:
> Hello Mirco, and all Cairo hackers,

I don't speak for the current Cairo hackers or anybody else, but I
thought I'd take a look through your patch and comment on it anyway. :-)
Short version: I think some of your proposed fixes are great ideas.

I love callgrind, especially with --dump-instr and kcachegrind, but I'd
suggest retesting each of these changes with oprofile as well. In
particular, your 3.5% improvement in _cairo_fixed_to_double and your
results from using an explicit stack in spline decomposition might be
entirely explained by callgrind overhead.

A series of small patches addressing each individual performance issue
would be preferable to this one large patch. When you have GIT set up,
you can commit each change to your local repository separately and then
use git-format-patch to easily generate Cairo's preferred patch style.

Stylistically, I don't think the #ifdefs are helpful for understanding
your changes. Changing the code directly and producing a patch from that
would probably be better. Please avoid whitespace changes in the midst
of your much more interesting contributions, as well.

_cairo_spline_decompose_direct and _cairo_polygon_init_static seem like
quite good ideas to me. They seem like small changes that ought to
provide decent improvements. I'd like to see detailed performance data
for them though.

My computational geometry background is ... non-existent, but assuming
that your stroke and fill extent computations are correct, then avoiding
the tesselator for those seems like a fine plan. I wonder how much
difference it will make when the tesselator is implemented using the new
algorithm though?

Specific code comments follow:

> diff -urNb cairo-1.1.6-org/pixman/src/icimage.c cairo-1.1.6-new/pixman/src/icimage.c
> --- cairo-1.1.6-org/pixman/src/icimage.c	2006-06-09 08:25:48.000000000 +0200
> +++ cairo-1.1.6-new/pixman/src/icimage.c	2006-06-09 08:25:33.000000000 +0200
> @@ -179,6 +238,100 @@
>  
>      gradient->stopRange = tableSize;
>  
> +#ifdef oldDAVID
> +    /* The position where the gradient begins and ends */
> +    begin_pos = (stops[0].x * gradient->colorTableSize) >> 16;
> +    end_pos = (stops[nstops - 1].x * gradient->colorTableSize) >> 16;
> ...

Did you mean to include this? The symbol oldDAVID is not being defined.

> @@ -212,10 +365,10 @@
>  
>      /* After last point */
>      while (pos < gradient->colorTableSize) {
> -	gradient->colorTable[pos] =
> -	    xRenderColorToCard32 (stops[nstops - 1].color);
> +        gradient->colorTable[pos] = xRenderColorToCard32(stops[nstops - 1].color);
>          ++pos;
>      }

This looks like a particularly annoying code style bug, but again that
belongs in a separate patch.

> diff -urNb cairo-1.1.6-org/src/cairo-polygon.c cairo-1.1.6-new/src/cairo-polygon.c
> --- cairo-1.1.6-org/src/cairo-polygon.c	2006-06-09 08:25:47.000000000 +0200
> +++ cairo-1.1.6-new/src/cairo-polygon.c	2006-06-09 08:25:32.000000000 +0200
> @@ -123,9 +135,7 @@
>      polygon->num_edges++;
>  
>    DONE:
> -    _cairo_polygon_move_to (polygon, p2);
> -
> -    return CAIRO_STATUS_SUCCESS;
> +    return _cairo_polygon_move_to (polygon, p2);
>  }
>  
>  cairo_status_t 

This looks like a bug-fix, perhaps? Some explanation of this change
might be nice.

> diff -urNb cairo-1.1.6-org/src/cairo-spline.c cairo-1.1.6-new/src/cairo-spline.c
> --- cairo-1.1.6-org/src/cairo-spline.c	2006-06-09 08:25:47.000000000 +0200
> +++ cairo-1.1.6-new/src/cairo-spline.c	2006-06-09 08:25:32.000000000 +0200
> @@ -225,6 +227,107 @@
>      return _PointDistanceSquaredToPoint (p, &px);
>  }
>  
> +#ifdef DAVID
> ...
> +cairo_status_t
> +_cairo_spline_decompose_direct (cairo_spline_t              *spline,
> +                                double                       tolerance,
> +                                cairo_spline_point_func_t   *topoint,
> +                                void                        *closure,
> +                                cairo_bool_t                 keep_first)
> +{
> +    cairo_point_t   stack[64];   /* should be largely enough */

As far as I can tell, you end the iteration early if your explicit stack
overflows. To me that seems inconsistent with what I perceive as Cairo's
philosophy of always giving correct results. Since I suspect that your
gains from eliminating the recursion are largely due to callgrind
overhead, I'm a bit skeptical of this change.

> diff -urNb cairo-1.1.6-org/src/cairo-traps.c cairo-1.1.6-new/src/cairo-traps.c
> --- cairo-1.1.6-org/src/cairo-traps.c	2006-06-09 08:25:47.000000000 +0200
> +++ cairo-1.1.6-new/src/cairo-traps.c	2006-06-09 08:25:32.000000000 +0200
> @@ -60,8 +60,36 @@
>  static int
>  _compare_cairo_edge_by_slope (const void *av, const void *bv);
>  
> -static cairo_fixed_16_16_t
> -_compute_x (cairo_line_t *line, cairo_fixed_t y);
> +#ifdef DAVID
> +#  if defined(__GNUC__) && defined(i386)
> +static __inline__ cairo_fixed_16_16_t
> +_compute_x (cairo_line_t *line, cairo_fixed_t y)
> +{
> +    cairo_fixed_16_16_t dx = line->p2.x - line->p1.x;
> +    cairo_fixed_16_16_t ey = y - line->p1.y;
> +    cairo_fixed_16_16_t dy = line->p2.y - line->p1.y;
> +    cairo_fixed_16_16_t ex;
> +
> +    __asm__ __volatile__ (
> +        "imul  %%edx\n"
> +        "idiv  %%ecx\n"
> +        : "=a" (ex)
> +        : "a"(dx), "d"(ey), "c"(dy)
> +    );
> +    return line->p1.x + ex;
> +}
> +#  else /* other platforms */
> +static __inline__ cairo_fixed_16_16_t
> +_compute_x (cairo_line_t *line, cairo_fixed_t y)
> +{
> +    cairo_fixed_16_16_t dx = line->p2.x - line->p1.x;
> +    cairo_fixed_32_32_t ex = (cairo_fixed_48_16_t) (y - line->p1.y) * (cairo_fixed_48_16_t) dx;
> +    cairo_fixed_16_16_t dy = line->p2.y - line->p1.y;
> +
> +    return line->p1.x + (ex / dy);
> +}
> +#  endif
> +#endif
>  
>  static int
>  _line_segs_intersect_ceil (cairo_line_t *left, cairo_line_t *right, cairo_fixed_t *y_ret);

As long as the definition of _compute_x precedes the call, GCC will
normally inline it since it's static and called only once. I expect that
simply moving the definition above _cairo_traps_tessellate_triangle will
get most of the performance gain you're seeing, without adding
__inline__.

As for the inline assembly, did you check what code GCC was actually
generating first? I'd be surprised if it was losing so badly that your
inline assembly made a real difference.

Interesting patch, anyway, and nice work on profiling. Making sure the
test suite still passes was a definite plus as well. :-)

--Jamey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://lists.freedesktop.org/archives/cairo/attachments/20060609/c677f2a1/attachment.pgp


More information about the cairo mailing list