[cairo] Where's the bottleneck in using glitz/cairo/librsvg?

Mon Jun 12 09:34:15 PDT 2006

Jamey Sharp a écrit :
> My computational geometry background is ... non-existent, but assuming
> that your stroke and fill extent computations are correct, then avoiding
> the tesselator for those seems like a fine plan. I wonder how much
> difference it will make when the tesselator is implemented using the new
> algorithm though?
>
>   
Assuming the computations are correct, probably the difference between 
O(n) and O(n.logn)
and 'n' is generally large when decomposing beziers, more so when 
stroking paths.
> Specific code comments follow:
>
>   
>> diff -urNb cairo-1.1.6-org/pixman/src/icimage.c cairo-1.1.6-new/pixman/src/icimage.c
>> --- cairo-1.1.6-org/pixman/src/icimage.c	2006-06-09 08:25:48.000000000 +0200
>> +++ cairo-1.1.6-new/pixman/src/icimage.c	2006-06-09 08:25:33.000000000 +0200
>> @@ -179,6 +238,100 @@
>>  
>>      gradient->stopRange = tableSize;
>>  
>> +#ifdef oldDAVID
>> +    /* The position where the gradient begins and ends */
>> +    begin_pos = (stops[0].x * gradient->colorTableSize) >> 16;
>> +    end_pos = (stops[nstops - 1].x * gradient->colorTableSize) >> 16;
>> ...
>>     
>
> Did you mean to include this? The symbol oldDAVID is not being defined.
>
>   
It's something that should have been removed from the patch. I made a 
stupid mistake of optimizing some
code after reading the profile incorrectly; as such, I optimized a 
function that was never, or nearly never,
called. There is no point in keeping it then, hence the "oldDAVID".
>   
>> diff -urNb cairo-1.1.6-org/src/cairo-polygon.c cairo-1.1.6-new/src/cairo-polygon.c
>> --- cairo-1.1.6-org/src/cairo-polygon.c	2006-06-09 08:25:47.000000000 +0200
>> +++ cairo-1.1.6-new/src/cairo-polygon.c	2006-06-09 08:25:32.000000000 +0200
>> @@ -123,9 +135,7 @@
>>      polygon->num_edges++;
>>  
>>    DONE:
>> -    _cairo_polygon_move_to (polygon, p2);
>> -
>> -    return CAIRO_STATUS_SUCCESS;
>> +    return _cairo_polygon_move_to (polygon, p2);
>>  }
>>  
>>  cairo_status_t 
>>     
>
> This looks like a bug-fix, perhaps? Some explanation of this change
> might be nice.
>
>   
it looked like a bug-fix when I wrote it, but reading the code for 
_cairo_polygon_move_to
after that shows that the function cannot fail anyway, so the original 
code is OK too. Sorry
for that.

on the other hand, the call itself can be replaced by a simple 
"polygon->current_point = *p2",
saving a function call. ok, that's nit-picking :-)
>> diff -urNb cairo-1.1.6-org/src/cairo-spline.c cairo-1.1.6-new/src/cairo-spline.c
>> --- cairo-1.1.6-org/src/cairo-spline.c	2006-06-09 08:25:47.000000000 +0200
>> +++ cairo-1.1.6-new/src/cairo-spline.c	2006-06-09 08:25:32.000000000 +0200
>> @@ -225,6 +227,107 @@
>>      return _PointDistanceSquaredToPoint (p, &px);
>>  }
>>  
>> +#ifdef DAVID
>> ...
>> +cairo_status_t
>> +_cairo_spline_decompose_direct (cairo_spline_t              *spline,
>> +                                double                       tolerance,
>> +                                cairo_spline_point_func_t   *topoint,
>> +                                void                        *closure,
>> +                                cairo_bool_t                 keep_first)
>> +{
>> +    cairo_point_t   stack[64];   /* should be largely enough */
>>     
>
> As far as I can tell, you end the iteration early if your explicit stack
> overflows. To me that seems inconsistent with what I perceive as Cairo's
> philosophy of always giving correct results.
A 64-point stack is capable of managing (64-1)/3=21 bezier 
sub-divisions, which means that if
we consider a spline as large as 65536 pixels (the maximum path 
dimensions supported by Cairo),
we could have sub-arcs as small as 1/32 of a pixel (which can trivially 
be approximated by a line
segment without any perceivable loss of quality, isn't it ?)

Also, Bezier sub-divisions tend to approximate line segments pretty 
quickly, which means the back-off
code will stop the recursion well before you reach the bottom of your stack.

>> diff -urNb cairo-1.1.6-org/src/cairo-traps.c cairo-1.1.6-new/src/cairo-traps.c
>> --- cairo-1.1.6-org/src/cairo-traps.c	2006-06-09 08:25:47.000000000 +0200
>> +++ cairo-1.1.6-new/src/cairo-traps.c	2006-06-09 08:25:32.000000000 +0200
>> @@ -60,8 +60,36 @@
>>  static int
>>  _compare_cairo_edge_by_slope (const void *av, const void *bv);
>>  
>> -static cairo_fixed_16_16_t
>> -_compute_x (cairo_line_t *line, cairo_fixed_t y);
>> +#ifdef DAVID
>> +#  if defined(__GNUC__) && defined(i386)
>> +static __inline__ cairo_fixed_16_16_t
>> +_compute_x (cairo_line_t *line, cairo_fixed_t y)
>> +{
>> +    cairo_fixed_16_16_t dx = line->p2.x - line->p1.x;
>> +    cairo_fixed_16_16_t ey = y - line->p1.y;
>> +    cairo_fixed_16_16_t dy = line->p2.y - line->p1.y;
>> +    cairo_fixed_16_16_t ex;
>> +
>> +    __asm__ __volatile__ (
>> +        "imul  %%edx\n"
>> +        "idiv  %%ecx\n"
>> +        : "=a" (ex)
>> +        : "a"(dx), "d"(ey), "c"(dy)
>> +    );
>> +    return line->p1.x + ex;
>> +}
>> +#  else /* other platforms */
>> +static __inline__ cairo_fixed_16_16_t
>> +_compute_x (cairo_line_t *line, cairo_fixed_t y)
>> +{
>> +    cairo_fixed_16_16_t dx = line->p2.x - line->p1.x;
>> +    cairo_fixed_32_32_t ex = (cairo_fixed_48_16_t) (y - line->p1.y) * (cairo_fixed_48_16_t) dx;
>> +    cairo_fixed_16_16_t dy = line->p2.y - line->p1.y;
>> +
>> +    return line->p1.x + (ex / dy);
>> +}
>> +#  endif
>> +#endif
>>  
>>  static int
>>  _line_segs_intersect_ceil (cairo_line_t *left, cairo_line_t *right, cairo_fixed_t *y_ret);
>>     
>
> As long as the definition of _compute_x precedes the call, GCC will
> normally inline it since it's static and called only once.
Nope, it's called multiple times by the code, and hasn't been inlined by 
my version of GCC.
Moreover, it always calls a hidden function called __divi3, used to 
divide two 64-bits numbers
(which is slower than a 64-bits-by-32 division of the inlined assembly).

By the way, I've discovered that my inlined assembly is buggy, since it 
may result in a division
error in case of overflow (it seems it's possible to trigger this 
condition with the cairo-dock
benchmark), I've written an alternative version but will provide it 
through git.

> As for the inline assembly, did you check what code GCC was actually
> generating first? I'd be surprised if it was losing so badly that your
> inline assembly made a real difference.
>
>   
Yes, I did check, and it was losing badly. Mainly because __divi3 is a 
slow function for what's
needed here. Also, the overhead of a function call is not to be 
underestimated when your function
is in the top CPU eaters shown by the profiler. Ditto for 
_cairo_fixed_to_double()

Regards,

- David Turner
- The FreeType Project  (www.freetype.org)

***********************************************************************************
Information contained in this email message is confidential and may be privileged, and is intended only for use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the postmaster at nds.com and destroy the original message.
***********************************************************************************