[cairo] Cairo 1.3 performance loss

Mon Jan 29 03:25:13 PST 2007

Hi,

On Thu, 2007-01-25 at 09:06 -0800, Carl Worth wrote:
> On Wed, 24 Jan 2007 15:48:55 +0200, Jorn Baayen wrote:
> > Comparing cairo 1.2.4 and 1.3.12 on ARM, a performance loss of 2% is
> > observed when drawing GTK+ widgets[1]. On closer inspection, it turns
> > out that the new tessellator may be to blame.
> ...
> > [1] http://folks.o-hand.com/~jorn/cairo-benchmarks/
> 
> I see a lot of different profiles and things there. Can you guide me
> through your analysis in a bit more detail? Where do you see the 2%
> performance change? And where do you make a connection with the
> tessellator?

What we're looking at here are the outputs and profiles of the following
tests:

Constantly redrawing
 o an empty GtkEntry (results in $CAIRO_VERSION/gtkentry/)
 o a non-empty GtkLabel (results in $CAIRO_VERSION/gtklabel/)
for one minute. The number of times the widget was redrawn is written to
timewidgets.txt.

While a test runs, oprofile runs alongside it. The full oprofile profile
is dumped to full-profile.txt (in the same directory as above), and for
convenience the individual profiles of cairo, pango, and gtk are also
dumped to separate files.

The versions of the other libraries in the stack are held constant.
These include:
o Pango 1.15.2
o GTK+ 2.10.6 (and GTK+ 2.6.7 in 'no-cairo')

Now, if we look at the GtkEntry drawing test for cairo 1.2.4, we see
that a GtkEntry was drawn 6181 times. This figure is 6035 for cairo
1.3.12, which is a loss of ~ 2%.

(But if we look at GtkLabel for the same cairo versions, we get the
numbers 733 and 750, representing a performance gain of ~ 2%.)
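Since every run lasts the same minute, the draw count is a direct
throughput measure. A throwaway Python sketch of the arithmetic behind
the "~2%" figures (the helper name is just for illustration):

```python
def relative_change(old_draws, new_draws):
    """Relative change in draws per fixed-length run.

    Negative means fewer draws in the same time, i.e. a slowdown.
    """
    return (new_draws - old_draws) / old_draws

# Draw counts from timewidgets.txt: cairo 1.2.4 vs cairo 1.3.12
entry = relative_change(6181, 6035)  # GtkEntry
label = relative_change(733, 750)    # GtkLabel

print(f"GtkEntry: {entry:+.1%}")  # GtkEntry: -2.4%
print(f"GtkLabel: {label:+.1%}")  # GtkLabel: +2.3%
```

So the two widgets move by roughly the same amount, in opposite
directions.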

I make the connection with the tessellator as follows. If we look at the
full profiles (full-report.txt) for GtkEntry drawing, we see that for
cairo 1.2.4, _cairo_traps_tessellate_polygon() takes up 0.17% of the time
spent in all functions across the whole system stack. For cairo 1.3.12,
_cairo_bentley_ottmann_tessellate_polygon() takes up 0.54%. More on this
later.

> 
> > Whereas _cairo_traps_tessellate_polygon() takes up 0.17% of the overall
> > system profile (1.9% in the cairo profile) in 1.2.4,
> > _cairo_bentley_ottmann_tessellate_polygon() takes up 0.54% of the
> > overall system profile (7.1% in the cairo profile) in 1.3.12.
> 
> Is it even meaningful at all to compare relative percentages (which is
> all you get from something like oprofile) between separate runs?
> At the very least you would have to normalize these numbers against
> something.

As the difference is (only) 2%, normalizing these figures shouldn't make
a significant difference. 

> I don't know the right numbers to use, but let me attempt with what
> you gave me. For convenience let's invent a time unit (bogoticks) such
> that the original run took 100 bogoticks. Then, if there's a
> performance loss of 2% the second run took 102 bogoticks.
> 
> Now, let's look at the contribution of the tessellation calls to those
> times:
> 
> Old:	0.0017 * 100 = 0.17 bogoticks
> New:	0.0054 * 102 = 0.55 bogoticks
>
> So, (if this analysis is correct), then, yeah, you can say that the
> tessellator is a little more than 3 times slower. But that still 
> accounts for less than one third of the overall slowdown.

I don't think this analysis is correct. The percentages we are looking
at represent the share of all the time spent in all functions across the
whole system stack that was spent in our functions. Each test runs for
the same fixed time; only the number of draws achieved differs.

Oprofile also gives us the number of samples it recorded in each
function (the 'samples' column in the profiles). If we divide the number
of samples by the number of draws, we get:

Old:     393 / 6181 = 0.064 samples/draw
New:    1253 / 6035 = 0.21  samples/draw

Like your analysis (and the raw percentages), this suggests that the
tessellator has become just over 3 times slower.
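For anyone who wants to double-check the per-draw normalisation, here it
is as a small Python sketch. The sample and draw counts are the GtkEntry
numbers quoted above; nothing else is assumed:

```python
# Tessellator samples recorded by oprofile, and GtkEntry draws achieved,
# in two runs of identical length.
old_samples, old_draws = 393, 6181    # _cairo_traps_tessellate_polygon, 1.2.4
new_samples, new_draws = 1253, 6035   # _cairo_bentley_ottmann_tessellate_polygon, 1.3.12

# Because both runs last the same time, samples per draw is proportional
# to the tessellation cost of one draw.
old_per_draw = old_samples / old_draws
new_per_draw = new_samples / new_draws

print(f"old:   {old_per_draw:.3f} samples/draw")          # old:   0.064 samples/draw
print(f"new:   {new_per_draw:.3f} samples/draw")          # new:   0.208 samples/draw
print(f"ratio: {new_per_draw / old_per_draw:.2f}x")       # ratio: 3.27x
```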

Perhaps more interesting is to look purely at the cairo profiles,
because the percentages there are relative to the time spent in cairo
only. Let's say that one draw using 1.2.4 takes 100 bogoticks inside
cairo, and one draw using the 2% slower 1.3.12 takes 102 bogoticks
inside cairo. The relative percentages for the tessellator functions are
2.5% in 1.2.4 and 7.1% in 1.3.12. Let's look at the bogoticks spent
tessellating:

Old:   0.025 * 100 = 2.5 bogoticks
New:   0.071 * 102 = 7.2 bogoticks

That is about 4.7 extra bogoticks per draw spent in the tessellator,
which would suggest that the tessellator alone is responsible for more
than the entire 2% (2 bogotick) slowdown!
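The same bookkeeping as a Python sketch. It rests on the same
assumption as the paragraph above, namely that the overall 2% slowdown
also applies to the time spent inside cairo; the bogotick unit is
invented, as before:

```python
# Per-draw time inside cairo, in made-up bogoticks. 1.2.4 is the baseline;
# 1.3.12 is ASSUMED to be 2% slower inside cairo as well.
cairo_ticks = {"1.2.4": 100.0, "1.3.12": 102.0}
# Tessellator share of the cairo-only profile in each version.
tess_share = {"1.2.4": 0.025, "1.3.12": 0.071}

tess_ticks = {v: tess_share[v] * cairo_ticks[v] for v in cairo_ticks}
extra_tess = tess_ticks["1.3.12"] - tess_ticks["1.2.4"]
overall_slowdown = cairo_ticks["1.3.12"] - cairo_ticks["1.2.4"]

print(f"tessellator: {tess_ticks['1.2.4']:.1f} -> {tess_ticks['1.3.12']:.1f} bogoticks")
print(f"extra tessellation: {extra_tess:.1f}, total slowdown: {overall_slowdown:.1f}")
# A negative remainder means the rest of cairo must have become faster.
print(f"rest of cairo: {overall_slowdown - extra_tess:+.1f} bogoticks")
```

Under that assumption, the non-tessellator parts of cairo come out about
2.7 bogoticks per draw faster, which fits with 1.3 carrying
optimizations that 1.2 lacks.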

> I'd definitely like to reduce the overhead of the new tessellator,
> (particularly for simple and common shapes). What would be useful here
> is to find out exactly what's being drawn when the tessellator is
> getting called. Are these just rectangles?

It's drawing just rectangles. From cairo.txt we can see that the
tessellator is being called from:

14        0.2912  libcairo.so.2.10.3       _cairo_surface_fallback_fill
85        1.7679  libcairo.so.2.10.3       _cairo_clip_clip
4709     97.9409  libcairo.so.2.10.3       _cairo_path_fixed_fill_to_traps
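The percentage column appears to be each caller's share of the samples
listed for these callers. That reading is my assumption about the
oprofile output, but it is consistent with the numbers: the three sample
counts sum to 4808, and the sketch below reproduces the percentages
exactly:

```python
# (samples, caller) pairs copied from the cairo.txt excerpt above.
callers = [
    (14, "_cairo_surface_fallback_fill"),
    (85, "_cairo_clip_clip"),
    (4709, "_cairo_path_fixed_fill_to_traps"),
]

total = sum(samples for samples, _ in callers)
for samples, name in callers:
    # Share of all caller samples attributed to this caller, in percent.
    print(f"{samples:6d} {100 * samples / total:8.4f}%  {name}")
```

Either way, nearly all tessellator calls come from the plain fill path.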

> 
> One fairly easy way to extract the cairo calls being used is to use
> Jeff's excellent libcairowrap. It's available here:
> 
> http://people.freedesktop.org/~jrmuizel/libcairowrap-0.1.tar.gz

Thanks, this is very helpful.

> 
> And you can ask on this list if you have any questions about using it.
> 
> > As suggested earlier[2], I added counters to _line_segs_intersect_ceil()
> > in 1.2.4 and to _cairo_bo_edge_intersect() in 1.3.12. It turns out that,
> > per widget draw, _line_segs_intersect_ceil() is called 4 times, but
> > _cairo_bo_edge_intersect() 7 times. Is this 75% increase according to
> > expectations?
> 
> Now that is some really useful information. Thanks! Again, to start
> answering the question, the first thing to do is to find out what is
> being drawn so we can replicate this, (and perhaps add a cairo/perf
> case that actually does report time measurements rather than just
> percentages).
>
> > I suspect that the decrease in tessellator performance is responsible
> > for more than the observed 2% slowdown, as 1.3 contains many
> > optimizations that are not in 1.2.
> 
> And I can't see any evidence of that. The calculations I made above
> show the tessellator slowdown contributing a minor part of the overall
> slowdown, and definitely not more. 

My calculations above support a different conclusion.

> As for other optimizations in 1.3,
> yes there are a lot of improvements, but do you know that these little
> focused tests you're doing actually benefit from any?

At the very least the tests should benefit from the double-to-fixed
optimizations.

Thanks,

Jorn

-- 
OpenedHand Ltd.
http://o-hand.com/