the mysterious cairo-perf (was: Re: [cairo] [PATCH] Performance patch for bo intersections)

Joonas Pihlaja jpihlaja at cc.helsinki.fi
Wed Dec 13 12:22:44 PST 2006


Hi,

On Wed, 13 Dec 2006, Baz wrote:

> > > xlib-rgba    paint_solid_rgba_source-256    0.48 5.60% ->   0.22
> > > 5.30%:  2.20x speedup

> Unfortunately it may be fictional. Testing another patch for
> bo_end_edge, I got up to a 9x speedup, but only on xlib cases,
> which seemed mighty suspicious; I put it down to the general
> bustedness of xlib on OS X.  However, retrying cairo-perf-diff
> with CAIRO_TEST_BACKEND=image to get more trustworthy results,
> I'm consistently seeing a 30% speedup for the paint_solid tests.

You can get more consistent results (at least for the image
backend) by running cairo-perf at a higher priority as root:

sudo nice -n -12 ./cairo-perf [...]

Note that since the tests get more love from the scheduler they
also run consistently faster, so cairo-perf runs made with "sudo
nice" probably aren't comparable with timings taken without it.

> I'm at a loss to explain why - it doesn't seem to hit the
> changed code in the profiles.

Some plausible causes:

- The test is extremely short, so it is quite possible that all
the runs of that test executed under the same background load and
noise, yielding low-variance results that still carry a large
constant term from other processing on the system.  (A steady bit
of background work added to every run inflates all the timings
equally while keeping the variance low.)

- cairo-perf is discarding the runs that it thinks went too
quickly.  This seems slightly odd to me -- aren't the fastest
runs the ones with the least error in them?
(assuming we can trust gettimeofday())
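
For what it's worth, here's roughly what a "keep the fastest of
N runs" timer looks like.  This is only a sketch of the idea,
not cairo-perf's actual code; run_test_once() is a made-up
stand-in for the workload:

    #include <stdio.h>
    #include <sys/time.h>

    /* Made-up stand-in for whatever cairo-perf measures. */
    static void
    run_test_once (void)
    {
        volatile int i;
        for (i = 0; i < 1000000; i++)
            ;
    }

    /* Time n runs with gettimeofday() and keep the fastest,
     * on the theory that the minimum carries the least
     * scheduling and background noise. */
    static double
    fastest_run_ms (int n)
    {
        double best = -1.0;
        int i;
        for (i = 0; i < n; i++) {
            struct timeval t0, t1;
            double ms;
            gettimeofday (&t0, NULL);
            run_test_once ();
            gettimeofday (&t1, NULL);
            ms = (t1.tv_sec - t0.tv_sec) * 1000.0
               + (t1.tv_usec - t0.tv_usec) / 1000.0;
            if (best < 0.0 || ms < best)
                best = ms;
        }
        return best;
    }

    int
    main (void)
    {
        printf ("fastest of 100 runs: %.3f ms\n",
                fastest_run_ms (100));
        return 0;
    }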

Having said that, the shorter runs do seem to be the most
sensitive to conspiring accidents of memory or code layout, and
all the other nasty stuff we really have no control over in the
source code.  So what often happens is that you make a change in
the code, recompile, and see significant(ish) speedups and
slowdowns in totally unrelated code.

Sometimes I've seen extremely odd timings for some of the tests,
where the measurements cluster tightly around two or more
distinct times, resulting in very high variance (up to a 22%
stddev in one particularly bad case).  The variance won't go
down even with ever-increasing iterations, because the
distribution of timings is multimodal.  Again, this only seems
to become an issue with the shorter tests.  My working
hypothesis is that the tests are interacting badly with the
kernel scheduler.
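
As a toy illustration of why piling on iterations doesn't help,
suppose the timings split evenly between two clusters (the 1.0ms
and 1.5ms values below are invented): the stddev gets pinned at
half the gap between the clusters no matter how many samples you
take.

    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
        int i, n = 100000;  /* lots of iterations */
        double sum = 0.0, sumsq = 0.0;
        double mean, stddev;

        for (i = 0; i < n; i++) {
            /* invented bimodal timings: half the runs near
             * 1.0ms, the other half near 1.5ms */
            double t = (i % 2 == 0) ? 1.0 : 1.5;
            sum += t;
            sumsq += t * t;
        }

        mean = sum / n;
        stddev = sqrt (sumsq / n - mean * mean);
        /* prints: mean 1.250 ms, stddev 0.250 ms (20%) --
         * and that 20% never shrinks as n grows. */
        printf ("mean %.3f ms, stddev %.3f ms (%.0f%%)\n",
                mean, stddev, 100.0 * stddev / mean);
        return 0;
    }

With a 0.25ms stddev on a 1.25ms mean, that's 20% -- in the same
ballpark as the bad case above.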

Joonas

