[cairo] Pixman Performance Regressions

Thu Aug 6 15:07:35 PDT 2009

Recently I spent some time looking at performance regressions for
pixman master vs. pixman 0.14.0. What I did was run each of the
cairo-traces once against the image and once against the xlib
backend. The CPU was a 3.8 GHz Pentium 4; I used a fresh X server for
each trace.

The resulting data is here:

    http://www.daimi.au.dk/~sandmann/regression-data

The first two columns contain pixman-0.14.0 and current master
respectively. Comparing them, there is not much difference, but master
is a little slower for most of the benchmarks.

By far the biggest regressions are the swfdec-fill-rate benchmarks,
where master is approximately 30% slower. There are also regressions
for glyph-heavy benchmarks with the image backend, and a small general
slowdown with the xlib backend.

On some benchmarks master is much faster, notably evolution-20090607
and swfdec-youtube. These may just be outliers where the measurements
for pixman 0.14.0 went wrong, because I don't really know of any
reason they would be faster. It could be that the reason is disk
thrashing; some of the profiles indicate a lot of time spent in the
page fault handler in the kernel. Other measurements also have the
occasional outlier. See firefox-36-20090609 for the pixel-fetch
branch, for example.

The gnome-terminal-20090601 benchmark ran out of memory on all
branches for the xlib backend.

Fixes

The other columns in the table show the data for branches with fixes
for these regressions. Each branch is a superset of the preceding one,
so for example "glyph-fast-path" branched off of "validate" which
branched off of "pixel-fetch".

The reason for the large fill-rate-2xaa regression is the change to
the general path described here:

    http://lists.x.org/archives/xorg-devel/2009-June/001014.html

The data from the regular performance test suite showed no real
impact, but the fill-rate-2xaa trace is a more realistic and
believable benchmark and it shows that the coordinate lists were just
not a good idea.

In the branch pixel-fetch here

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=pixel-fetch

I went back to processing pixels one at a time. The results are in the
third column of the table. Both the swfdec-fill-rate-2xaa and the
-4xaa benchmarks are much improved, and in both cases slightly better
than 0.14.0.
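
To make the difference concrete, here is a minimal sketch in C of the
two ways of fetching a scanline: one pixel at a time, and through an
intermediate list of coordinates. The types and function names
(image_t, coord_t, fetch_scanline_per_pixel, fetch_scanline_via_list)
are invented for this example and are not pixman's actual code. The
sketch only shows that the list variant makes an extra pass over
memory to arrive at the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical image type for illustration; not pixman's real struct. */
typedef struct
{
    const uint32_t *bits;
    int             width, height;
} image_t;

typedef struct
{
    int x, y;
} coord_t;

/* Fetch a single pixel; anything outside the image reads as 0. */
static uint32_t
fetch_pixel (const image_t *image, int x, int y)
{
    if (x < 0 || y < 0 || x >= image->width || y >= image->height)
        return 0;

    return image->bits[y * image->width + x];
}

/* Variant 1: compute each coordinate and fetch it immediately. */
static void
fetch_scanline_per_pixel (const image_t *image,
                          int x0, int y, int n, uint32_t *out)
{
    int i;

    assert (n >= 0);

    for (i = 0; i < n; i++)
        out[i] = fetch_pixel (image, x0 + i, y);
}

/* Variant 2: first write all coordinates to a list, then fetch them
 * in a second pass.  Same output, but one more trip through memory. */
static void
fetch_scanline_via_list (const image_t *image,
                         int x0, int y, int n,
                         coord_t *coords, uint32_t *out)
{
    int i;

    assert (n >= 0);

    for (i = 0; i < n; i++)
    {
        coords[i].x = x0 + i;
        coords[i].y = y;
    }

    for (i = 0; i < n; i++)
        out[i] = fetch_pixel (image, coords[i].x, coords[i].y);
}
```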

Fixing the small across-the-board X slowdown was a matter of speeding
up image creation. In the validate branch:

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=validate

setting image properties just causes the image to be marked dirty. It
is then validated before actually being used. This means setting
virtual functions etc. only happens once instead of on every property
change.
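
The dirty/validate idea can be sketched roughly as follows. This is
only an illustration under assumptions: the struct layout, the
function names (image_set_repeat, image_validate, image_get_pixel)
and the n_validations counter are hypothetical and do not come from
the branch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image type for illustration; not pixman's real struct. */
typedef struct image image_t;
typedef uint32_t (*fetch_pixel_t) (const image_t *image, int x, int y);

struct image
{
    const uint32_t *bits;
    int             width, height;
    bool            repeat;
    bool            dirty;          /* properties changed since validation */
    fetch_pixel_t   fetch_pixel;    /* "virtual function" chosen lazily */
    int             n_validations;  /* only here to make the effect visible */
};

static uint32_t
fetch_no_repeat (const image_t *image, int x, int y)
{
    if (x < 0 || y < 0 || x >= image->width || y >= image->height)
        return 0;

    return image->bits[y * image->width + x];
}

static int
wrap (int v, int size)
{
    int m = v % size;

    return m < 0 ? m + size : m;
}

static uint32_t
fetch_repeat (const image_t *image, int x, int y)
{
    return image->bits[wrap (y, image->height) * image->width +
                       wrap (x, image->width)];
}

static void
image_init (image_t *image, const uint32_t *bits, int width, int height)
{
    image->bits = bits;
    image->width = width;
    image->height = height;
    image->repeat = false;
    image->dirty = true;
    image->fetch_pixel = NULL;
    image->n_validations = 0;
}

/* Setting a property is cheap: record it and mark the image dirty. */
static void
image_set_repeat (image_t *image, bool repeat)
{
    image->repeat = repeat;
    image->dirty = true;
}

/* The expensive setup runs at most once per batch of property changes. */
static void
image_validate (image_t *image)
{
    if (!image->dirty)
        return;

    image->fetch_pixel = image->repeat ? fetch_repeat : fetch_no_repeat;
    image->n_validations++;
    image->dirty = false;
}

/* Every use of the image validates it first. */
static uint32_t
image_get_pixel (image_t *image, int x, int y)
{
    image_validate (image);
    assert (image->fetch_pixel != NULL);

    return image->fetch_pixel (image, x, y);
}
```

In this sketch, three calls to image_set_repeat() followed by any
number of image_get_pixel() calls run the setup code exactly once.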

The glyph regression for the image backend is because the image
backend, unlike the X backend, draws glyphs in a way that doesn't
have fast path support in pixman. This means many calls to the general
path with small images. In master this is inherently slower because we
now do proper clipping instead of relying on region calculus. I'm not
sure there is a general fix for this, but we can fix the concrete
symptom by writing fast paths for glyph rendering. In this branch:

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=glyph-fast-path

there are sse2 and C fast paths.
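
For readers unfamiliar with what a C fast path looks like, here is a
minimal sketch. It assumes one common glyph case: a solid
premultiplied a8r8g8b8 color, an a8 glyph mask and an a8r8g8b8
destination, combined as (color IN mask) OVER dest. Which operators
and formats the branch actually covers is not stated here, and the
function names (mul_un8, in_over, composite_solid_glyph) are made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values as fractions of 255, with rounding. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* (src IN m) OVER dst for a single premultiplied a8r8g8b8 pixel. */
static uint32_t
in_over (uint32_t src, uint8_t m, uint32_t dst)
{
    uint8_t  src_alpha = mul_un8 ((uint8_t) (src >> 24), m);
    uint32_t result = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = mul_un8 ((uint8_t) (src >> shift), m);
        uint32_t d = mul_un8 ((uint8_t) (dst >> shift),
                              (uint8_t) (255 - src_alpha));
        uint32_t sum = s + d;

        result |= (sum > 255 ? 255 : sum) << shift;
    }

    return result;
}

/* Hypothetical fast path: no per-pixel function pointers, no generic
 * format conversion, just a tight loop for this one combination.
 * dst_stride is in pixels, mask_stride in bytes. */
static void
composite_solid_glyph (uint32_t *dst, int dst_stride,
                       const uint8_t *mask, int mask_stride,
                       uint32_t color, int width, int height)
{
    int x, y;

    assert (dst_stride >= width && mask_stride >= width);

    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
        {
            uint8_t m = mask[x];

            if (m == 0)
                continue;               /* fully transparent: skip */

            if (m == 0xff && (color >> 24) == 0xff)
                dst[x] = color;         /* fully opaque: plain store */
            else
                dst[x] = in_over (color, m, dst[x]);
        }

        dst += dst_stride;
        mask += mask_stride;
    }
}
```

The two shortcuts in the inner loop matter for glyphs, since most
mask pixels tend to be either fully transparent or fully opaque.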

With that change, pixman master is as fast as or faster than 0.14.0 on
essentially all benchmarks.

I did a couple of other experiments: adding a cache in front of the
fast path lookup and making the fast path table walking faster. These
were slight speedups in some cases, but didn't make a huge difference.
It may be interesting to look more at this in the 0.17 series, but I
don't intend to merge them for 0.16.0.

The branches are here:

    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=fast-path-cache
    http://cgit.freedesktop.org/~sandmann/pixman/log/?h=faster-fast-path
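
As a rough idea of what a cache in front of the fast path lookup
could look like, here is a small sketch using a move-to-front list of
recently used table entries. The cache size, the entry layout and all
names (fast_path_t, lookup_t, lookup_fast_path) are assumptions made
for the example; the fast-path-cache branch may well do it
differently.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*composite_func_t) (void);

/* Hypothetical fast path table entry. */
typedef struct
{
    int              op;
    int              src_format;
    int              mask_format;
    int              dest_format;
    composite_func_t func;
} fast_path_t;

#define CACHE_SIZE 8

typedef struct
{
    const fast_path_t *table;
    size_t             n_entries;
    const fast_path_t *cache[CACHE_SIZE];  /* most recent first */
    unsigned long      n_table_walks;      /* for measuring hit rate */
} lookup_t;

static void
lookup_init (lookup_t *l, const fast_path_t *table, size_t n_entries)
{
    size_t i;

    assert (table != NULL || n_entries == 0);

    l->table = table;
    l->n_entries = n_entries;
    l->n_table_walks = 0;

    for (i = 0; i < CACHE_SIZE; i++)
        l->cache[i] = NULL;
}

static int
matches (const fast_path_t *p, int op, int src, int mask, int dest)
{
    return p->op == op && p->src_format == src &&
           p->mask_format == mask && p->dest_format == dest;
}

/* Returns the matching entry, or NULL if the caller has to fall back
 * to the general path. */
static const fast_path_t *
lookup_fast_path (lookup_t *l, int op, int src, int mask, int dest)
{
    const fast_path_t *found = NULL;
    size_t i, j;

    /* Check the small cache first; stop at the first empty slot. */
    for (i = 0; i < CACHE_SIZE && l->cache[i] != NULL; i++)
    {
        if (matches (l->cache[i], op, src, mask, dest))
        {
            found = l->cache[i];
            break;
        }
    }

    if (found == NULL)
    {
        /* Cache miss: walk the full table. */
        l->n_table_walks++;

        for (j = 0; j < l->n_entries; j++)
        {
            if (matches (&l->table[j], op, src, mask, dest))
            {
                found = &l->table[j];
                break;
            }
        }

        if (found == NULL)
            return NULL;

        if (i == CACHE_SIZE)
            i = CACHE_SIZE - 1;     /* cache full: evict the oldest */
    }

    /* Move to front: shift entries [0, i) down by one slot. */
    for (; i > 0; i--)
        l->cache[i] = l->cache[i - 1];
    l->cache[0] = found;

    return found;
}
```

Whether something like this pays off depends on how long the table
walk is compared to the cache check, which is consistent with the
modest speedups reported above.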

Soren