[cairo] Automated testing of Cairo

Thu Aug 10 15:33:22 PDT 2006

On Thu, Aug 10, 2006 at 06:50:38AM -0700, Carl Worth wrote:
> On Thu, 10 Aug 2006 02:41:04 -0700, Bryce Harrington wrote:
> >
> > I'm happy to be able to announce that OSDL will be supporting Cairo by
> > providing automated testing on various platforms.
> 
> Hey Bryce,
> 
> Thank you so much for doing this. This looks fantastic and should be
> extremely valuable. I really appreciate it!
> 
> >  * Right now, once a day git snapshots are pulled and 'make check' run.
> >    Currently, they're being run on three x86 systems:
> >     - Gentoo P4 x86/32
> >     - Redhat P4 x86/32
> >     - Gentoo Xeon x86/64 (but in 32-bit mode)
> 
> Very nice. One thing that would help in the reports is to be able to
> determine which machine is which. I see names like nfs08, nfs09, and
> nfs11 but can't find any information about what the configuration of
> any machine is. Perhaps if each machine name were a link to a page
> describing its configuration?

Heh, you found the part that I was least satisfied with.  ;-)
Yes, that area does require some work.

For now, if you click on the run ID and go into the sysinfo dir you can
find a summary page for each machine.  E.g.:

  http://crucible.osdl.org/runs/1466/sysinfo/nfs08.1/summary.html
  http://crucible.osdl.org/runs/1466/sysinfo/nfs09.1/summary.html
  http://crucible.osdl.org/runs/1466/sysinfo/nfs11.1/summary.html

Basically, nfs08 and nfs09 are the same hardware (P4), but the former
has Debian on it, and the latter has FC5.  nfs11 is a Xeon-64, running
in 32-bit mode, with Gentoo on it.

I'd like to eventually add in our itanium2 and amd64, but they tend to
be a bit fussier so I left them out for now.  We also have a ppc64 that
I may be able to use for this, but it's downright ornery and we've had a
lot of trouble with it.  Inkscape also had a G5 donated for testing, so
that may be a possibility too.

> >  * Cairo results can be found here:
> >     http://crucible.osdl.org/runs/cairo_branches.html
> 
> This looks great. I really like the compact overview it provides. I do
> have a few comments though:
> 
> 1) For a git pull such as c3c7068 it would be nice if the link it
>    provided was directly into cairo's gitweb. For example the link
>    I would expect is to:
> 
> 	http://gitweb.freedesktop.org/?p=cairo;a=commit;h=c3c706873ef6a0e1318b1d4b4d4b6841758ea18d
> 
>    Currently you're providing a link to a diff from the previous run
>    (I think) which is also valuable since I don't think git provides
>    an easy way to get at that information.

Okay, done.

> 2) As a nit: The git pull is labelled 1.2.2 but it should be 1.2.3
>    since that's what the version advertises itself as now. So maybe
>    with this and the above comment what I would like to see is:
> 
> 	1.2.3-c3c7068 (diff)
> 
>    where the "1.2.3-c3c7068" links into gitweb and the "diff" is a
>    link to the diff you currently provide.

Hmm, I may have to punt on this one.  The patching system relies on the
label naming the base version that patches apply to, so when the label
refers to the upcoming version, it gets confused looking for
non-existent tarballs and goes out to lunch.  I know this could be done
a lot better, but the code is currently a bit thorny.  We've hard-coded
workarounds for this issue before, but they tended to just make the code
worse, so if it's just a minor nit, it would be convenient for me to
leave it this way for now.  If it gets really annoying, let me know;
I'll need to rewrite that code anyway at some point.

> 3) The results all say "OK" now regardless of what failures
>    exist. We're going to need to make that say something very
>    different than "OK" for failures if this is going to be useful. ;-)

Yes!  This was something I wanted to bring up.  Can you give me some
sort of heuristic to use for differentiating between OK and BAD runs?
For example, in other tests we've grepped for specific success lines at
the end of the test log, or counted the number of PASSES, or looked for
the presence / absence of certain output files.

What would you suggest we use to detect failures?  I see there are
pass/fail counts in the html file - maybe something tied to that?
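
As a rough sketch of the pass/fail-count idea (the one-result-per-line
log format below is an assumption, not cairo's actual 'make check'
output, and classify_run is a hypothetical helper):

```python
import re

# Assumed log format: one "test-name: STATUS" line per test.
# The real 'make check' output may differ; adjust the pattern to match.
RESULT_RE = re.compile(r"^\s*(?P<name>\S+):\s+(?P<status>PASS|FAIL|UNTESTED)\s*$")


def classify_run(log_text):
    """Return ("OK" or "BAD", counts) for the text of one test log."""
    counts = {"PASS": 0, "FAIL": 0, "UNTESTED": 0}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            counts[match.group("status")] += 1
    # A log with no passing results at all is also treated as BAD: the run
    # most likely died in configure or the build before any test ran.
    if counts["FAIL"] > 0 or counts["PASS"] == 0:
        return "BAD", counts
    return "OK", counts


# Made-up sample log; only ft-text-vertical-layout is a test named in this thread.
sample = """\
example-test-a: PASS
ft-text-vertical-layout: FAIL
example-test-b: PASS
"""
status, counts = classify_run(sample)
print(status, counts)
```

Treating "zero passes" as BAD is a design choice here, so that a run
that fails during configure does not get reported as OK just because
no FAIL line was ever written.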

> 4) The three different machines seem to be getting cairo compiled with
>    different backends. The first two are testing image and ps while
>    the third seems to also be compiling xlib, but not successfully
>    testing it, (which likely means that the X libraries are available
>    on that machine but that no X server is available).
> 
>    It would be nice to see all the machines testing as many of our
>    "supported" backends as possible, (which would be "image, ps, pdf,
>    svg, and xlib"). The win32 backend is also supported, but that
>    would obviously require a separate machine for testing. I don't
>    know if OSDL is interested in hosting such a machine, (but win32 is
>    one backend that could benefit a lot from automated testing since
>    most of the core cairo hackers don't have ready access to any win32
>    systems).

Hmm.  I would have to check with management on that.  OSDL engineering
is pretty firmly FOSS, but if Cairo is providing and administering the
machine, then for the sake of interoperability it might be possible.
Can you get your hands on a machine and an admin for this?

>    The pdf and svg backends should be getting compiled already, but
>    they're likely not getting tested since poppler and librsvg are not
>    available, (Behdad, care to fix the handling of CAIRO_CAN_TEST so
>    that the backend still shows up in the tests but as UNTESTED rather
>    than not appearing at all?). As for xlib, it should be quite
>    reasonable to get a headless X server, (Xfake), running so that
>    even xlib could be tested. I can help out with this some.

Okay; give me a listing of the prerequisites to install in order to get
all these things turned on, and I'll get it set up.  (We'll need to
update the images for the various machines, so it's easiest if we can do
all the deps at once.)

> 5) The current PS failures for nfs08 are showing a lot of false
>    positives, (the diff images just show a single pixel differing in
>    some trivial amount, for example). Interestingly enough, the same
>    false positives are not appearing on nfs11. One difficulty here is
>    that the test suite is extremely unforgiving, so it may require a
>    precise version of ghostscript to reproduce the expected
>    results. And we definitely haven't done a good job of documenting
>    the precise version needed.

Here's what gs --help is reporting:

nfs08:  GPL Ghostscript 8.50 (2005-12-31)
        Input formats: PostScript PostScriptLevel1 PostScriptLevel2
        PostScriptLevel3 PDF
        Default output device: x11
nfs09:  Not installed
nfs11:  ESP Ghostscript 8.15.2 (2006-04-19)
        Input formats: PostScript PostScriptLevel1 PostScriptLevel2
        PostScriptLevel3 PDF
        Default output device: bbox

Pretty much we're just installing whatever is stock from the respective
distro.
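
To make that kind of version drift easy to spot, something like the
following sketch could compare the banners across machines.  The
banner strings are the ones reported above; the parsing pattern and
the helper names are assumptions for illustration only:

```python
import re

# First line of `gs --help` per host, as reported above.
# None means Ghostscript is not installed on that host.
BANNERS = {
    "nfs08": "GPL Ghostscript 8.50 (2005-12-31)",
    "nfs09": None,
    "nfs11": "ESP Ghostscript 8.15.2 (2006-04-19)",
}

# Assumed banner layout: "<flavor> Ghostscript <version> (<date>)".
BANNER_RE = re.compile(
    r"^(?P<flavor>\w+) Ghostscript (?P<version>[\d.]+) \((?P<date>[\d-]+)\)"
)


def parse_banner(banner):
    """Return (flavor, version), or None if missing or unrecognized."""
    if banner is None:
        return None
    match = BANNER_RE.match(banner)
    if match is None:
        return None
    return match.group("flavor"), match.group("version")


def hosts_differing_from(reference, banners):
    """List hosts whose Ghostscript differs from the reference host's."""
    ref = parse_banner(banners[reference])
    return sorted(
        host
        for host, banner in banners.items()
        if host != reference and parse_banner(banner) != ref
    )


# nfs11 is used as the reference only because its PS results did not
# show the false positives discussed above.
print(hosts_differing_from("nfs11", BANNERS))
```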

>    Similar undocumented version dependencies also exist for poppler
>    and librsvg, (though more recent versions are pretty much always
>    better---the cairo test suite has exposed bugs in those libraries
>    for which we've pushed for fixes upstream). With poppler and
>    librsvg though, we're in a better situation than with ghostscript,
>    since the final rasterization still goes through cairo. So it
>    really is reasonable to expect the PDF and SVG backends to be able
>    to pass the tests even with the current, strict, not-even-one-bit-
>    can-differ behavior of our test suite. And if someone wanted to
>    write a cairo-based backend for ghostscript we could be there too.
> 
>    The punchline there is that we should make the effort to get rid of
>    all these false positives in the failure reports.

If you can do this, then we can also talk about varying the dependency
versions, for checking compatibility with newer or older versions of the
underlying libs.

> 6) The only current failure I see that isn't an obvious false positive
>    like those described above is the failure of
>    ft-text-vertical-layout which can be seen here:
> 
> 	http://crucible.osdl.org/runs/1466/test_output/cairo-test/nfs11/test/
> 
>    This is a failure that's hitting the image backend, which is the
>    worst kind since there's no chance of a false positive being
>    introduced by some external conversion tool like with PDF, PS, and
>    SVG. Some false positives do hit the image backend because of a
>    missing font, (Bitstream Vera is about the only thing we use I
>    think), or perhaps a freetype-dependency. But usually a problem
>    like that will manifest itself as failures in every text-using
>    test. So I'm not sure what might be happening here. Again, it would
>    help to know the configuration of this machine.

Hopefully this has the info you need:

   http://crucible.osdl.org/runs/1466/sysinfo/nfs11.1/

if not let me know and we'll add it.

> >  * More tests can (and should) be added.  Point me at what you'd like to
> >    have run.
> 
> I think the easiest way to add more tests is to just keep adding them
> to cairo/test. We try to do this for every bug report against cairo,
> so the list of available tests should just keep growing.

Great, makes it easy for me.  :-)

> >  * We also have an amd64 and an itanium2, and we can run the Xeon in
> >    64-bit mode, if you wish to do 64-bit testing.
> 
> Yes, it would be very helpful to have these. The more variety in
> platforms the better! There are a lot of cases where 64-bit-specific
> bugs in cairo have gone latent, (the most recent being the
> truetype-subsetting problem), so it would be very helpful to have
> automated testing to help us catch these issues earlier.

Okay.  There are probably going to be a variety of issues to work
through, but I've just now hooked them all up:

  http://crucible.osdl.org/runs/cairo_branches.html

amd01:  OK
ppc01:  Fails during configure 
ita01:  Bunch of interesting issues during make check
nfs12:  OK

All four of the above are 64-bit mode machines running Gentoo.

nfs12 is a dual-proc Xeon x86_64 machine running in 64-bit mode.
http://crucible.osdl.org/runs/1467/sysinfo/nfs12.1/summary.html

> >  * Developer login access to the SUTs is available on request.
> >    I can provide full access + instruction for test developers.
> 
> Hook me up and I can start looking into the library version issues
> discussed above.

Okay, offlist, please send me your ssh2 keys and preferred username.

> >  * If you have hardware worth adding to the pool for testing Cairo
> >    against, we can host it in our test environment here in Beaverton, OR
> >    (near Carl and Keith).  We would just need someone identified as the
> >    admin for it (esp. if it's a non-Linux box).
> 
> Particularly when we get performance tracking added to this setup (see
> below) it would be helpful to have some ARM system(s) added to the
> mix. Those would be Linux, so admin should be simple, though I could
> obviously help.

Okay cool.

> >  * Crucible gives a lot of flexibility for changing Linux kernels, so if
> >    there's any kernel-variation worth doing, we're well set up for
> >    automating that.
> 
> I hope that for the most part cairo doesn't care about the kernel,
> (well, for the xlib backend, things like DRM drivers could obviously
> be relevant), but this is certainly good to know.
> 
> > With Carl's announcement yesterday that the Cairo team is turning
> > attention to performance improvement work, this seems like a great time
> > to jump in.  I've been doing a lot of NFS performance work, so am hoping
> > some of our analysis tools (e.g. historical performance graphing) may be
> > reusable for Cairo without too much trouble.
> 
> The idea of getting automated performance testing of cairo on a wide
> variety of hardware---with historical tracking---is very compelling to
> me. I was hoping someone would step up and offer this, but I wasn't
> expecting to see it so soon. This will be great!
> 
> Since you've already got some existing tools for tracking, perhaps you
> can give me some suggestions on what the report format should look
> like. We're starting from scratch, (to some extent), so we can output
> pretty much whatever would be most convenient.

Sure.  Here is an example of a performance test we use a lot for NFS:
http://crucible.osdl.org/runs/1205/test_output/iozone.sys.log.svg

The left chart is a cross-section of the 3D plot on the right.  This
test is done by doing reads or writes of increasingly large files,
using a series of different record sizes.  This plot looks fairly normal
- for small files and small record sizes there is network overhead that
diminishes the performance a little.  For larger files and records,
performance should approach the theoretical maximum (the speed of the
network interface - 1 Gb/sec).  The fact that write performance gets
worse for file sizes over 10 MB is "interesting".

An analogue for Cairo might be something that increases the "complexity"
of the image being rendered.  E.g., 1 object, 10, 100, ...  or with
varying complexities of each object - 2 nodes, 20, 200...
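
A sweep along those two axes might look roughly like the sketch below.
Note that render_scene is a hypothetical stand-in workload, not a cairo
call; a real benchmark would replace it with actual drawing code:

```python
import time


def render_scene(num_objects, nodes_per_object):
    """Hypothetical stand-in for rendering; just burns CPU proportionally."""
    total = 0
    for i in range(num_objects):
        for j in range(nodes_per_object):
            total += (i * 31 + j) % 7
    return total


def sweep(object_counts, node_counts, repeats=3):
    """Time every (objects, nodes) combination; keep the best of `repeats`."""
    results = {}
    for n_obj in object_counts:
        for n_nodes in node_counts:
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                render_scene(n_obj, n_nodes)
                best = min(best, time.perf_counter() - start)
            results[(n_obj, n_nodes)] = best
    return results


grid = sweep([1, 10, 100], [2, 20, 200])
for (n_obj, n_nodes), seconds in sorted(grid.items()):
    print(f"{n_obj:4d} objects x {n_nodes:4d} nodes: {seconds * 1e6:10.1f} us")
```

Taking the minimum over a few repeats is one common way to damp
scheduling noise; the resulting grid has the same shape as the file
size vs. record size surface in the iozone plot.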

For historical performance tracking, I took a weighted average at one of
the points and plotted that for each code release in just a basic bar
chart:

    http://inkscape.org/screenshots/svg_tt_graph_bar_after.png
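
For illustration, the reduction to one number per release could be
sketched as below.  The sample values, the weights, and the release
names are all made up, and the exact weighting used for the chart
above is not specified here:

```python
def weighted_average(samples):
    """samples: list of (value, weight) pairs; returns the weighted mean."""
    total_weight = sum(weight for _, weight in samples)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(value * weight for value, weight in samples) / total_weight


# Hypothetical measurements near one chosen point, per code release.
runs = {
    "release-a": [(100.0, 1), (110.0, 2), (90.0, 1)],
    "release-b": [(120.0, 1), (130.0, 2), (110.0, 1)],
}

summary = {release: weighted_average(s) for release, s in runs.items()}

# Crude text bar chart: one '#' per 10 units.
for release, value in summary.items():
    print(f"{release:10s} {'#' * int(value // 10)} {value:.1f}")
```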

Also, we've found it useful to plot runs that were done with different
options.  For instance, NFSv4 with/without various security mechanisms
turned on.  This helped emphasize some really horrid performance issues
when running Kerberos with full encryption.  The developers are
experimenting with some patches to bring this performance up, and we'll
be providing them with charts like the above so they can track it.

Hopefully as we develop more performance charting tools, we'll be able
to apply them to Cairo too.

> >                                               Also, it sounds like Carl
> > is working on performance tests.  Does anyone else have performance (or
> > other) tests that could be used, that you'd be able to show me how to
> > run?
> 
> As you might expect, my "work" consists in large part of collecting
> good stuff that others have done before.  David Reveman, Billy Biggs,
> and Vladimir Vukicevic have each written a cairo benchmarking suite at
> some time in the past, and many others have written or suggested
> specific performance tests. Many of these are scattered throughout the
> cairo bugzilla and cairo mailing list. So I'm already in the process
> of collecting these and will continue to do so.

Excellent, I'll look forward to seeing these tests running.  :-)

> I'm very much looking forward to the rest of this. And I'd like to
> extend my appreciation to Bryce Harrington and OSDL for dedicating
> resources to support cairo development this way. Thanks so much!

Certainly!  Come visit us some time.  :-)

Bryce