[cairo] What does it take to get a make check to pass with the xcb target, using CAIRO_REF_DIR?

Bryce Harrington bryce at osg.samsung.com
Fri Jul 8 02:20:26 UTC 2016


On Thu, Jul 07, 2016 at 09:00:53PM -0400, darxus at chaosreigns.com wrote:
> On 07/07, Bryce Harrington wrote:
> > It would be gratifying to ditch the tests, but it wouldn't be the proper
> > thing to do at this stage when the issue hasn't been root-caused (afaik).
> 
> I'm replying to this last thing you said first.
> 
> When and how does "make check" get run?  My assumption is that nobody ever
> runs it, because it always fails, and is therefore useless. 

I think I mentioned earlier that I used to run it every few weeks.  I
made some scripts to diff out tests that flip between pass and fail
from one release to the next.  I've seen a scattered handful of other
folks run it too.

When doing releases I also try to do a run and compare with the prior
release to spot regressions.  I'd really love to be able to do a
thorough git bisect search to locate which changes caused which tests to
fail, but as the suite takes a while to run it'd be pretty time
consuming.
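
The scripts are nothing fancy.  They boil down to roughly the following
(a sketch from memory -- the env var name and the grep pattern may need
adjusting for whatever your make check output actually prints):

    # run the suite for one target and keep the log
    CAIRO_TEST_TARGET=xcb make check 2>&1 | tee check-new.log

    # pull the failing test lines out of an old and a new log and diff them
    # (this pattern also catches XFAIL lines; refine it as needed)
    grep -i 'FAIL' check-old.log | sort > fail-old.txt
    grep -i 'FAIL' check-new.log | sort > fail-new.txt
    diff -u fail-old.txt fail-new.txt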

> So getting it into a state where it can possibly pass is worth doing,
> even if that requires disabling tests that show current important
> bugs.  So that people can be expected to get back into the habit of
> verifying that it passes before submitting code.  (At least sometimes,
> or at least with one of the targets.)
> 
> Do you disagree?  Is my assumption wrong?

Well, I can see several different points of view...

One point of view is that every test is important, and no issue should
be hidden.  From this point of view disabling tests would be cheating.

Another point of view emphasizes the importance of the test suite as a
whole for QA purposes, and would prioritize streamlining the set of
tests down to ones that reliably pass, so that regressions can be caught
more swiftly.

A third point of view would emphasize runnability.  Currently the test
suite is so chock-full of test cases that it takes a significant amount
of time to run.  As a maintainer it would be nice to run the suite
for every patch, but my patience runs out at about the 15 min mark
there.  :-)  If you run the suite against all backends, some of the
tests pop up X windows or trigger system-level side effects, which makes
the test suite annoying to run on your devel box.
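
(For what it's worth, when I just want a quick sanity run I limit it to
a single backend and keep the windows off my desktop; going from memory
on the variable name, that's roughly:

    CAIRO_TEST_TARGET=image make check            # software only, no windows
    CAIRO_TEST_TARGET=xcb xvfb-run -a make check  # xcb against a throwaway X server

which at least keeps the system-level side effects contained.)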

Anyway, I would not disagree with any of these points of view; they all
have perfectly valid justifications.  But they're not necessarily
compatible.  As I understand it, the first point of view has been
dominant historically.  Personally I think it'd be better to have a
faster, cleaner test suite, but again, that's just one POV.

Frankly though, Cairo's rate of change is really, really slow.  And most
of the changes we get are one-off fixes or minor enhancements -- stuff
that's unlikely to trip test failures.  So even if we did simplify the
test suite to pass and run faster and cleaner, the magnitude of the
benefit overall would be on the modest side.

But you can argue the opposite side equally well too -- these tests have
been intermittently failing for a *long* time, so if they were finding
legitimate issues, those issues should have been dealt with long ago.


> > > bug pointing out what was disabled seems appropriate.
> > 
> > Yep.
> 
> Any suggestions on how these bugs should be reported?  One per failed png
> output file (so 14 bugs for the xcb target)?

If, looking at the output files, the renderings clearly show different
problems, then yeah, go for it.  Bug reports are pretty easy to mark as
duplicates later if someone figures out they're the same underlying
problem.

Just be aware that the bug tracker doesn't get a lot of love, so less is
more here.  It might be better to file bug reports for issues only after
someone's had a chance to study them enough to have a hypothesis or two
about what's going wrong.  If it's just raw test failures, then I'd
worry there'd be a strong chance the report would just bitrot.


> >  *  XFAIL -- An external failure. We believe the cairo output is perfect,
> >  *           but an external renderer is causing gross failure.
> 
> Ah, I misunderstood the meaning of XFAIL.  I have no idea where these
> problems are occurring.
> 
> > So you might see if deleting the reference image would get it to pass.
> 
> I'm unenthusiastic about that idea, because it would be less convenient to
> verify output is as expected when the problem is eventually fixed.

True enough.

> > > Delete them from test/Makefile.sources ?
> > 
> > Checking my notes, the record* tests were indeed some of the ones I was
> > also seeing intermittently pass and fail.  The test case itself doesn't
> > look that terribly exotic, though, so I hesitate to chalk it up to a
> > broken test.  If it is indeed down to variations in the rendering for
> > transparency and anti-aliasing, then I wonder if the test could be
> > tweaked to either not use it or to be less sensitive in checking it.
> 
> The test output looks, to me, like something is significantly wrong.  Like
> in
> http://www.chaosreigns.com/tmp/cairo/cairo-ref.xcb.xvfb.3/record2x-paint-alpha-clip-mask.xcb.argb32.out.png
> The spots at the four corners of the octagon are not supposed to be there.

Yep.
 
> > I poked around a bit in the test runner code.  If we do want to disable
> > the test, an alternative to actually deleting it might be to have the
> > test routine, or the test's preamble, return one of the error codes.
> > There is CAIRO_TEST_XFAILURE, although only a couple tests use that
> > (path-precision.c and in-fill-empty-trapezoid.c).  CAIRO_TEST_UNTESTED
> > and CAIRO_TEST_ERROR are more widely used.  Again though, the proper
> 
> I feel like CAIRO_TEST_ERROR may be interpreted as a failure?
> CAIRO_TEST_UNTESTED seems like a decent option.

I didn't run across clear documentation as to when one would be used
over the other.  ERROR results are tallied separately from the failures,
and AIUI indicate that something is wrong with the test itself or with
the test suite -- but I didn't spot that spelled out explicitly, so I
could be off.  For UNTESTED, your guess is as good as mine as to what it
means.
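
Purely as a sketch (I'm going from memory on the harness types, so check
against path-precision.c before copying anything), short-circuiting a
test that way would look roughly like:

    /* hypothetical: skip the test until the xcb differences are
     * root-caused; UNTESTED keeps it out of the pass/fail tally */
    static cairo_test_status_t
    preamble (cairo_test_context_t *ctx)
    {
        return CAIRO_TEST_UNTESTED;
    }

with that preamble wired into the test's CAIRO_TEST () definition.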

> > course of action would be to rule out that there is a legitimate issue
> > at stake first.  Then, second, identify whether it is caused by an
> > external factor.
> 
> Certainly.  But how many years has that not happened?  Until it happens,
> wouldn't it be better to have a test suite that is useful?

Maybe?  I do see your point but I'm rather on the fence.

Uli, what do you think?

> > > Removing these tests from test/Makefile.sources allows make check to pass
> > > (with an updated set of references):  record1414x.c record2x.c record90.c
> > > recordflip.c record.c text-rotate.c
> > 
> > Updating the references makes an assumption that all of the changes
> > being flagged are fixes rather than regressions.  When I last studied
> > the test output, I couldn't justify that to myself, but examining each
> > case and manually updating the references looked like a daunting task.
> 
> I'm not suggesting updating all of the references in the repo.  I'm just
> using updated references as a minimum useful baseline.  Which is failing.

Gotcha.
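
(For the record, that kind of baseline run is roughly -- assuming
CAIRO_REF_DIR works the way I remember, as an alternate directory to
search for reference PNGs instead of the ones in test/reference:

    CAIRO_REF_DIR=$HOME/cairo-updated-refs CAIRO_TEST_TARGET=xcb make check

so no changes to the repo are needed.)
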
Bryce

