[cairo] [PATCH][pixman] Test program for stressing the use of different formats and operators

Tue Jul 21 06:22:29 PDT 2009

On Tuesday 14 July 2009, Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > The code and overall method is mostly based on scaling-test. This one
> > focuses on trying to stress as many different color format and types
> > off composition operations as possible.
> >
> > This is an initial implementation which may need more tuning. Also
> > not all color format and operator combinations are actually used.
> > The ones which crashed pixman or caused other problems even when
> > compiled without any cpu specific optimizations are enclosed
> > in '#if 0' blocks. Preferably they should be enabled later, once
> > the bugs in pixman core are fixed.
>
> This is clearly a very useful test. I fixed the 64 bit issues and the
> issues with CONJOINT_DST/DISJOINT_DST before releasing 0.15.16. (And I
> just realized that we need to compile with -fno-strict-aliasing; we
> haven't had that so far). I'm not sure what's up with the
> component-alpha ones.

It's good to know that it really helped to fix some problems.

> The reason it crashes for the g4, c4, etc. formats is that those
> require a palette to be set with
>
>         pixman_image_set_indexed().
>
> That call sets a pointer to a data type that maps palette entries to
> rgba pixels, and either rgb555 pixels (for c* formats) or y15 (for g*
> formats) to palette entries.

OK, thanks for the explanation.

BTW, is it expected/desired for pixman to crash when palette is not set?

> Some specific comments on the code:
>
> - The CRC32 and the random number generator could be shared with the
>   scaling-test. I'd suggest renaming the existing utils.[ch] to
>   something like show-image.[ch] and then make a new utils.[ch] that
>   contains code shared between tests. Maybe there is other code that
>   could be shared, like the endian swapping?

I would prefer to use CRC32 from zlib, it has a decently optimized 
implementation and should be available in all linux systems out of
the box. Dependency on zlib may cause some inconveniences for
windows users though.

Sharing code is a good idea, but I first tried to make something
very simple and efficient at finding problems. For now, as long
as 'scaling-test' works (tests clipping and transform) and finds
some problems whenever they get introduced, I did not want to touch
it yet. Surely all of this will have to be refactored at some point.
But tests also need to be tested in order to ensure that they still
do their job properly :-)

When the number of obvious bugs in pixman goes down, it may be easier
to see what should be stressed more thoroughly and the tests can be
updated.

> - It has a list of valid mask formats, but in fact all formats can be
>   used as masks.

I selected the mask formats, which are actually used in assembly optimized
fastpath functions for the start.

Extending the list of mask formats will proportionally increase the time,
required to find the same amount of bugs in current assembly code.

The permutations which are needed to try include (but are not necessarily
limited to):
- source format
- destination format
- mask format
- type of compositing operation
- extra features like solid mask, component alpha, etc.
- different sizes for compositing area (optimized code may handle small and
large sized blits using different branches of code)
- different alignments of the source and destination pointers (code may try to
align operations at 16-byte boundary, or some other boundary)
- different strides (to catch potential stride handling)
- different values for the pixel data, so that all the branches in the code
are executed

Finetuning the test (and tweaking the "magic" values for the probabilities of
different argument values) can be done with some coverage tool like
cachegrind. We want to be sure that even the branches like

	    m = *((uint32_t*)mask);

	    if (srca == 0xff && m == 0xffffffff)
	    {
		save_128_aligned ((__m128i*)dst, xmm_def);
	    }

are actually executed at least a couple of hundred times during the test (in 
the full variant of test, which could be run overnight).

Pixman is quite an easy subject for this kind of testing because even though
the number of permutations to try is quite large, it is still manageable.

It may be possible to even try to very roughly estimate the amount of time,
needed to find all the bugs in the tested code with let's say 99% probability.
But tests may need a better PRNG, well suitable for scientific simulations.
Theoretically, correlations in pseudorandom number sequences may result in
some combinations not being tried, no matter for how much time the test is
run.

> - It should follow the coding style and naming convention. (The coding
>   style is similar to cairo/HACKING; the main difference is that
>   braces always go on their own line). See the changes I made to
>   scaling-test.c.

OK

> Some more general comments about test suites:
>
> I'm not convinced that basing the test suite on CRC32 computations and
> reference binaries is ideal.

I just don't see any better solution yet.

The way the test is implemented now, it can compare the code separated
in "space" (different code paths from the same pixman library, when compiled
with different optimizations enabled), "time" (compare different versions of
pixman) or even "realms" (compare x86 vs. PPC or ARM). I actually have a
script which runs in a local network and can automate the process of searching
for the differences between the results obtained on different platforms so
that you even don't have to do "bisecting" for differences manually.

Being able to turn on and off some fastpath functions at runtime, while
somewhat useful, is still not completely versatile.

> The things I'd be looking for are:

> - Make it really easy to check if all tests pass.

The reference CRC32 value expected after running a certain amount of rounds
can be used to detect regressions or behavior changes, which are always
potential regressions.

> - Make it easy to update when pixman behavior changes.
>
>   With the CRC based checks, if you commit something that changes
>   pixman behavior, you have no easy way to verify that the only things
>   that change are those you expect to change.

It is possible to run some subsets of tests. For example, if you are adding
only the changes related to one compositing operation, you may expect that
all the other operations should provide the same result as before
modification.

But any decision is up to the developer anyway. If he thinks that the changes
are OK, he updates the reference CRC32 value. If it turns out that he was
actually wrong, the bug still can be found eventually in some other way and
the test can be updated again. Also any other developer can do "retrospective"
testing between different pixman versions with all the same test if he feels
that something is fishy.

> - Make it easy to generate regression tests.
>
>   The idea with that is to have a separate test program that ensures
>   that earlier bugs never happen again. So it would be useful if the
>   test suite could print out sufficient information to reproduce a
>   bug.

Yes, this extra feature would be very useful, no doubt. Tests already
attempt to print some information. But it will never be completely
sufficient, in some cases you will still have to run the debugger, add
some debugging print messages, etc.

And I expect that the number of bugs detected by these tests will
significantly go down soon, so the time spent on investigation of the
test failures should be also reduced.

I agree that the tests must be easy enough to be used and interpreted 
correctly by any "outsider", who is trying to submit some pixman patch,
but it fails to pass the tests for some reason.

> - Make it easy to pinpoint the failing test.
>
>   The procedure to get to the failing test is pretty cumbersome
>   because it involves compiling pixman twice and manually copying a
>   reference binary around, then running a script. It seems this should
>   be automatable. It also seems like it should be possible to compare
>   the various implementations directly if it were possible to turn
>   them on and off at runtime

This is actually not a problem at all. A simple additional script does this
work of double compiling and running tests easily.

> - Make it possible to *see* what is going wrong.
>
>   Quite often, you can tell from looking at the output what is going
>   on. Also when you expect behavior changes, visual output allows you
>   to verify that the new results look correct.

Agreed, it would be a useful feature for the supplementary utils.[ch] library.
But with random input, it is still hard to see differences between the
expected correct output and the wrong one (both would look like garbage in
many cases). In this case just a standard diff of the hex dumps is pretty
useful.

> > The following optimized pixman functions have been found to produce
> > results, at least different from C variant:
> >
> > ARM NEON: neon_composite_over_n_8_0565, neon_composite_over_n_0565,
> > neon_composite_over_8888_0565
> >
> > MMX: mmx_composite_over_n_8_0565, mmx_composite_over_n_8_8888,
> > mmx_composite_over_n_8888, mmx_composite_over_n_0565,
> > mmx_composite_src_n_8_8888, mmx_composite_in_n_8_8
> >
> > VMX: vmx_combine_atop_u, vmx_combine_atop_reverse_u,
> > vmx_combine_xor_u
>
> Very cool. We should get these fixed before 0.16.0. I'm a bit
> surprised that things like mmx_composite_over_n_8_8888() fails because
> this is the 'text' operation, and it was used very heavily before
> Andre wrote the sse2 backend.

Maybe these are just some corner cases which are not too often seen in
reality on 'text' operations? Or just a single bit error in the last digit?

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
Url : http://lists.cairographics.org/archives/cairo/attachments/20090721/6439828d/attachment.pgp