[cairo] Transform optimization

Tue Mar 24 11:21:30 PDT 2009

Hi André,

> > Some general comments on the patch:
> >
> > * I would like to see comments that explain the various phases that
> >  the code goes through. Just things like
> >
> >        /* Zero fill the part of the scanline that is to the left of
> >         * the image */
> 
> Done, but could you please check the English?

It's generally easy enough to understand, but I'll fix up a couple of
grammar issues before committing it.

> > * Can the code that is computing the bilinear interpolation where one
> >  of the columns is zero (the code for the left and right edge) be
> >  factored out? Maybe it could even be reused in the general-purpose
> >  bilinear code?
> 
> I separate the edge functions and the code became more readable, but I
> can't use it in general bilinear fetch now. The main problem is to
> identify where is the edge of the image in general code. Today the
> check is in do_fetch code, we need to change it. I will think about
> how I can do that.

I think it's fine to have separate fetching code for the edges and the
main part of the image. What I meant was if we could avoid the
special-case code for bilinear_left/rightEdge.

Ie., I'd delete bilinear_leftEdge() and just call

        bilinearInterpolation (0, tr, 0, br, distx, disty);

Similarly for bilinear_rightEdge(). 

If we make the bilinearInterpolation() functions force_inline, rather
than just inline, the compiler should be able to optimize it down to
what he special cases did.

> >> BTW, I installed a 32bits linux (in my old mobile P4) and ran the the
> >> cairo-perf with bilinearInterpolation optimized with uint64_t
> >> variables. And my worries was confirmed, this version is slower than
> >> the 32bits one. I don't know if keep this two versions is a problem,
> >> looks good for me.
> >
> > It's fine to have separate 64 and 32 bit versions of the code, but I
> > think we need a configure-time test for 64-bit here, so that the right
> > code is used on other 64 bit architectures than x86-64.
> 
> Done, I also created another configure switch to disable the 64bits
> optimization.

Actually, I think this can be done in a simpler way by using

        AC_CHECK_SIZEOF(long)

and then

        #if SIZEOF_LONG > 4
                ...
        #else
                ...
        #endif

While it might be nice to be able to turn 64 bit optimizations on or
off, I don't think it's essential, and we can add it later if
necessary.

> Since the "mask" it's also a pointer, it could not be register
> allocated all over the time. And yes, this code is faster than the old
> one, but just a bit (something from 5% to 10%). It's too little to
> make a big change in all code, I think.

I mean the _expression_ *pMask will always cause a memory reference,
whereas in the old code

         if (!mask || mask[i] & maskBits)

the variable "mask" could be register allocated, and if it was NULL,
no memory reference would take place. But it's not a huge issue.

The final comment I have is about the name

        fpFetchBilinear32h_affine_repeatNone

I'd like to move towards less complicated names. So can we just call
this

        fetch_bilinear_affine_repeat_none?

Thanks,
Soren