[cairo] Transform optimization

Wed Feb 11 16:11:41 PST 2009

Hi André,

Sorry about the extended absence. I finally got some time to read
through the patch.

Overall, I like the approach of doing this in the general scanline
fetcher since that provides the benefit for all the operator, and the
bilinearInterpolation functions make the existing bilinear code much
more readable. 

Some general comments on the patch:

* I would like to see comments that explain the various phases that
  the code goes through. Just things like

        /* Zero fill the part of the scanline that is to the left of
         * the image */

  Or alternatively, the phases could be made separate functions,
  though this may complicate the code more than its worth.

* Am I misreading the code, or is it missing a final zero-fill phase
  for the pixels that lie to the right of the image?

* Can the code that is computing the bilinear interpolation where one
  of the columns is zero (the code for the left and right edge) be
  factored out? Maybe it could even be reused in the general-purpose
  bilinear code?

> BTW, I installed a 32bits linux (in my old mobile P4) and ran the the
> cairo-perf with bilinearInterpolation optimized with uint64_t
> variables. And my worries was confirmed, this version is slower than
> the 32bits one. I don't know if keep this two versions is a problem,
> looks good for me.

It's fine to have separate 64 and 32 bit versions of the code, but I
think we need a configure-time test for 64-bit here, so that the right
code is used on other 64 bit architectures than x86-64.

Specific comments on the code:

> +    if (!mask)
> +    {
> +        maskInc = 0;
> +        maskBits = 1;
> +        pMask = &maskBits;
> +    }
> +    else
> +    {
> +        maskInc = 1;
> +        pMask = mask;
> +    }

I'm not completely convinced that the

        if (*pMask & maskBits)
           ...;

is a net win, since *pMask is now always a memory reference, whereas
"mask" could be register allocated. 

But in any case this idea seems to be unrelated to the rest of the
patch, so I'd like to see it happen in a separate patch, and then, if
it actually is a win, do it for convolutions and source pictures as
well, not just for fetchTransformed().

> +    if (vec0 < (int) 0xffff0000)
> +    {
> +        int points = ((0xffff0000 - vec0) / unit0) >> 16;

We need a new #define pixman_fixed_minus_1 in pixman.h, which should
then be used here.

> +    if ((pTop == zero) &&
> +        (pBottom == zero))
> +    {
> +        if (!mask)
> +        {
> +           memset (buffer, 0, width * sizeof(uint32_t));
> +        }
> +        else
> +        {
> +            for (i = 0; i<width; i++)
> +            {
> +                if (mask[i] & maskBits)
> +                    buffer[i] = 0;
> +            }
> +        }
> +
> +        return;
> +    }

I think this should just be a memset(). There is no real point
optimizing out memory writes, especially when the alternative involves
the mask[i] memory reads.

> +    pictWidth += 0x00010000;

0x00010000 should be pixman_fixed_1 here. 

>         }
>      }
> +    }
>      else if (pict->common.filter == PIXMAN_FILTER_CONVOLUTION)
>      {

The code above this brace needs to be indented.

Thanks,
Soren