[cairo] Transform optimization

Wed Nov 5 16:10:50 PST 2008

On Wed, Nov 05, 2008 at 09:27:17PM -0200, André Tupinambá wrote:
> Hi everyone,
> 
> I'm searching for some opportunities to optimize other pieces of code,
> and I'm working now with the transformation code. 

Thanks for looking into this; the transformation code could definitely
use some work.

> Checking with VTune, I saw that a great hotspot at fetching the pixel
> call (do_fetch). So I tried to reduce this fetch, checking if I just
> read this pixel before, and works well for magnifying (about 2x in
> Core2 and 1.66x in Turion).

What about when minifying? It seems like this patch would cause a slowdown
there, because every fetch goes to a different pixel and the extra test
never pays off.

It would also be good to know why fetches are so slow. In theory a fetch
should only be a cached memory access, which should be pretty quick.
However, I can see the advantage of avoiding fetches when we are
upscaling by a large factor, since the samples don't change for an entire
region of destination pixels. But I don't really like the idea of adding
more code to the inside of an inner loop, certainly not without some more
performance numbers for scaling to different sizes. It would also be
good to have some idea of what a fetch costs, so that we know
how important it is to avoid them.

Further, I wonder if a more implicit approach would work better. If we
could do a better job of knowing when we need to read a new sample,
we wouldn't need to test for it on every pixel. Something like:

while (dest < dest_end) {
   compute_src_pixel_location();
   fetch_src_pixels();
   while (dest < dest_end && src_pixels_the_same) {
     *dest = compute_dest_pixel(src_pixels);
     dest++;
   }
}

instead of your patch, which does something more like:

while (dest < dest_end) {
	compute_src_pixel_location();
	if (!src_pixels_the_same) {
		fetch_src_pixels();
	}
	*dest = compute_dest_pixel(src_pixels);
	dest++;
}
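
To make the difference between the two loops concrete, here is a rough,
self-contained sketch for the simplest case only: one row, nearest-neighbor
sampling, 16.16 fixed-point stepping starting at source position 0. None of
this is the actual fetch code; fetch_pixel, scale_row_test and
scale_row_runs are hypothetical names invented for illustration, and the
sketch assumes the source positions stay below 65536 pixels. The second
function computes the length of each run up front, so the inner loop has
no "same pixel?" test at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real fetch; counts calls for comparison. */
static uint32_t fetch_pixel(const uint32_t *src, uint32_t sx, size_t *fetches)
{
    (*fetches)++;
    return src[sx];
}

/* Patch-style loop: compare the source index on every destination pixel. */
static size_t scale_row_test(uint32_t *dest, size_t dest_w,
                             const uint32_t *src, uint32_t step)
{
    size_t fetches = 0;
    uint32_t x = 0;                /* 16.16 fixed-point source position */
    uint32_t last_sx = UINT32_MAX; /* sentinel: nothing fetched yet */
    uint32_t pixel = 0;

    assert(step > 0);
    for (size_t i = 0; i < dest_w; i++) {
        uint32_t sx = x >> 16;
        if (sx != last_sx) {
            pixel = fetch_pixel(src, sx, &fetches);
            last_sx = sx;
        }
        dest[i] = pixel;
        x += step;
    }
    return fetches;
}

/* Run-based loop: fetch once, then fill a precomputed run of pixels. */
static size_t scale_row_runs(uint32_t *dest, size_t dest_w,
                             const uint32_t *src, uint32_t step)
{
    size_t fetches = 0;
    uint32_t x = 0;
    uint32_t *dest_end = dest + dest_w;

    assert(step > 0);
    while (dest < dest_end) {
        uint32_t pixel = fetch_pixel(src, x >> 16, &fetches);
        /* Distance to the next source pixel boundary, in 1..0x10000. */
        uint32_t to_boundary = 0x10000u - (x & 0xFFFFu);
        /* Destination pixels mapping to this sample: ceil(to_boundary / step). */
        size_t run = (size_t)(((uint64_t)to_boundary + step - 1) / step);
        size_t remaining = (size_t)(dest_end - dest);

        if (run > remaining)
            run = remaining;
        for (size_t k = 0; k < run; k++)
            *dest++ = pixel;
        x += (uint32_t)run * step;
    }
    return fetches;
}
```

Both functions should produce the same output and the same number of
fetches; the open question is whether the division per run is cheaper
than a compare per pixel. When minifying, every run has length 1, so the
run-based loop pays for a division on each pixel, which is exactly the
kind of case that needs real numbers before deciding.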

-Jeff