[cairo] Another Gradient rendering speedup patch
David Turner
david at freetype.org
Thu Feb 1 02:18:52 PST 2007
And here's a second one, which depends on the first, that speeds up radial
gradient fills (on my machine, up to a 1.35x speedup over the previous patch,
and 1.44x over the original code).
Regards,
- David Turner
- The FreeType Project (www.freetype.org)
On Thu, 01 Feb 2007 09:30:20 +0100, "David Turner" <david at freetype.org> said:
> Hello,
>
> here's a small patch that speeds up gradient rendering in the very common
> case where all color stops are opaque. We simply avoid performing unneeded
> alpha pre-multiplications :-) (see the sketch below, after the quoted
> message)
>
> cairo-perf-diff indicates, in the image backend, a 1.56x speedup for
> linear gradients and a 1.15x speedup for radial ones.
>
> It also fixes a small bug in cairo-perf-diff that prevented it from working
> properly when comparing perf files directly on my workstation.
>
> Enjoy,
>
> - David Turner
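
For illustration only, here is a minimal standalone sketch of the idea in the
quoted message above (this is not the patch's code; the type and function
names are invented for the example): when a gradient stop is fully opaque,
pre-multiplying its color channels by alpha is a no-op, so the per-channel
multiplications can simply be skipped.

    #include <stdint.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* A 16-bit-per-channel color stop folded into a premultiplied ARGB32
     * pixel.  When the stop is fully opaque (a == 0xffff), multiplying by
     * alpha changes nothing, so the fast path only truncates the channels. */
    typedef struct {
        uint16_t r, g, b, a;
    } color_stop_t;

    static uint32_t
    premultiplied_argb32 (const color_stop_t *c)
    {
        if (c->a == 0xffff) {
            /* opaque fast path: no pre-multiplication needed */
            return 0xff000000u
                 | ((uint32_t) (c->r >> 8) << 16)
                 | ((uint32_t) (c->g >> 8) << 8)
                 |  (uint32_t) (c->b >> 8);
        } else {
            /* general path: scale each channel by alpha first */
            uint32_t a = c->a >> 8;
            uint32_t r = (((uint32_t) c->r * c->a) / 0xffffu) >> 8;
            uint32_t g = (((uint32_t) c->g * c->a) / 0xffffu) >> 8;
            uint32_t b = (((uint32_t) c->b * c->a) / 0xffffu) >> 8;
            return (a << 24) | (r << 16) | (g << 8) | b;
        }
    }

    int
    main (void)
    {
        color_stop_t opaque      = { 0x8000, 0x4000, 0xc000, 0xffff };
        color_stop_t translucent = { 0x8000, 0x4000, 0xc000, 0x8000 };

        printf ("opaque:      %08" PRIx32 "\n", premultiplied_argb32 (&opaque));
        printf ("translucent: %08" PRIx32 "\n", premultiplied_argb32 (&translucent));
        return 0;
    }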
-------------- next part --------------
From ccb9d52b4ff9b65bc2382efab0e6ca1b3ca5e6c1 Mon Sep 17 00:00:00 2001
From: David Turner <digit at mounini.par.corp.google.com>
Date: Thu, 1 Feb 2007 11:15:01 +0100
Subject: [PATCH] simplify/optimize the radial gradient computations
---
pixman/src/fbcompose.c | 144 ++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 119 insertions(+), 25 deletions(-)
diff --git a/pixman/src/fbcompose.c b/pixman/src/fbcompose.c
index 68a7cb3..2407612 100644
--- a/pixman/src/fbcompose.c
+++ b/pixman/src/fbcompose.c
@@ -3159,51 +3159,145 @@ static void fbFetchSourcePict(PicturePtr
}
if (pGradient->type == SourcePictTypeRadial) {
+
if (!projective) {
+ double alpha, beta, beta_incr, alpha_incr, alpha_incr2 ;
+
rx -= pGradient->radial.fx;
ry -= pGradient->radial.fy;
- if (walker.all_opaque) {
+ /* assuming the following notation:
+ *
+ * DX = pGradient->radial.dx
+ * DY = pGradient->radial.dy
+ * A = pGradient->radial.a
+ * M = pGradient->radial.m
+ * B = pGradient->radial.b
+ *
+ * the original computation formula is the following:
+ *
+ * foreach pixel {
+ * b = 2*(rx*DX + ry*DY)
+ * c = -(rx*rx + ry*ry)
+ * det = (b*b) - 4*A*c
+ * s = (sqrt(det) - b) / (2*A)
+ * t = (xFixed_48_16)((s*M+B)*65536)
+ * // use t for the pixel
+ * rx += cx
+ * ry += cy
+ * }
+ *
+ * here's how we optimize it:
+ *
+ * - first, get rid of the constant factors 2 and 4:
+ *
+ * foreach pixel {
+ * b' = (rx*DX+ry*DY)
+ * c = -(rx*rx + ry*ry)
+ * det = (b'*b') - c*A
+ * s = (sqrt(det) - b')/A
+ * ... same as before
+ * }
+ *
+ * - now, divide terms by A (A > 0 according to its definition in icimage.c):
+ *
+ * foreach pixel {
+ * b'' = (rx*(DX/A)+ry*(DY/A))
+ * c' = (rx*rx + ry*ry)/A
+ * s = sqrt(b''*b'' + c') - b''
+ * ... same as before
+ * }
+ *
+ * - swallow the final multiplication by M into b'' and c'; this is equivalent
+ * to defining a new value K as
+ *
+ * K = M/A
+ *
+ * foreach pixel {
+ * b'' = (rx*(DX*K)+ry*(DY*K))
+ * c' = (rx*rx + ry*ry)*K
+ * s = sqrt(b''*b'' + c') - b''
+ * t = (xFixed_48_16)((s+B)*65536)
+ * ... same as before
+ * }
+ *
+ * - finally, use differential increments to compute b'' and c', which we rename to beta and
+ * alpha, respectively:
+ *
+ * alpha = (rx*rx + ry*ry)*K
+ * alpha_incr = 2*((rx+cx)*cx + (ry+cy)*cy)*K
+ * alpha_incr2 = 2*(cx*cx + cy*cy)*K
+ *
+ * beta = rx*(DX*K) + ry*(DY*K)
+ * beta_incr = cx*(DX*K) + cy*(DY*K)
+ *
+ * foreach pixel {
+ * s = sqrt(beta*beta + alpha) - beta
+ * t = (xFixed_48_16)((s+B)*65536)
+ * alpha += alpha_incr
+ * alpha_incr += alpha_incr2
+ * beta += beta_incr
+ * }
+ *
+ * - voila! We could also use differential increments to compute beta*beta, but
+ * I assume that floating-point multiplications are as fast as additions,
+ * so this would not necessarily gain speed.
+ *
+ * - on some platforms (e.g. ARM), it might be faster to perform all computations
+ * in 64-bit fixed point (e.g. 32.32); this is left as an exercise for the reader
+ *
+ * - note that some of these computations could be performed when the gradient
+ * is initialized, instead of on each span! This requires bigger
+ * changes to pixman, though...
+ *
+ */
+ {
+ double k = pGradient->radial.m / pGradient->radial.a;
+ double dx = pGradient->radial.dx * k;
+ double dy = pGradient->radial.dy * k;
+
+ alpha = (rx*rx + ry*ry)*k;
+ alpha_incr = 2*((rx+cx)*cx + (ry+cy)*cy)*k;
+ alpha_incr2 = 2*(cx*cx + cy*cy)*k;
+
+ beta = rx*dx + ry*dy;
+ beta_incr = cx*dx + cy*dy;
+ }
+
+ if (walker.all_opaque)
+ {
while (buffer < end) {
- double b, c, det, s;
-
if (!mask || *mask++ & maskBits)
{
xFixed_48_16 t;
-
- b = 2*(rx*pGradient->radial.dx + ry*pGradient->radial.dy);
- c = -(rx*rx + ry*ry);
- det = (b * b) - (4 * pGradient->radial.a * c);
- s = (-b + sqrt(det))/(2. * pGradient->radial.a);
-
- t = (xFixed_48_16)((s*pGradient->radial.m + pGradient->radial.b)*65536);
-
+ double s;
+
+ s = sqrt(beta*beta + alpha) - beta;
+ t = (xFixed_48_16)((s + pGradient->radial.b)*65536);
+
*buffer = _gradient_walker_pixel_opaque (&walker, t);
}
++buffer;
- rx += cx;
- ry += cy;
+ alpha += alpha_incr;
+ alpha_incr += alpha_incr2;
+ beta += beta_incr;
}
} else {
while (buffer < end) {
- double b, c, det, s;
-
if (!mask || *mask++ & maskBits)
{
xFixed_48_16 t;
-
- b = 2*(rx*pGradient->radial.dx + ry*pGradient->radial.dy);
- c = -(rx*rx + ry*ry);
- det = (b * b) - (4 * pGradient->radial.a * c);
- s = (-b + sqrt(det))/(2. * pGradient->radial.a);
-
- t = (xFixed_48_16)((s*pGradient->radial.m + pGradient->radial.b)*65536);
-
+ double s;
+
+ s = sqrt(beta*beta + alpha) - beta;
+ t = (xFixed_48_16)((s + pGradient->radial.b)*65536);
+
*buffer = _gradient_walker_pixel (&walker, t);
}
++buffer;
- rx += cx;
- ry += cy;
+ beta += beta_incr;
+ alpha += alpha_incr;
+ alpha_incr += alpha_incr2;
}
}
} else {
--
1.4.1
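
For readers unfamiliar with the differential-increment trick described in the
comment above, here is a small self-contained sketch of the general technique
(illustrative only; the variables and test values are made up and are not
taken from pixman). It walks the quadratic rx*rx + ry*ry along a scanline
using two additions per pixel and compares the result against direct
evaluation:

    #include <stdio.h>

    int
    main (void)
    {
        double rx = 3.5, ry = -1.25;     /* start of the scanline (example) */
        double cx = 0.75, cy = 0.5;      /* per-pixel step (example) */

        /* forward differences of q(n) = (rx + n*cx)^2 + (ry + n*cy)^2:
         *   first difference   q(1) - q(0) = 2*(rx*cx + ry*cy) + cx*cx + cy*cy
         *   second difference  (constant)  = 2*(cx*cx + cy*cy)              */
        double q       = rx * rx + ry * ry;
        double q_incr  = 2 * (rx * cx + ry * cy) + cx * cx + cy * cy;
        double q_incr2 = 2 * (cx * cx + cy * cy);
        int n;

        for (n = 0; n < 8; n++) {
            double x = rx + n * cx, y = ry + n * cy;
            double direct = x * x + y * y;     /* reference value */

            printf ("n=%d incremental=%g direct=%g\n", n, q, direct);

            q      += q_incr;                  /* two adds per pixel */
            q_incr += q_incr2;
        }
        return 0;
    }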