[cairo] Another Gradient rendering speedup patch
David Turner
david at freetype.org
Thu Feb 1 02:18:52 PST 2007
And here's a second one, which depends on the first, that speeds up radial
gradient fills (on my machine, up to a 1.35x speedup over the previous patch,
and 1.44x over the original code).
Regards,
- David Turner
- The FreeType Project (www.freetype.org)
On Thu, 01 Feb 2007 09:30:20 +0100, "David Turner" <david at freetype.org> said:
> Hello,
>
> here's a small patch that speeds up gradient rendering in the very common
> case where all color stops are opaque. We simply avoid performing unneeded
> alpha pre-multiplications :-) (see the sketch below, after the quoted
> message)
>
> cairo-perf-diff indicates, in the image backend, a 1.56x speedup for
> linear gradients and a 1.15x speedup for radial ones.
>
> It also fixes a small bug in cairo-perf-diff that prevented it from working
> properly when comparing perf files directly on my workstation.
>
> Enjoy,
>
> - David Turner
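
For illustration only, here is a minimal standalone sketch of the idea in the
quoted message above (this is not the patch's code; the type and function
names are invented for the example): when a gradient stop is fully opaque,
pre-multiplying its color channels by alpha is a no-op, so the per-channel
multiplications can simply be skipped.

    #include <stdint.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* A 16-bit-per-channel color stop folded into a premultiplied ARGB32
     * pixel.  When the stop is fully opaque (a == 0xffff), multiplying by
     * alpha changes nothing, so the fast path only truncates the channels. */
    typedef struct {
        uint16_t r, g, b, a;
    } color_stop_t;

    static uint32_t
    premultiplied_argb32 (const color_stop_t *c)
    {
        if (c->a == 0xffff) {
            /* opaque fast path: no pre-multiplication needed */
            return 0xff000000u
                 | ((uint32_t) (c->r >> 8) << 16)
                 | ((uint32_t) (c->g >> 8) << 8)
                 |  (uint32_t) (c->b >> 8);
        } else {
            /* general path: scale each channel by alpha first */
            uint32_t a = c->a >> 8;
            uint32_t r = (((uint32_t) c->r * c->a) / 0xffffu) >> 8;
            uint32_t g = (((uint32_t) c->g * c->a) / 0xffffu) >> 8;
            uint32_t b = (((uint32_t) c->b * c->a) / 0xffffu) >> 8;
            return (a << 24) | (r << 16) | (g << 8) | b;
        }
    }

    int
    main (void)
    {
        color_stop_t opaque      = { 0x8000, 0x4000, 0xc000, 0xffff };
        color_stop_t translucent = { 0x8000, 0x4000, 0xc000, 0x8000 };

        printf ("opaque:      %08" PRIx32 "\n", premultiplied_argb32 (&opaque));
        printf ("translucent: %08" PRIx32 "\n", premultiplied_argb32 (&translucent));
        return 0;
    }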
-------------- next part --------------
From ccb9d52b4ff9b65bc2382efab0e6ca1b3ca5e6c1 Mon Sep 17 00:00:00 2001
From: David Turner <digit at mounini.par.corp.google.com>
Date: Thu, 1 Feb 2007 11:15:01 +0100
Subject: [PATCH] simplify/optimize the radial gradient computations
---
pixman/src/fbcompose.c | 144 ++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 119 insertions(+), 25 deletions(-)
diff --git a/pixman/src/fbcompose.c b/pixman/src/fbcompose.c
index 68a7cb3..2407612 100644
--- a/pixman/src/fbcompose.c
+++ b/pixman/src/fbcompose.c
@@ -3159,51 +3159,145 @@ static void fbFetchSourcePict(PicturePtr
}
if (pGradient->type == SourcePictTypeRadial) {
+
if (!projective) {
+ double alpha, beta, beta_incr, alpha_incr, alpha_incr2 ;
+
rx -= pGradient->radial.fx;
ry -= pGradient->radial.fy;
- if (walker.all_opaque) {
+ /* assuming the following notation:
+ *
+ * DX = pGradient->radial.dx
+ * DY = pGradient->radial.dy
+ * A = pGradient->radial.a
+ * M = pGradient->radial.m
+ * B = pGradient->radial.b
+ *
+ * the original computation formula is the following:
+ *
+ * foreach pixel {
+ * b = 2*(rx*DX + ry*DY)
+ * c = -(rx*rx + ry*ry)
+ * det = (b*b) - 4*A*c
+ * s = (sqrt(det) - b) / (2*A)
+ * t = (xFixed_48_16)((s*M+B)*65536)
+ * // use t for the pixel
+ * rx += cx
+ * ry += cy
+ * }
+ *
+ * here's how we optimize it:
+ *
+ * - first, get rid of the constant factors 2 and 4:
+ *
+ * foreach pixel {
+ * b' = (rx*DX+ry*DY)
+ * c = -(rx*rx + ry*ry)
+ * det = (b'*b') - c*A
+ * s = (sqrt(det) - b')/A
+ * ... same as before
+ * }
+ *
+ * - now, divide terms by A (A > 0 according to its definition in icimage.c):
+ *
+ * foreach pixel {
+ * b'' = (rx*(DX/A)+ry*(DY/A))
+ * c' = (rx*rx + ry*ry)/A
+ * s = sqrt(b''*b'' + c') - b''
+ * ... same as before
+ * }
+ *
+ * - swallow the final multiplication by M into b'' and c'; this is equivalent
+ * to defining a new value K as
+ *
+ * K = M/A
+ *
+ * foreach pixel {
+ * b'' = (rx*(DX*K)+ry*(DY*K))
+ * c' = (rx*rx + ry*ry)*K
+ * s = sqrt(b''*b'' + c') - b''
+ * t = (xFixed_48_16)((s+B)*65536)
+ * ... same as before
+ * }
+ *
+ * - finally, use differential increments to compute b'' and c', which we rename to beta and
+ * alpha, respectively:
+ *
+ * alpha = (rx*rx + ry*ry)*K
+ * alpha_incr = 2*((rx+cx)*cx + (ry+cy)*cy)*K
+ * alpha_incr2 = 2*(cx*cx + cy*cy)*K
+ *
+ * beta = rx*(DX*K) + ry*(DY*K)
+ * beta_incr = cx*(DX*K) + cy*(DY*K)
+ *
+ * foreach pixel {
+ * s = sqrt(beta*beta + alpha) - beta
+ * t = (xFixed_48_16)((s+B)*65536)
+ * alpha += alpha_incr
+ * alpha_incr += alpha_incr2
+ * beta += beta_incr
+ * }
+ *
+ * - voila! We could also use differential increments to compute beta*beta, but
+ * I assume that floating-point multiplications are as fast as additions,
+ * so this would not necessarily gain speed.
+ *
+ * - on some platforms (e.g. ARM), it might be faster to perform all computations
+ * in 64-bit fixed point (e.g. 32.32); this is left as an exercise for the reader
+ *
+ * - note that some of these computations could be performed when the gradient
+ * is initialized, instead of on each span! This requires bigger
+ * changes to pixman, though...
+ *
+ */
+ {
+ double k = pGradient->radial.m / pGradient->radial.a;
+ double dx = pGradient->radial.dx * k;
+ double dy = pGradient->radial.dy * k;
+
+ alpha = (rx*rx + ry*ry)*k;
+ alpha_incr = 2*((rx+cx)*cx + (ry+cy)*cy)*k;
+ alpha_incr2 = 2*(cx*cx + cy*cy)*k;
+
+ beta = rx*dx + ry*dy;
+ beta_incr = cx*dx + cy*dy;
+ }
+
+ if (walker.all_opaque)
+ {
while (buffer < end) {
- double b, c, det, s;
-
if (!mask || *mask++ & maskBits)
{
xFixed_48_16 t;
-
- b = 2*(rx*pGradient->radial.dx + ry*pGradient->radial.dy);
- c = -(rx*rx + ry*ry);
- det = (b * b) - (4 * pGradient->radial.a * c);
- s = (-b + sqrt(det))/(2. * pGradient->radial.a);
-
- t = (xFixed_48_16)((s*pGradient->radial.m + pGradient->radial.b)*65536);
-
+ double s;
+
+ s = sqrt(beta*beta + alpha) - beta;
+ t = (xFixed_48_16)((s + pGradient->radial.b)*65536);
+
*buffer = _gradient_walker_pixel_opaque (&walker, t);
}
++buffer;
- rx += cx;
- ry += cy;
+ alpha += alpha_incr;
+ alpha_incr += alpha_incr2;
+ beta += beta_incr;
}
} else {
while (buffer < end) {
- double b, c, det, s;
-
if (!mask || *mask++ & maskBits)
{
xFixed_48_16 t;
-
- b = 2*(rx*pGradient->radial.dx + ry*pGradient->radial.dy);
- c = -(rx*rx + ry*ry);
- det = (b * b) - (4 * pGradient->radial.a * c);
- s = (-b + sqrt(det))/(2. * pGradient->radial.a);
-
- t = (xFixed_48_16)((s*pGradient->radial.m + pGradient->radial.b)*65536);
-
+ double s;
+
+ s = sqrt(beta*beta + alpha) - beta;
+ t = (xFixed_48_16)((s + pGradient->radial.b)*65536);
+
*buffer = _gradient_walker_pixel (&walker, t);
}
++buffer;
- rx += cx;
- ry += cy;
+ beta += beta_incr;
+ alpha += alpha_incr;
+ alpha_incr += alpha_incr2;
}
}
} else {
--
1.4.1
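
For readers unfamiliar with the differential-increment trick described in the
comment above, here is a small self-contained sketch of the general technique
(illustrative only; the variables and test values are made up and are not
taken from pixman). It walks the quadratic rx*rx + ry*ry along a scanline
using two additions per pixel and compares the result against direct
evaluation:

    #include <stdio.h>

    int
    main (void)
    {
        double rx = 3.5, ry = -1.25;     /* start of the scanline (example) */
        double cx = 0.75, cy = 0.5;      /* per-pixel step (example) */

        /* forward differences of q(n) = (rx + n*cx)^2 + (ry + n*cy)^2:
         *   first difference   q(1) - q(0) = 2*(rx*cx + ry*cy) + cx*cx + cy*cy
         *   second difference  (constant)  = 2*(cx*cx + cy*cy)              */
        double q       = rx * rx + ry * ry;
        double q_incr  = 2 * (rx * cx + ry * cy) + cx * cx + cy * cy;
        double q_incr2 = 2 * (cx * cx + cy * cy);
        int n;

        for (n = 0; n < 8; n++) {
            double x = rx + n * cx, y = ry + n * cy;
            double direct = x * x + y * y;     /* reference value */

            printf ("n=%d incremental=%g direct=%g\n", n, q, direct);

            q      += q_incr;                  /* two adds per pixel */
            q_incr += q_incr2;
        }
        return 0;
    }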