[cairo] Tessellator performance patch

Rafael Villar Burke pachi at rvburke.com
Mon Dec 4 16:35:14 PST 2006


Rafael Villar Burke escribió:
> M Joonas Pihlaja escribió:
>>
>> Hi Rafael, 

>> These multiplications are liable to overflow 32 bits and throw off 
>> the later sign checks.  Could you replace them with a 64 bit version 
>> and rerun your tests?
> I've just tried using int32x32_64 operations to do the cross products 
> and it shows a ~x1.10-1.20 speed difference with respect to origin. 
> When using det32_64 to do the math the speed gain is a bit lower, so 
> I'm trying now other strategies to avoid those functions.
Attached is an updated patch with a safe version AFAICT.

In the checks we are using that may have overflow problems we only need 
to know whether we get a positive value, a negative value or a zero 
value for the computation of the cross product of two vectors. We can 
avoid the computation of an exact value.

So far I've investigated some possibilities, such as using a modular 
Gauss elimination. This approach may be a good one, but there are no 
implementations using this method avaliable on the net, and it may be 
not very easy to implement. The paper "Evaluating signs of determinants 
using single-precision aritmetic", by Avnaim, Boissonnat, Devillers, 
Preparata and Yvinec is quite interesting and explains the method, just 
in case anybody wants to give this approach a try.

Another very promising approach would be reducing the avaliable 
precision by shifting bits, as keithp pointed. Also, if we could cheaply 
check that we don't overflow, so that the sum of the significant bits 
for each multiplication is less than 32 bits we could even shortcut to 
the integer expression...

Eventually, I have worked on an heuristics approach and the patch 
implements it. The checks are ordered to get maximum performance, both 
for worst and best case scenarios.

The speed up is not as impressive as with the unsafe int's 
implementation, but it still ranges from ~10% to ~900% and makes the 
tessellator scale better. The most complex cases make the biggest speed 
gains, and no case slows down. This is due to that the most complex 
cases have bigger active sets, and, as potential intersections grow 
exponentially, the speed gain of avoiding unneeded intersections increases.

Speedups
========
image-rgb              tessellate-256-100   63.02 0.91% ->   7.12 
6.23%:  8.86x speedup
ââââââââ
 xlib-rgba             tessellate-256-100   63.05 0.89% ->   7.43 
6.25%:  8.49x speedup
ââââââââ
image-rgba             tessellate-256-100   62.67 0.94% ->   7.93 
6.08%:  7.91x speedup
âââââââ
 xlib-rgb              tessellate-256-100   63.28 0.88% ->   8.06 
3.49%:  7.85x speedup
âââââââ
image-rgb               tessellate-64-100    2.36 2.61% ->   1.02 
8.42%:  2.32x speedup
ââ
 xlib-rgb               tessellate-64-100    2.46 1.62% ->   1.15 
7.59%:  2.15x speedup
ââ
image-rgba              tessellate-64-100    2.33 2.57% ->   1.09 
4.18%:  2.13x speedup
ââ
 xlib-rgba              tessellate-64-100    2.43 2.34% ->   1.29 
0.94%:  1.89x speedup
â
image-rgba              tessellate-16-100    0.13 1.20% ->   0.11 
2.50%:  1.21x speedup
â
 xlib-rgba  zrusin_another_tessellate-415   15.35 1.96% ->  12.88 
2.34%:  1.19x speedup
â
image-rgba  zrusin_another_tessellate-415   14.99 1.78% ->  12.78 
2.28%:  1.17x speedup
â
image-rgb   zrusin_another_tessellate-415   15.06 2.20% ->  12.86 
4.24%:  1.17x speedup
â
 xlib-rgb   zrusin_another_tessellate-415   15.36 1.43% ->  13.19 
0.73%:  1.17x speedup
â
image-rgb               tessellate-16-100    0.13 1.71% ->   0.11 
2.49%:  1.16x speedup
â
image-rgba        zrusin_another_fill-415   22.69 1.30% ->  20.11 
1.11%:  1.13x speedup
â
 xlib-rgb         zrusin_another_fill-415   31.03 0.91% ->  27.70 
1.07%:  1.12x speedup
â
image-rgb         zrusin_another_fill-415   22.78 1.40% ->  20.37 
1.15%:  1.12x speedup
â
 xlib-rgba        zrusin_another_fill-415   30.73 1.02% ->  27.95 
0.80%:  1.10x speedup
â
 xlib-rgba              tessellate-16-100    0.21 1.09% ->   0.19 
1.33%:  1.09x speedup
â
 xlib-rgb               tessellate-16-100    0.22 1.20% ->   0.21 
2.67%:  1.08x speedup
â


I'm looking into speed things up a bit avoiding the slope_comparison 
test that goes after these quick tests, as the middle point in segments 
could be used to avoid intersections over the sweep line (If segment1 
and segment2 share middle point and it's over the sweep line they can be 
discarded for intersection under the sweep line). This could benefit 
more simple cases...

--
Rafael Villar Burke
www.rvburke.com

 From 407ebcaf756bd406dd00bb1b2fc9c3db777461dc Mon Sep 17 00:00:00 2001
From: Rafael Villar Burke <pachi at rvburke.com>
Date: Mon, 4 Dec 2006 22:31:15 +0100
Subject: [PATCH] Improve tessellator performance using fast paths

---
 src/cairo-bentley-ottmann.c |   54 
+++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/src/cairo-bentley-ottmann.c b/src/cairo-bentley-ottmann.c
index 2c0f98c..faa8235 100644
--- a/src/cairo-bentley-ottmann.c
+++ b/src/cairo-bentley-ottmann.c
@@ -783,6 +783,57 @@ _cairo_bo_event_queue_fini (cairo_bo_event_queue_t 
*event_queue)
     free (event_queue->sorted_startstop_event_ptrs);
 }
 
+static int
+_crossp(int32_t const a, int32_t const b, int32_t const c, int32_t const d)
+{
+    /* Cross product qualitative evaluation.
+     * The function returns 1 for a positive value of the cross product,
+     * -1 for a negative one, and 0 for a zero value.
+     * Sign is computed using heuristics and the given return order has 
proved
+     * best results with the existing test cases. */
+    cairo_bool_t ad_eq_zero = (a==0 || d==0);
+    cairo_bool_t ad_gt_zero = (a>0 && d>0)||(a<0 && d<0);
+    cairo_bool_t ad_lt_zero = (a < 0) ^ (d < 0);
+    cairo_bool_t bc_eq_zero = (b==0 || c==0);
+    cairo_bool_t bc_gt_zero = (b>0 && c>0)||(b<0 && c<0);
+    cairo_bool_t bc_lt_zero = (b < 0) ^(c < 0);
+    cairo_bool_t ad_gt_bc = (a>=b && d>c) || (a>=c && d>b) || (a>b && 
d>=c) || (a>c && d>=b);
+    cairo_bool_t ad_lt_bc = (a<=b && d<c) || (a<=c && d<b) || (a<b && 
d<=c) || (a<c && d<=b);
+    /*cairo_bool_t ad_eq_bc = !ad_gt_bc && !ad_lt_bc;*/
+
+    if ((!ad_lt_zero && bc_lt_zero) || (ad_gt_zero && bc_eq_zero) || 
(ad_gt_zero && bc_gt_zero && ad_gt_bc) || (ad_lt_zero && bc_lt_zero && 
ad_gt_bc)) return 1;
+    if ((ad_eq_zero && bc_gt_zero) || (ad_lt_zero && !bc_lt_zero) || 
(ad_gt_zero && bc_gt_zero && ad_lt_bc) || (ad_lt_zero && bc_lt_zero && 
ad_lt_bc)) return -1;
+    /*if ((ad_eq_zero && bc_eq_zero) || (ad_eq_bc)) return 0;*/
+    return 0;
+}
+
+static cairo_bool_t
+_cairo_may_intersect (cairo_bo_edge_t const *edge1,
+        cairo_bo_edge_t const *edge2)
+{
+    /* Check wether the given edges MAY intersect using colinearity and 
proper
+     * intersection checks.
+     * If intersection is possible return TRUE, otherwise return FALSE */
+    cairo_bo_point32_t a = edge1->top;
+    cairo_bo_point32_t b = edge1->bottom;
+    cairo_bo_point32_t c = edge2->top;
+    cairo_bo_point32_t d = edge2->bottom;
+
+    int crossp_abc = _crossp((b.x - a.x), (c.x - a.x), (b.y - a.y), 
(c.y - a.y));
+    int crossp_abd = _crossp((b.x - a.x), (d.x - a.x), (b.y - a.y), 
(d.y - a.y));
+    int crossp_cda = _crossp((d.x - c.x), (a.x - c.x), (d.y - c.y), 
(a.y - c.y));
+    int crossp_cdb = _crossp((d.x - c.x), (b.x - c.x), (d.y - c.y), 
(b.y - c.y));
+
+    cairo_bool_t proper_intersection = (!(crossp_abc > 0) ^ 
!(crossp_abd > 0)) &&
+        (!(crossp_cda > 0) ^ !(crossp_cdb > 0));
+
+    /* Proper intersection check: if proper intersection return TRUE */
+    if (proper_intersection) return TRUE;
+    /* Colinearity check: if colinear, return True */
+    else if (crossp_abc== 0 || crossp_abd==0 || crossp_cda==0 || 
crossp_cdb==0) return TRUE;
+    else return FALSE;
+}
+
 static void
 _cairo_bo_event_queue_insert_if_intersect_below_current_y 
(cairo_bo_event_queue_t    *event_queue,
                                cairo_bo_edge_t    *left,
@@ -795,6 +846,9 @@ 
_cairo_bo_event_queue_insert_if_intersect_below_current_y 
(cairo_bo_event_queue_
     if (left == NULL || right == NULL)
     return;
 
+    if (!_cairo_may_intersect (left, right))
+    return;
+
     /* The names "left" and "right" here are correct descriptions of
      * the order of the two edges within the active edge list. So if a
      * slope comparison also puts left less than right, then we know
-- 
1.4.4.1


-------------- next part --------------
>From 407ebcaf756bd406dd00bb1b2fc9c3db777461dc Mon Sep 17 00:00:00 2001
From: Rafael Villar Burke <pachi at rvburke.com>
Date: Mon, 4 Dec 2006 22:31:15 +0100
Subject: [PATCH] Improve tessellator performance using fast paths

---
 src/cairo-bentley-ottmann.c |   54 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/src/cairo-bentley-ottmann.c b/src/cairo-bentley-ottmann.c
index 2c0f98c..faa8235 100644
--- a/src/cairo-bentley-ottmann.c
+++ b/src/cairo-bentley-ottmann.c
@@ -783,6 +783,57 @@ _cairo_bo_event_queue_fini (cairo_bo_event_queue_t *event_queue)
 	free (event_queue->sorted_startstop_event_ptrs);
 }
 
+static int
+_crossp(int32_t const a, int32_t const b, int32_t const c, int32_t const d)
+{
+    /* Cross product qualitative evaluation.
+     * The function returns 1 for a positive value of the cross product,
+     * -1 for a negative one, and 0 for a zero value.
+     * Sign is computed using heuristics and the given return order has proved
+     * best results with the existing test cases. */
+    cairo_bool_t ad_eq_zero = (a==0 || d==0);
+    cairo_bool_t ad_gt_zero = (a>0 && d>0)||(a<0 && d<0);
+    cairo_bool_t ad_lt_zero = (a < 0) ^ (d < 0);
+    cairo_bool_t bc_eq_zero = (b==0 || c==0);
+    cairo_bool_t bc_gt_zero = (b>0 && c>0)||(b<0 && c<0);
+    cairo_bool_t bc_lt_zero = (b < 0) ^(c < 0);
+    cairo_bool_t ad_gt_bc = (a>=b && d>c) || (a>=c && d>b) || (a>b && d>=c) || (a>c && d>=b);
+    cairo_bool_t ad_lt_bc = (a<=b && d<c) || (a<=c && d<b) || (a<b && d<=c) || (a<c && d<=b);
+    /*cairo_bool_t ad_eq_bc = !ad_gt_bc && !ad_lt_bc;*/
+
+    if ((!ad_lt_zero && bc_lt_zero) || (ad_gt_zero && bc_eq_zero) || (ad_gt_zero && bc_gt_zero && ad_gt_bc) || (ad_lt_zero && bc_lt_zero && ad_gt_bc)) return 1;
+    if ((ad_eq_zero && bc_gt_zero) || (ad_lt_zero && !bc_lt_zero) || (ad_gt_zero && bc_gt_zero && ad_lt_bc) || (ad_lt_zero && bc_lt_zero && ad_lt_bc)) return -1;
+    /*if ((ad_eq_zero && bc_eq_zero) || (ad_eq_bc)) return 0;*/
+    return 0;
+}
+
+static cairo_bool_t
+_cairo_may_intersect (cairo_bo_edge_t const *edge1,
+        cairo_bo_edge_t const *edge2)
+{
+    /* Check wether the given edges MAY intersect using colinearity and proper
+     * intersection checks.
+     * If intersection is possible return TRUE, otherwise return FALSE */
+    cairo_bo_point32_t a = edge1->top;
+    cairo_bo_point32_t b = edge1->bottom;
+    cairo_bo_point32_t c = edge2->top;
+    cairo_bo_point32_t d = edge2->bottom;
+
+    int crossp_abc = _crossp((b.x - a.x), (c.x - a.x), (b.y - a.y), (c.y - a.y));
+    int crossp_abd = _crossp((b.x - a.x), (d.x - a.x), (b.y - a.y), (d.y - a.y));
+    int crossp_cda = _crossp((d.x - c.x), (a.x - c.x), (d.y - c.y), (a.y - c.y));
+    int crossp_cdb = _crossp((d.x - c.x), (b.x - c.x), (d.y - c.y), (b.y - c.y));
+
+    cairo_bool_t proper_intersection = (!(crossp_abc > 0) ^ !(crossp_abd > 0)) &&
+        (!(crossp_cda > 0) ^ !(crossp_cdb > 0));
+
+    /* Proper intersection check: if proper intersection return TRUE */
+    if (proper_intersection) return TRUE;
+    /* Colinearity check: if colinear, return True */
+    else if (crossp_abc== 0 || crossp_abd==0 || crossp_cda==0 || crossp_cdb==0) return TRUE;
+    else return FALSE;
+}
+
 static void
 _cairo_bo_event_queue_insert_if_intersect_below_current_y (cairo_bo_event_queue_t	*event_queue,
 							   cairo_bo_edge_t	*left,
@@ -795,6 +846,9 @@ _cairo_bo_event_queue_insert_if_intersect_below_current_y (cairo_bo_event_queue_
     if (left == NULL || right == NULL)
 	return;
 
+    if (!_cairo_may_intersect (left, right))
+    return;
+
     /* The names "left" and "right" here are correct descriptions of
      * the order of the two edges within the active edge list. So if a
      * slope comparison also puts left less than right, then we know
-- 
1.4.4.1



More information about the cairo mailing list