[cairo] Tessellator performance patch
Rafael Villar Burke
pachi at rvburke.com
Mon Dec 4 16:35:14 PST 2006
Rafael Villar Burke escribió:
> M Joonas Pihlaja escribió:
>>
>> Hi Rafael,
>> These multiplications are liable to overflow 32 bits and throw off
>> the later sign checks. Could you replace them with a 64 bit version
>> and rerun your tests?
> I've just tried using int32x32_64 operations to do the cross products
> and it shows a ~x1.10-1.20 speed difference with respect to origin.
> When using det32_64 to do the math the speed gain is a bit lower, so
> I'm trying now other strategies to avoid those functions.
Attached is an updated patch with a safe version AFAICT.
In the checks we are using that may have overflow problems we only need
to know whether we get a positive value, a negative value or a zero
value for the computation of the cross product of two vectors. We can
avoid the computation of an exact value.
So far I've investigated some possibilities, such as using a modular
Gauss elimination. This approach may be a good one, but there are no
implementations using this method avaliable on the net, and it may be
not very easy to implement. The paper "Evaluating signs of determinants
using single-precision aritmetic", by Avnaim, Boissonnat, Devillers,
Preparata and Yvinec is quite interesting and explains the method, just
in case anybody wants to give this approach a try.
Another very promising approach would be reducing the avaliable
precision by shifting bits, as keithp pointed. Also, if we could cheaply
check that we don't overflow, so that the sum of the significant bits
for each multiplication is less than 32 bits we could even shortcut to
the integer expression...
Eventually, I have worked on an heuristics approach and the patch
implements it. The checks are ordered to get maximum performance, both
for worst and best case scenarios.
The speed up is not as impressive as with the unsafe int's
implementation, but it still ranges from ~10% to ~900% and makes the
tessellator scale better. The most complex cases make the biggest speed
gains, and no case slows down. This is due to that the most complex
cases have bigger active sets, and, as potential intersections grow
exponentially, the speed gain of avoiding unneeded intersections increases.
Speedups
========
image-rgb tessellate-256-100 63.02 0.91% -> 7.12
6.23%: 8.86x speedup
ââââââââ
xlib-rgba tessellate-256-100 63.05 0.89% -> 7.43
6.25%: 8.49x speedup
ââââââââ
image-rgba tessellate-256-100 62.67 0.94% -> 7.93
6.08%: 7.91x speedup
âââââââ
xlib-rgb tessellate-256-100 63.28 0.88% -> 8.06
3.49%: 7.85x speedup
âââââââ
image-rgb tessellate-64-100 2.36 2.61% -> 1.02
8.42%: 2.32x speedup
ââ
xlib-rgb tessellate-64-100 2.46 1.62% -> 1.15
7.59%: 2.15x speedup
ââ
image-rgba tessellate-64-100 2.33 2.57% -> 1.09
4.18%: 2.13x speedup
ââ
xlib-rgba tessellate-64-100 2.43 2.34% -> 1.29
0.94%: 1.89x speedup
â
image-rgba tessellate-16-100 0.13 1.20% -> 0.11
2.50%: 1.21x speedup
â
xlib-rgba zrusin_another_tessellate-415 15.35 1.96% -> 12.88
2.34%: 1.19x speedup
â
image-rgba zrusin_another_tessellate-415 14.99 1.78% -> 12.78
2.28%: 1.17x speedup
â
image-rgb zrusin_another_tessellate-415 15.06 2.20% -> 12.86
4.24%: 1.17x speedup
â
xlib-rgb zrusin_another_tessellate-415 15.36 1.43% -> 13.19
0.73%: 1.17x speedup
â
image-rgb tessellate-16-100 0.13 1.71% -> 0.11
2.49%: 1.16x speedup
â
image-rgba zrusin_another_fill-415 22.69 1.30% -> 20.11
1.11%: 1.13x speedup
â
xlib-rgb zrusin_another_fill-415 31.03 0.91% -> 27.70
1.07%: 1.12x speedup
â
image-rgb zrusin_another_fill-415 22.78 1.40% -> 20.37
1.15%: 1.12x speedup
â
xlib-rgba zrusin_another_fill-415 30.73 1.02% -> 27.95
0.80%: 1.10x speedup
â
xlib-rgba tessellate-16-100 0.21 1.09% -> 0.19
1.33%: 1.09x speedup
â
xlib-rgb tessellate-16-100 0.22 1.20% -> 0.21
2.67%: 1.08x speedup
â
I'm looking into speed things up a bit avoiding the slope_comparison
test that goes after these quick tests, as the middle point in segments
could be used to avoid intersections over the sweep line (If segment1
and segment2 share middle point and it's over the sweep line they can be
discarded for intersection under the sweep line). This could benefit
more simple cases...
--
Rafael Villar Burke
www.rvburke.com
From 407ebcaf756bd406dd00bb1b2fc9c3db777461dc Mon Sep 17 00:00:00 2001
From: Rafael Villar Burke <pachi at rvburke.com>
Date: Mon, 4 Dec 2006 22:31:15 +0100
Subject: [PATCH] Improve tessellator performance using fast paths
---
src/cairo-bentley-ottmann.c | 54
+++++++++++++++++++++++++++++++++++++++++++
1 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/src/cairo-bentley-ottmann.c b/src/cairo-bentley-ottmann.c
index 2c0f98c..faa8235 100644
--- a/src/cairo-bentley-ottmann.c
+++ b/src/cairo-bentley-ottmann.c
@@ -783,6 +783,57 @@ _cairo_bo_event_queue_fini (cairo_bo_event_queue_t
*event_queue)
free (event_queue->sorted_startstop_event_ptrs);
}
+static int
+_crossp(int32_t const a, int32_t const b, int32_t const c, int32_t const d)
+{
+ /* Cross product qualitative evaluation.
+ * The function returns 1 for a positive value of the cross product,
+ * -1 for a negative one, and 0 for a zero value.
+ * Sign is computed using heuristics and the given return order has
proved
+ * best results with the existing test cases. */
+ cairo_bool_t ad_eq_zero = (a==0 || d==0);
+ cairo_bool_t ad_gt_zero = (a>0 && d>0)||(a<0 && d<0);
+ cairo_bool_t ad_lt_zero = (a < 0) ^ (d < 0);
+ cairo_bool_t bc_eq_zero = (b==0 || c==0);
+ cairo_bool_t bc_gt_zero = (b>0 && c>0)||(b<0 && c<0);
+ cairo_bool_t bc_lt_zero = (b < 0) ^(c < 0);
+ cairo_bool_t ad_gt_bc = (a>=b && d>c) || (a>=c && d>b) || (a>b &&
d>=c) || (a>c && d>=b);
+ cairo_bool_t ad_lt_bc = (a<=b && d<c) || (a<=c && d<b) || (a<b &&
d<=c) || (a<c && d<=b);
+ /*cairo_bool_t ad_eq_bc = !ad_gt_bc && !ad_lt_bc;*/
+
+ if ((!ad_lt_zero && bc_lt_zero) || (ad_gt_zero && bc_eq_zero) ||
(ad_gt_zero && bc_gt_zero && ad_gt_bc) || (ad_lt_zero && bc_lt_zero &&
ad_gt_bc)) return 1;
+ if ((ad_eq_zero && bc_gt_zero) || (ad_lt_zero && !bc_lt_zero) ||
(ad_gt_zero && bc_gt_zero && ad_lt_bc) || (ad_lt_zero && bc_lt_zero &&
ad_lt_bc)) return -1;
+ /*if ((ad_eq_zero && bc_eq_zero) || (ad_eq_bc)) return 0;*/
+ return 0;
+}
+
+static cairo_bool_t
+_cairo_may_intersect (cairo_bo_edge_t const *edge1,
+ cairo_bo_edge_t const *edge2)
+{
+ /* Check wether the given edges MAY intersect using colinearity and
proper
+ * intersection checks.
+ * If intersection is possible return TRUE, otherwise return FALSE */
+ cairo_bo_point32_t a = edge1->top;
+ cairo_bo_point32_t b = edge1->bottom;
+ cairo_bo_point32_t c = edge2->top;
+ cairo_bo_point32_t d = edge2->bottom;
+
+ int crossp_abc = _crossp((b.x - a.x), (c.x - a.x), (b.y - a.y),
(c.y - a.y));
+ int crossp_abd = _crossp((b.x - a.x), (d.x - a.x), (b.y - a.y),
(d.y - a.y));
+ int crossp_cda = _crossp((d.x - c.x), (a.x - c.x), (d.y - c.y),
(a.y - c.y));
+ int crossp_cdb = _crossp((d.x - c.x), (b.x - c.x), (d.y - c.y),
(b.y - c.y));
+
+ cairo_bool_t proper_intersection = (!(crossp_abc > 0) ^
!(crossp_abd > 0)) &&
+ (!(crossp_cda > 0) ^ !(crossp_cdb > 0));
+
+ /* Proper intersection check: if proper intersection return TRUE */
+ if (proper_intersection) return TRUE;
+ /* Colinearity check: if colinear, return True */
+ else if (crossp_abc== 0 || crossp_abd==0 || crossp_cda==0 ||
crossp_cdb==0) return TRUE;
+ else return FALSE;
+}
+
static void
_cairo_bo_event_queue_insert_if_intersect_below_current_y
(cairo_bo_event_queue_t *event_queue,
cairo_bo_edge_t *left,
@@ -795,6 +846,9 @@
_cairo_bo_event_queue_insert_if_intersect_below_current_y
(cairo_bo_event_queue_
if (left == NULL || right == NULL)
return;
+ if (!_cairo_may_intersect (left, right))
+ return;
+
/* The names "left" and "right" here are correct descriptions of
* the order of the two edges within the active edge list. So if a
* slope comparison also puts left less than right, then we know
--
1.4.4.1
-------------- next part --------------
>From 407ebcaf756bd406dd00bb1b2fc9c3db777461dc Mon Sep 17 00:00:00 2001
From: Rafael Villar Burke <pachi at rvburke.com>
Date: Mon, 4 Dec 2006 22:31:15 +0100
Subject: [PATCH] Improve tessellator performance using fast paths
---
src/cairo-bentley-ottmann.c | 54 +++++++++++++++++++++++++++++++++++++++++++
1 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/src/cairo-bentley-ottmann.c b/src/cairo-bentley-ottmann.c
index 2c0f98c..faa8235 100644
--- a/src/cairo-bentley-ottmann.c
+++ b/src/cairo-bentley-ottmann.c
@@ -783,6 +783,57 @@ _cairo_bo_event_queue_fini (cairo_bo_event_queue_t *event_queue)
free (event_queue->sorted_startstop_event_ptrs);
}
+static int
+_crossp(int32_t const a, int32_t const b, int32_t const c, int32_t const d)
+{
+ /* Cross product qualitative evaluation.
+ * The function returns 1 for a positive value of the cross product,
+ * -1 for a negative one, and 0 for a zero value.
+ * Sign is computed using heuristics and the given return order has proved
+ * best results with the existing test cases. */
+ cairo_bool_t ad_eq_zero = (a==0 || d==0);
+ cairo_bool_t ad_gt_zero = (a>0 && d>0)||(a<0 && d<0);
+ cairo_bool_t ad_lt_zero = (a < 0) ^ (d < 0);
+ cairo_bool_t bc_eq_zero = (b==0 || c==0);
+ cairo_bool_t bc_gt_zero = (b>0 && c>0)||(b<0 && c<0);
+ cairo_bool_t bc_lt_zero = (b < 0) ^(c < 0);
+ cairo_bool_t ad_gt_bc = (a>=b && d>c) || (a>=c && d>b) || (a>b && d>=c) || (a>c && d>=b);
+ cairo_bool_t ad_lt_bc = (a<=b && d<c) || (a<=c && d<b) || (a<b && d<=c) || (a<c && d<=b);
+ /*cairo_bool_t ad_eq_bc = !ad_gt_bc && !ad_lt_bc;*/
+
+ if ((!ad_lt_zero && bc_lt_zero) || (ad_gt_zero && bc_eq_zero) || (ad_gt_zero && bc_gt_zero && ad_gt_bc) || (ad_lt_zero && bc_lt_zero && ad_gt_bc)) return 1;
+ if ((ad_eq_zero && bc_gt_zero) || (ad_lt_zero && !bc_lt_zero) || (ad_gt_zero && bc_gt_zero && ad_lt_bc) || (ad_lt_zero && bc_lt_zero && ad_lt_bc)) return -1;
+ /*if ((ad_eq_zero && bc_eq_zero) || (ad_eq_bc)) return 0;*/
+ return 0;
+}
+
+static cairo_bool_t
+_cairo_may_intersect (cairo_bo_edge_t const *edge1,
+ cairo_bo_edge_t const *edge2)
+{
+ /* Check wether the given edges MAY intersect using colinearity and proper
+ * intersection checks.
+ * If intersection is possible return TRUE, otherwise return FALSE */
+ cairo_bo_point32_t a = edge1->top;
+ cairo_bo_point32_t b = edge1->bottom;
+ cairo_bo_point32_t c = edge2->top;
+ cairo_bo_point32_t d = edge2->bottom;
+
+ int crossp_abc = _crossp((b.x - a.x), (c.x - a.x), (b.y - a.y), (c.y - a.y));
+ int crossp_abd = _crossp((b.x - a.x), (d.x - a.x), (b.y - a.y), (d.y - a.y));
+ int crossp_cda = _crossp((d.x - c.x), (a.x - c.x), (d.y - c.y), (a.y - c.y));
+ int crossp_cdb = _crossp((d.x - c.x), (b.x - c.x), (d.y - c.y), (b.y - c.y));
+
+ cairo_bool_t proper_intersection = (!(crossp_abc > 0) ^ !(crossp_abd > 0)) &&
+ (!(crossp_cda > 0) ^ !(crossp_cdb > 0));
+
+ /* Proper intersection check: if proper intersection return TRUE */
+ if (proper_intersection) return TRUE;
+ /* Colinearity check: if colinear, return True */
+ else if (crossp_abc== 0 || crossp_abd==0 || crossp_cda==0 || crossp_cdb==0) return TRUE;
+ else return FALSE;
+}
+
static void
_cairo_bo_event_queue_insert_if_intersect_below_current_y (cairo_bo_event_queue_t *event_queue,
cairo_bo_edge_t *left,
@@ -795,6 +846,9 @@ _cairo_bo_event_queue_insert_if_intersect_below_current_y (cairo_bo_event_queue_
if (left == NULL || right == NULL)
return;
+ if (!_cairo_may_intersect (left, right))
+ return;
+
/* The names "left" and "right" here are correct descriptions of
* the order of the two edges within the active edge list. So if a
* slope comparison also puts left less than right, then we know
--
1.4.4.1
More information about the cairo
mailing list