[cairo] Brand new _cairo_lround implementation
Daniel Amelang
daniel.amelang at gmail.com
Tue Dec 5 23:59:03 PST 2006
On 12/5/06, Carl Worth <cworth at cworth.org> wrote:
> On Tue, 05 Dec 2006 14:08:17 -0800, Bill Spitzak wrote:
> > Daniel's code does round-away-from-zero, apparently. I think floor or
> > ceil is needed consistently. Otherwise when something crosses zero it
> > may offset by 1 pixel. Could this be the problem?
>
> Yes, that's exactly the problem.
>
> Daniel has just tracked that down and is now working to rewrite his
> magic conversion code to consistently round toward one infinity or the
> other rather than away from zero.
And here it is. A new _cairo_lround that performs arithmetic rounding
(round toward pos. infinity in the .5 cases) without incurring
performance regressions. We did lose some of the valid input range,
though, as it only works on doubles in the range [INT_MIN / 4, INT_MAX
/ 4]. So we lose 2 bits at the top.
Tested on nautilus, no buggy behavior. I'll document the internal
workings of the function later. It's been a long day.
Dan
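For readers following along, the difference between the two tie-breaking rules discussed above can be shown without any cairo code. This is only a sketch: round_away_from_zero and round_arithmetic are hypothetical names, the first mimics the tie rule of C99 lround() in a simplified way that is adequate for small values, and the second is a plain floor (d + 0.5), which stands in for the behaviour the new _cairo_lround is described as having.

```c
/* Hypothetical helper: ties go away from zero, the rule C99 lround() uses.
 * Simplified on purpose; adequate for the small values used here. */
static long
round_away_from_zero (double d)
{
    return (long) (d < 0.0 ? d - 0.5 : d + 0.5);   /* the cast truncates */
}

/* Hypothetical helper: arithmetic rounding, i.e. floor (d + 0.5),
 * so ties go toward +infinity. */
static long
round_arithmetic (double d)
{
    double shifted = d + 0.5;
    long t = (long) shifted;      /* truncates toward zero */

    if ((double) t > shifted)     /* step down for negatives to get floor */
        t--;
    return t;
}
```

The two helpers agree on 0.5 (both give 1) but differ on -0.5 (-1 versus 0) and on -1.5 (-2 versus -1). That asymmetry around zero is the one-pixel offset Bill describes.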
-------------- next part --------------
From nobody Mon Sep 17 00:00:00 2001
From: Dan Amelang <dan at amelang.net>
Date: Tue Dec 5 23:45:15 2006 -0800
Subject: [PATCH] Change _cairo_lround to use arithmetic rounding
This fixes the text rendering bug reported here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217819
No performance impact on x86. On the 770, I see minor speedups in text_solid
and text_image (~1.05x).

---
 src/cairo.c |   55 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 44 insertions(+), 11 deletions(-)
b6a44bdee67f20ffb58e9ca2a44ea367cc610110
diff --git a/src/cairo.c b/src/cairo.c
index ce6a728..30672d7 100644
--- a/src/cairo.c
+++ b/src/cairo.c
@@ -3197,27 +3197,60 @@ _cairo_restrict_value (double *value, do
         *value = max;
 }
 
-/* This function is identical to the C99 function lround, except that it
- * uses banker's rounding instead of arithmetic rounding. This implementation
- * is much faster (on the platforms we care about) than lround, round, rint,
- * lrint or float (d + 0.5).
+/* This function is identical to the C99 function lround(), except that it
+ * performs arithmetic rounding (instead of away-from-zero rounding) and
+ * has a valid input range of [INT_MIN / 4, INT_MAX / 4] instead of
+ * [INT_MIN, INT_MAX]. It is much faster on both x86 and FPU-less systems
+ * than other commonly used methods for rounding (lround, round, rint, lrint
+ * or float (d + 0.5)).
  *
- * For an explanation of the inner workings of this implemenation, see the
- * documentation for _cairo_fixed_from_double.
+ * The reason why this function is much faster on x86 than other
+ * methods is due to the fact that it avoids the fldcw instruction.
+ * This instruction incurs a large performance penalty on modern Intel
+ * processors due to how it prevents efficient instruction pipelining.
+ *
+ * The reason why this function is much faster on FPU-less systems is for
+ * an entirely different reason. All common rounding methods involve multiple
+ * floating-point operations. Each one of these operations has to be
+ * emulated in software, which adds up to be a large performance penalty.
+ * This function doesn't perform any floating-point calculations, and thus
+ * avoids this penalty.
+ */
+/* XXX needs inline comments explaining the internal magic
  */
-#define CAIRO_MAGIC_NUMBER_INT (6755399441055744.0)
 int
 _cairo_lround (double d)
 {
     union {
+        uint32_t ui32[2];
         double d;
-        int32_t i[2];
     } u;
+    uint32_t exponent, most_significant_word, least_significant_word;
+    int32_t integer_result;
+
+    u.d = d;
 
-    u.d = d + CAIRO_MAGIC_NUMBER_INT;
 #ifdef FLOAT_WORDS_BIGENDIAN
-    return u.i[1];
+    most_significant_word = u.ui32[0];
+    least_significant_word = u.ui32[1];
 #else
-    return u.i[0];
+    most_significant_word = u.ui32[1];
+    least_significant_word = u.ui32[0];
 #endif
+
+    exponent = 1052 - ((most_significant_word >> 20) & 0x7FF);
+    integer_result = ((most_significant_word & 0xFFFFF) | 0x100000) << 10;
+    integer_result |= (least_significant_word >> 22);
+
+    if (most_significant_word & 0x80000000)
+        integer_result = -integer_result;
+
+    integer_result >>= exponent;
+
+    if (exponent > 30)
+        integer_result = 0;
+
+    integer_result = (integer_result + 1) >> 1;
+
+    return integer_result;
 }

1.2.6
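Since the patch leaves the internals undocumented for now (see the XXX comment), here is one possible reading of the bit manipulation, written as a standalone commented sketch rather than as the author's explanation. lround_sketch is a hypothetical name. Unlike the patch, it copies the bits with memcpy into a uint64_t instead of using the union and FLOAT_WORDS_BIGENDIAN, which assumes IEEE 754 doubles stored in the same byte order as 64-bit integers, and it tests the shift count before shifting. Like the patch, it relies on >> of a negative int32_t being an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical standalone restatement of the patched _cairo_lround.
 * Intended for inputs in [INT_MIN / 4, INT_MAX / 4], as in the patch. */
static int
lround_sketch (double d)
{
    uint64_t bits;
    uint32_t msw, lsw, shift;
    int32_t result;

    memcpy (&bits, &d, sizeof bits);   /* assumes IEEE 754 binary64 */
    msw = (uint32_t) (bits >> 32);     /* sign, 11 exponent bits, top 20 mantissa bits */
    lsw = (uint32_t) bits;             /* low 32 mantissa bits */

    /* The biased exponent is e, so |d| = significand * 2^(e - 1023).
     * Below, the significand is held as a value scaled by 2^30, and
     * shifting it right by (1052 - e) leaves 2 * |d|, i.e. one
     * fractional bit of precision. */
    shift = 1052 - ((msw >> 20) & 0x7FF);

    /* shift > 30 means |d| < 0.5 (or, after unsigned wraparound, a
     * value too large for this trick); both cases yield 0. */
    if (shift > 30)
        return 0;

    /* Implicit leading 1 plus the top 30 mantissa bits, as a 31-bit value. */
    result = (int32_t) ((((msw & 0xFFFFF) | 0x100000) << 10) | (lsw >> 22));

    if (msw & 0x80000000)              /* sign bit set: negate first */
        result = -result;

    result >>= shift;                  /* now floor (2 * d), for these 31 bits */

    return (result + 1) >> 1;          /* floor (d + 0.5) */
}
```

Only the top 31 significand bits take part, so this sketch should match floor (d + 0.5) for values with short mantissas such as 0.5, -1.5 or 1234567.5. Inputs that sit within a few low-order bits of a tie can land on the other side.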
-------------- next part --------------
From nobody Mon Sep 17 00:00:00 2001
From: Dan Amelang <dan at amelang.net>
Date: Tue Dec 5 23:49:52 2006 -0800
Subject: [PATCH] Make _cairo_lround an inline function
This speeds up the text_solid and text_image perf tests on the 770 by another
~1.05x. No performance change on x86.

---
 src/cairo.c    |   58
 src/cairoint.h |   59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 60 deletions(-)
505670aaa40e59c700685d6e085721ee834fc5ab
diff --git a/src/cairo.c b/src/cairo.c
index 30672d7..15230ed 100644
--- a/src/cairo.c
+++ b/src/cairo.c
@@ -3196,61 +3196,3 @@ _cairo_restrict_value (double *value, do
     else if (*value > max)
         *value = max;
 }
-
-/* This function is identical to the C99 function lround(), except that it
- * performs arithmetic rounding (instead of away-from-zero rounding) and
- * has a valid input range of [INT_MIN / 4, INT_MAX / 4] instead of
- * [INT_MIN, INT_MAX]. It is much faster on both x86 and FPU-less systems
- * than other commonly used methods for rounding (lround, round, rint, lrint
- * or float (d + 0.5)).
- *
- * The reason why this function is much faster on x86 than other
- * methods is due to the fact that it avoids the fldcw instruction.
- * This instruction incurs a large performance penalty on modern Intel
- * processors due to how it prevents efficient instruction pipelining.
- *
- * The reason why this function is much faster on FPU-less systems is for
- * an entirely different reason. All common rounding methods involve multiple
- * floating-point operations. Each one of these operations has to be
- * emulated in software, which adds up to be a large performance penalty.
- * This function doesn't perform any floating-point calculations, and thus
- * avoids this penalty.
- */
-/* XXX needs inline comments explaining the internal magic
- */
-int
-_cairo_lround (double d)
-{
-    union {
-        uint32_t ui32[2];
-        double d;
-    } u;
-    uint32_t exponent, most_significant_word, least_significant_word;
-    int32_t integer_result;
-
-    u.d = d;
-
-#ifdef FLOAT_WORDS_BIGENDIAN
-    most_significant_word = u.ui32[0];
-    least_significant_word = u.ui32[1];
-#else
-    most_significant_word = u.ui32[1];
-    least_significant_word = u.ui32[0];
-#endif
-
-    exponent = 1052 - ((most_significant_word >> 20) & 0x7FF);
-    integer_result = ((most_significant_word & 0xFFFFF) | 0x100000) << 10;
-    integer_result |= (least_significant_word >> 22);
-
-    if (most_significant_word & 0x80000000)
-        integer_result = -integer_result;
-
-    integer_result >>= exponent;
-
-    if (exponent > 30)
-        integer_result = 0;
-
-    integer_result = (integer_result + 1) >> 1;
-
-    return integer_result;
-}
diff --git a/src/cairoint.h b/src/cairoint.h
index 1f74d62..4820e16 100755
--- a/src/cairoint.h
+++ b/src/cairoint.h
@@ -1155,8 +1155,63 @@ typedef struct _cairo_stroke_face {
 cairo_private void
 _cairo_restrict_value (double *value, double min, double max);
 
-cairo_private int
-_cairo_lround (double d);
+/* This function is identical to the C99 function lround(), except that it
+ * performs arithmetic rounding (instead of away-from-zero rounding) and
+ * has a valid input range of [INT_MIN / 4, INT_MAX / 4] instead of
+ * [INT_MIN, INT_MAX]. It is much faster on both x86 and FPU-less systems
+ * than other commonly used methods for rounding (lround, round, rint, lrint
+ * or float (d + 0.5)).
+ *
+ * The reason why this function is much faster on x86 than other
+ * methods is due to the fact that it avoids the fldcw instruction.
+ * This instruction incurs a large performance penalty on modern Intel
+ * processors due to how it prevents efficient instruction pipelining.
+ *
+ * The reason why this function is much faster on FPU-less systems is for
+ * an entirely different reason. All common rounding methods involve multiple
+ * floating-point operations. Each one of these operations has to be
+ * emulated in software, which adds up to be a large performance penalty.
+ * This function doesn't perform any floating-point calculations, and thus
+ * avoids this penalty.
+ */
+/* XXX needs inline comments explaining the internal magic
+ */
+static inline int
+_cairo_lround (double d)
+{
+    union {
+        uint32_t ui32[2];
+        double d;
+    } u;
+    uint32_t exponent, most_significant_word, least_significant_word;
+    int32_t integer_result;
+
+    u.d = d;
+
+#ifdef FLOAT_WORDS_BIGENDIAN
+    most_significant_word = u.ui32[0];
+    least_significant_word = u.ui32[1];
+#else
+    most_significant_word = u.ui32[1];
+    least_significant_word = u.ui32[0];
+#endif
+
+    exponent = 1052 - ((most_significant_word >> 20) & 0x7FF);
+    integer_result = ((most_significant_word & 0xFFFFF) | 0x100000) << 10;
+    integer_result |= (least_significant_word >> 22);
+
+    if (most_significant_word & 0x80000000)
+        integer_result = -integer_result;
+
+    integer_result >>= exponent;
+
+    if (exponent > 30)
+        integer_result = 0;
+
+    integer_result = (integer_result + 1) >> 1;
+
+    return integer_result;
+}
 
 /* cairo_fixed.c */
 cairo_private cairo_fixed_t

1.2.6