[cairo] rewriting libpixman
Jeff Muizelaar
jeff at infidigm.net
Wed Mar 28 09:44:46 PDT 2007
I have put up a copy of cairo that has a partially rewritten libpixman
in the pixman-new branch of my cairo tree.
The main change is that all of the handwritten special cases have been
replaced with machine-generated special cases. As a result, all regular
compositing operations now avoid the compositeGeneral case. In addition,
the Python script allows handwritten substitutions to be made. In this
case, I've added substitutions using liboil for the operations that it
supports. These substitutions cover a lot of the common cases and
account for much of the speedup seen below.
I've also included a patch against liboil that adds another useful
operation for cairo.
Implementation
--------------
The code currently uses a very large table of function pointers to look
up the compositing kernel needed. I'm a little uncomfortable with this
approach; however, the alternatives aren't great either. In the worst
case it should make it possible to determine the best-case performance
of some of the less often used compositing operators.
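As a rough illustration of the table lookup described above, here is a
minimal Python model. The key layout mirrors the parameters listed later
in this message; the function names, format strings and the fallback are
placeholders of mine, not code from the branch, where the table holds C
function pointers.

```python
# Hypothetical model of the dispatch table; none of these names come
# from the pixman-new branch.

def composite_general(src, mask, dest):
    """Stand-in for the slow path that handles any combination."""
    return "general"

def argb32_over_argb32(src, mask, dest):
    """Stand-in for one machine-generated kernel."""
    return "ARGB32_over_ARGB32"

# Key: (operator, src_format, dest_format, mask_format, solid_src)
DISPATCH = {
    ("over", "ARGB32", "ARGB32", None, False): argb32_over_argb32,
}

def lookup_kernel(op, src_format, dest_format, mask_format=None, solid_src=False):
    """Return the specialised kernel if one exists, else the general one."""
    key = (op, src_format, dest_format, mask_format, solid_src)
    return DISPATCH.get(key, composite_general)
```

A caller would do something like lookup_kernel("over", "ARGB32", "ARGB32")
once per composite request and then run the returned kernel over each
scanline.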
The basic premise behind the Python script/compiler is to take a tree
like the following and create the corresponding C code:
ast = op_loop(
    op_pack(
        op_over(
            op_unpack(source_in),
            op_unpack(dest_in)
        ),
        dest_out
    ),
    dest_out,
    inputs
)
Likewise, here is the corresponding tree when a mask is being used:
ast = op_loop(
    op_pack(
        op_over(
            op_in(
                op_unpack(source_in),
                op_unpack(mask_in)
            ),
            op_unpack(dest_in)
        ),
        dest_out
    ),
    dest_out,
    inputs
)
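To give a feel for how such a tree can turn into C, here is a toy
version in which each node constructor simply returns a string of C
source. This is only a sketch: the real generator is not shown in this
message, and the emitted helper names (unpack, in, over, pack), the
pointer names and the loop shape are all made up for illustration.

```python
# Hypothetical node constructors: each returns a fragment of C source.

def op_unpack(name):
    # Read one pixel through the pointer variable `name`.
    return f"unpack(*{name})"

def op_in(src, mask):
    return f"in({src}, {mask})"

def op_over(src, dest):
    return f"over({src}, {dest})"

def op_pack(value, out):
    # Store the packed result through the pointer `out`.
    return f"*{out} = pack({value});"

def op_loop(body, out, inputs):
    # Advance every distinct pointer once per pixel (order preserved).
    pointers = list(dict.fromkeys([*inputs, out]))
    step = " ".join(f"{p}++;" for p in pointers)
    return f"for (; n > 0; n--) {{ {body} {step} }}"

# Assumed bindings for the leaf names used in the trees above.
source_in, mask_in, dest_in, dest_out = "src", "mask", "dest", "dest"
inputs = [source_in, mask_in, dest_in]

ast = op_loop(
    op_pack(
        op_over(
            op_in(op_unpack(source_in), op_unpack(mask_in)),
            op_unpack(dest_in),
        ),
        dest_out,
    ),
    dest_out,
    inputs,
)
print(ast)
```

A real compiler would more likely build node objects first and emit code
in a separate pass, which is what makes handwritten substitutions for
whole subtrees possible; the string version just keeps the example short.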
This is done for all the permutations of (operator, src_format,
dest_format, mask_format, solid_src) with an entry in the dispatch table
for each.
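The enumeration itself could look roughly like the following. The value
lists are illustrative guesses rather than the real sets of operators
and formats, and the naming scheme is only modelled on the one kernel
name quoted in this message (ARGB32_over_ARGB32_in_A8).

```python
from itertools import product

# Illustrative value sets only; the real lists are not given in the post.
OPERATORS = ["source", "over", "in"]
FORMATS = ["ARGB32", "RGB24", "A8"]
MASK_FORMATS = [None, "A8"]
SOLID_SRC = [False, True]

def kernel_name(op, src_format, dest_format, mask_format, solid_src):
    """Build a name of the form SRC_op_DEST[_in_MASK] (assumed scheme)."""
    parts = [("solid_" if solid_src else "") + src_format, op, dest_format]
    if mask_format is not None:
        parts += ["in", mask_format]
    return "_".join(parts)

# One dispatch-table entry per permutation, keyed in the order
# (operator, src_format, dest_format, mask_format, solid_src).
table = {
    key: kernel_name(*key)
    for key in product(OPERATORS, FORMATS, FORMATS, MASK_FORMATS, SOLID_SRC)
}
```

Even these small lists give 3 * 3 * 3 * 2 * 2 = 108 entries, which is
why the table grows large quickly.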
Currently, the generated code is messy, as are the details of the
generator, but the basic idea is there. A snapshot of the
generator/compiler is at
http://people.freedesktop.org/~jrmuizel/libcomposite-preview.tar.bz2
Results
-------
Some tests have sped up significantly (4.5x). glitz-test with software
rendering goes from approx 20fps to 25fps. However, there are some
regressions as well. The most notable one that shows up with cairo-perf
is long-lines-uncropped. This test is doing the operation
ARGB32_over_ARGB32_in_A8, which does have a liboil substitute. This
substitute is usually much faster. However, the cairo equivalent has a
special case for when both the SRC and MASK pixels are opaque. This
special case allows the current cairo code to outperform the liboil code
when it can take advantage of the special case about 85% or more of the
time.
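To make the special case concrete, here is a per-pixel Python model of
(SRC IN MASK) OVER DEST with an opaque shortcut in front of it. It
assumes premultiplied (a, r, g, b) tuples and is my reading of the
behaviour described above, not a transcription of the cairo code.

```python
def muldiv255(a, b):
    """Rounded a*b/255 for 8-bit a and b."""
    t = a * b + 0x80
    return (t + (t >> 8)) >> 8

def in_over_pixel(src, mask, dest):
    """General path: (src IN mask) OVER dest, channel by channel."""
    s = tuple(muldiv255(c, mask) for c in src)
    inv_alpha = 255 - s[0]
    return tuple(min(255, sc + muldiv255(dc, inv_alpha))
                 for sc, dc in zip(s, dest))

def in_over_pixel_shortcut(src, mask, dest):
    """Assumed form of the special case: opaque src and mask means
    the result is simply the source pixel, with no arithmetic."""
    if mask == 0xff and src[0] == 0xff:
        return src
    return in_over_pixel(src, mask, dest)
```

When most pixels hit the first branch, the loop degenerates into a copy,
which is why the existing code wins on long-lines-uncropped.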
My current feeling is that this should be dealt with at a level
above libpixman. It seems bad to ask the compositor to do OVER IN when
more than 85% of the time it could just be doing SRC. However, I have no
idea how practical it is to actually do this.
cairo-perf-diff with the smaller changes removed:
Speedups
========
image-rgba paint_solid_rgba_over-256 1.16 0.55% -> 0.25 2.07%: 4.54x speedup
███▌
image-rgb paint_solid_rgba_over-256 1.15 0.79% -> 0.26 0.53%: 4.45x speedup
███▌
image-rgba paint_solid_rgba_over-512 4.67 0.86% -> 1.19 1.41%: 3.94x speedup
███
image-rgb paint_solid_rgba_over-512 4.67 0.83% -> 1.20 1.19%: 3.91x speedup
██▉
image-rgba paint_image_rgba_over-256 0.81 1.18% -> 0.29 2.69%: 2.83x speedup
█▉
image-rgba paint_similar_rgba_over-256 0.81 0.92% -> 0.29 1.93%: 2.82x speedup
█▉
image-rgb paint_similar_rgba_over-256 0.80 1.35% -> 0.30 1.68%: 2.67x speedup
█▋
image-rgb paint_image_rgba_over-256 0.81 1.80% -> 0.32 1.12%: 2.55x speedup
█▌
image-rgba paint_image_rgba_over-512 3.29 0.18% -> 1.38 2.49%: 2.40x speedup
█▍
image-rgba paint_similar_rgba_over-512 3.28 1.23% -> 1.40 1.89%: 2.35x speedup
█▍
image-rgb paint_similar_rgba_over-512 3.29 1.23% -> 1.43 1.46%: 2.30x speedup
█▎
image-rgb paint_image_rgba_over-512 3.29 0.23% -> 1.47 0.32%: 2.25x speedup
█▎
image-rgb fill_solid_rgba_over-256 1.48 0.26% -> 0.76 0.49%: 1.94x speedup
▉
image-rgba fill_solid_rgba_over-256 1.44 1.37% -> 0.76 0.74%: 1.89x speedup
▉
image-rgba paint_solid_rgb_over-256 0.09 0.19% -> 0.05 0.20%: 1.62x speedup
▋
image-rgba paint_solid_rgb_source-256 0.09 0.49% -> 0.05 0.02%: 1.62x speedup
▋
image-rgba paint_solid_rgba_source-256 0.09 1.14% -> 0.05 0.10%: 1.60x speedup
▋
image-rgb paint_solid_rgb_over-256 0.08 0.45% -> 0.05 1.81%: 1.51x speedup
▌
image-rgb paint_solid_rgba_source-256 0.08 0.15% -> 0.06 1.87%: 1.51x speedup
▌
image-rgba fill_solid_rgba_over-128 0.47 0.91% -> 0.31 0.19%: 1.51x speedup
▌
image-rgb fill_solid_rgba_over-128 0.47 0.23% -> 0.32 0.15%: 1.50x speedup
▌
image-rgb paint_solid_rgb_source-256 0.08 0.98% -> 0.05 0.34%: 1.49x speedup
▌
image-rgba text_radial_rgba_source-128 3.07 0.14% -> 2.19 1.30%: 1.40x speedup
▍
image-rgba text_radial_rgb_source-128 3.00 0.13% -> 2.17 0.28%: 1.38x speedup
▍
image-rgba text_radial_rgb_source-256 12.06 0.11% -> 8.78 0.66%: 1.37x speedup
▍
image-rgb stroke_solid_rgba_over-256 2.93 1.10% -> 2.15 0.82%: 1.36x speedup
▍
image-rgba paint_similar_rgb_source-256 0.26 0.70% -> 0.19 5.21%: 1.36x speedup
▍
image-rgba stroke_solid_rgba_over-256 2.92 1.34% -> 2.17 1.69%: 1.35x speedup
▍
image-rgba text_radial_rgba_source-256 12.12 0.28% -> 9.05 0.48%: 1.34x speedup
▍
image-rgba paint_similar_rgb_over-256 0.25 2.48% -> 0.20 4.01%: 1.30x speedup
▎
image-rgba text_linear_rgba_source-256 10.37 0.48% -> 8.20 0.71%: 1.26x speedup
▎
image-rgba paint_image_rgb_over-256 0.26 1.26% -> 0.20 4.81%: 1.26x speedup
▎
image-rgba paint_image_rgb_source-256 0.25 0.85% -> 0.20 5.25%: 1.26x speedup
▎
image-rgba text_linear_rgb_source-256 10.39 0.57% -> 8.29 0.22%: 1.25x speedup
▎
image-rgba text_linear_rgba_source-128 2.62 1.99% -> 2.09 1.18%: 1.25x speedup
▎
image-rgba text_linear_rgb_source-128 2.62 0.38% -> 2.10 0.39%: 1.25x speedup
▎
image-rgb text_radial_rgba_source-128 3.05 0.79% -> 2.47 0.08%: 1.24x speedup
▎
image-rgb stroke_solid_rgba_over-128 1.27 1.00% -> 1.03 0.45%: 1.23x speedup
▎
image-rgb text_radial_rgb_source-128 3.00 0.28% -> 2.46 1.23%: 1.22x speedup
▎
image-rgb text_radial_rgb_source-256 12.05 0.53% -> 9.89 0.26%: 1.22x speedup
▎
image-rgba fill_solid_rgba_source-256 1.99 1.65% -> 1.64 0.32%: 1.21x speedup
▎
image-rgba fill_linear_rgba_source-256 3.04 1.45% -> 2.52 2.55%: 1.21x speedup
▎
image-rgba stroke_solid_rgba_over-128 1.25 0.35% -> 1.04 0.93%: 1.21x speedup
▎
image-rgba fill_solid_rgb_source-256 1.98 1.39% -> 1.65 1.51%: 1.20x speedup
▎
image-rgb fill_solid_rgba_over-64 0.24 0.37% -> 0.20 0.25%: 1.20x speedup
Slowdowns
=========
image-rgba subimage_copy-64 0.00 1.78% -> 0.00 0.31%: 1.46x slowdown
▌
image-rgb paint_image_rgba_source-256 0.19 5.22% -> 0.26 2.14%: 1.35x slowdown
▍
image-rgb paint_similar_rgba_source-256 0.19 0.63% -> 0.25 2.53%: 1.35x slowdown
▍
image-rgba long-lines-uncropped-100 4.61 0.85% -> 6.06 0.44%: 1.32x slowdown
▍
image-rgb long-lines-uncropped-100 4.61 0.80% -> 6.06 0.47%: 1.31x slowdown
▍
image-rgba fill_solid_rgb_over-256 0.58 0.74% -> 0.76 1.23%: 1.30x slowdown
▎
image-rgb fill_solid_rgb_over-256 0.59 2.15% -> 0.76 1.13%: 1.29x slowdown
▎
image-rgb fill_image_rgb_over-256 0.69 1.17% -> 0.87 0.59%: 1.26x slowdown
▎
image-rgb fill_similar_rgb_over-256 0.69 0.67% -> 0.87 1.47%: 1.26x slowdown
▎
image-rgba subimage_copy-128 0.00 1.97% -> 0.00 1.20%: 1.25x slowdown
▎
image-rgb paint_image_rgb_over-256 0.22 1.76% -> 0.27 2.77%: 1.24x slowdown
▎
image-rgba fill_similar_rgb_over-256 0.69 2.05% -> 0.85 1.61%: 1.23x slowdown
▎
image-rgba fill_image_rgb_over-256 0.70 1.28% -> 0.86 0.88%: 1.23x slowdown
▎
image-rgb fill_solid_rgb_over-128 0.26 0.44% -> 0.32 0.20%: 1.22x slowdown
▎
image-rgb long-lines-cropped-100 4.06 0.65% -> 4.93 0.66%: 1.22x slowdown
▎
image-rgb paint_image_rgb_source-256 0.22 2.63% -> 0.26 3.32%: 1.21x slowdown
▎
image-rgba long-lines-cropped-100 4.07 0.43% -> 4.94 0.75%: 1.21x slowdown
▎
image-rgba fill_solid_rgb_over-128 0.26 0.76% -> 0.31 0.22%: 1.21x slowdown
▎
image-rgb subimage_copy-64 0.00 1.58% -> 0.01 0.45%: 1.20x slowdown
▎
image-rgba subimage_copy-512 0.00 0.93% -> 0.00 1.86%: 1.20x slowdown
▎
image-rgba mosaic_fill_lines-800 97.33 0.07% -> 115.13 0.03%: 1.18x slowdown
▏
image-rgb mosaic_fill_lines-800 97.38 0.06% -> 114.97 0.04%: 1.18x slowdown
▏
image-rgb paint_similar_rgb_over-256 0.22 1.28% -> 0.25 3.59%: 1.18x slowdown
▏
image-rgb paint_similar_rgb_source-256 0.21 0.69% -> 0.25 4.23%: 1.18x slowdown
▏
image-rgba unaligned_clip-100 0.05 0.57% -> 0.07 0.82%: 1.16x slowdown
▏
image-rgb unaligned_clip-100 0.06 1.49% -> 0.06 0.77%: 1.16x slowdown
▏
image-rgb fill_similar_rgba_source-256 1.89 0.46% -> 2.17 0.45%: 1.15x slowdown
▏
image-rgb stroke_solid_rgb_over-256 1.89 3.32% -> 2.16 0.38%: 1.15x slowdown
▏
image-rgb fill_image_rgba_source-256 1.90 1.05% -> 2.17 0.35%: 1.14x slowdown
▏
image-rgba subimage_copy-256 0.00 2.47% -> 0.00 2.15%: 1.14x slowdown
▏
image-rgb fill_similar_rgb_source-256 1.91 1.52% -> 2.17 0.71%: 1.14x slowdown
▏
image-rgba subimage_copy-32 0.00 2.92% -> 0.00 8.91%: 1.13x slowdown
▏
image-rgb box-outline-stroke-100 0.01 1.69% -> 0.01 0.83%: 1.12x slowdown
▏
image-rgb fill_image_rgb_source-256 1.90 2.40% -> 2.13 1.87%: 1.12x slowdown
▏
image-rgba stroke_image_rgb_over-256 2.07 2.52% -> 2.31 0.49%: 1.11x slowdown
▏
image-rgba mosaic_fill_curves-800 169.25 0.06% -> 187.61 0.04%: 1.11x slowdown
▏
image-rgba stroke_similar_rgb_over-256 2.09 1.34% -> 2.32 0.36%: 1.11x slowdown
▏
image-rgb mosaic_fill_curves-800 169.35 0.23% -> 187.45 0.04%: 1.11x slowdown
▏
image-rgb stroke_similar_rgb_over-256 2.09 0.88% -> 2.31 0.24%: 1.11x slowdown
Index: liboil/liboilclasses.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilclasses.h,v
retrieving revision 1.24
diff -u -r1.24 liboilclasses.h
--- liboil/liboilclasses.h 23 Mar 2007 00:54:49 -0000 1.24
+++ liboil/liboilclasses.h 27 Mar 2007 20:16:24 -0000
@@ -128,6 +128,7 @@
OIL_DECLARE_CLASS(composite_in_over_argb);
OIL_DECLARE_CLASS(composite_in_over_argb_const_mask);
OIL_DECLARE_CLASS(composite_in_over_argb_const_src);
+OIL_DECLARE_CLASS(composite_in_over_rgb);
OIL_DECLARE_CLASS(composite_over_argb);
OIL_DECLARE_CLASS(composite_over_argb_const_src);
OIL_DECLARE_CLASS(composite_over_u8);
@@ -274,15 +275,15 @@
OIL_DECLARE_CLASS(mas4_add_s16);
OIL_DECLARE_CLASS(mas8_across_add_s16);
OIL_DECLARE_CLASS(mas8_add_s16);
-OIL_DECLARE_CLASS(max_f32);
-OIL_DECLARE_CLASS(max_f64);
OIL_DECLARE_CLASS(maximum_f32);
+OIL_DECLARE_CLASS(maximum_f64);
OIL_DECLARE_CLASS(md5);
OIL_DECLARE_CLASS(mdct12_f64);
OIL_DECLARE_CLASS(mdct36_f64);
OIL_DECLARE_CLASS(merge_linear_argb);
OIL_DECLARE_CLASS(merge_linear_u8);
OIL_DECLARE_CLASS(minimum_f32);
+OIL_DECLARE_CLASS(minimum_f64);
OIL_DECLARE_CLASS(mix_u8);
OIL_DECLARE_CLASS(mt19937);
OIL_DECLARE_CLASS(mult8x8_s16);
Index: liboil/liboilfuncs-04.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilfuncs-04.h,v
retrieving revision 1.16
diff -u -r1.16 liboilfuncs-04.h
--- liboil/liboilfuncs-04.h 23 Mar 2007 00:54:49 -0000 1.16
+++ liboil/liboilfuncs-04.h 27 Mar 2007 20:16:24 -0000
@@ -128,6 +128,7 @@
void oil_composite_in_over_argb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
void oil_composite_in_over_argb_const_mask (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_1, int n);
void oil_composite_in_over_argb_const_src (uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n);
+void oil_composite_in_over_rgb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
void oil_composite_over_argb (uint32_t * i_n, const uint32_t * s1_n, int n);
void oil_composite_over_argb_const_src (uint32_t * i_n, const uint32_t * s1_1, int n);
void oil_composite_over_u8 (uint8_t * i_n, const uint8_t * s1_n, int n);
@@ -274,15 +275,15 @@
void oil_mas4_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_np3, const int16_t * s3_4, const int16_t * s4_2, int n);
void oil_mas8_across_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_nx8, int sstr2, const int16_t * s3_8, const int16_t * s4_2, int n);
void oil_mas8_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n);
-void oil_max_f32 (float * d, const float * s1, int n);
-void oil_max_f64 (double * d, const double * s1, int n);
void oil_maximum_f32 (float * d, const float * s1, const float * s2, int n);
+void oil_maximum_f64 (float * d, const float * s1, const float * s2, int n);
void oil_md5 (uint32_t * i_4, const uint32_t * s_16);
void oil_mdct12_f64 (double * d_6, const double * s_12);
void oil_mdct36_f64 (double * d_18, const double * s_36);
void oil_merge_linear_argb (uint32_t * d_n, const uint32_t * s_n, const uint32_t * s2_n, const uint32_t * s3_1, int n);
void oil_merge_linear_u8 (uint8_t * d_n, const uint8_t * s_n, const uint8_t * s2_n, const uint32_t * s3_1, int n);
void oil_minimum_f32 (float * d, const float * s1, const float * s2, int n);
+void oil_minimum_f64 (float * d, const float * s1, const float * s2, int n);
void oil_mix_u8 (uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n);
void oil_mt19937 (uint32_t * d_624, uint32_t * i_624);
void oil_mult8x8_s16 (int16_t * d_8x8, const int16_t * s1_8x8, const int16_t * s2_8x8, int ds, int ss1, int ss2);
Index: liboil/liboilfuncs.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilfuncs.h,v
retrieving revision 1.49
diff -u -r1.49 liboilfuncs.h
--- liboil/liboilfuncs.h 23 Mar 2007 00:54:49 -0000 1.49
+++ liboil/liboilfuncs.h 27 Mar 2007 20:16:24 -0000
@@ -312,6 +312,9 @@
extern OilFunctionClass *oil_function_class_ptr_composite_in_over_argb_const_src;
typedef void (*_oil_type_composite_in_over_argb_const_src)(uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n);
#define oil_composite_in_over_argb_const_src ((_oil_type_composite_in_over_argb_const_src)(*(void **)oil_function_class_ptr_composite_in_over_argb_const_src))
+extern OilFunctionClass *oil_function_class_ptr_composite_in_over_rgb;
+typedef void (*_oil_type_composite_in_over_rgb)(uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
+#define oil_composite_in_over_rgb ((_oil_type_composite_in_over_rgb)(*(void **)oil_function_class_ptr_composite_in_over_rgb))
extern OilFunctionClass *oil_function_class_ptr_composite_over_argb;
typedef void (*_oil_type_composite_over_argb)(uint32_t * i_n, const uint32_t * s1_n, int n);
#define oil_composite_over_argb ((_oil_type_composite_over_argb)(*(void **)oil_function_class_ptr_composite_over_argb))
@@ -750,15 +753,12 @@
extern OilFunctionClass *oil_function_class_ptr_mas8_add_s16;
typedef void (*_oil_type_mas8_add_s16)(int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n);
#define oil_mas8_add_s16 ((_oil_type_mas8_add_s16)(*(void **)oil_function_class_ptr_mas8_add_s16))
-extern OilFunctionClass *oil_function_class_ptr_max_f32;
-typedef void (*_oil_type_max_f32)(float * d, const float * s1, int n);
-#define oil_max_f32 ((_oil_type_max_f32)(*(void **)oil_function_class_ptr_max_f32))
-extern OilFunctionClass *oil_function_class_ptr_max_f64;
-typedef void (*_oil_type_max_f64)(double * d, const double * s1, int n);
-#define oil_max_f64 ((_oil_type_max_f64)(*(void **)oil_function_class_ptr_max_f64))
extern OilFunctionClass *oil_function_class_ptr_maximum_f32;
typedef void (*_oil_type_maximum_f32)(float * d, const float * s1, const float * s2, int n);
#define oil_maximum_f32 ((_oil_type_maximum_f32)(*(void **)oil_function_class_ptr_maximum_f32))
+extern OilFunctionClass *oil_function_class_ptr_maximum_f64;
+typedef void (*_oil_type_maximum_f64)(float * d, const float * s1, const float * s2, int n);
+#define oil_maximum_f64 ((_oil_type_maximum_f64)(*(void **)oil_function_class_ptr_maximum_f64))
extern OilFunctionClass *oil_function_class_ptr_md5;
typedef void (*_oil_type_md5)(uint32_t * i_4, const uint32_t * s_16);
#define oil_md5 ((_oil_type_md5)(*(void **)oil_function_class_ptr_md5))
@@ -777,6 +777,9 @@
extern OilFunctionClass *oil_function_class_ptr_minimum_f32;
typedef void (*_oil_type_minimum_f32)(float * d, const float * s1, const float * s2, int n);
#define oil_minimum_f32 ((_oil_type_minimum_f32)(*(void **)oil_function_class_ptr_minimum_f32))
+extern OilFunctionClass *oil_function_class_ptr_minimum_f64;
+typedef void (*_oil_type_minimum_f64)(float * d, const float * s1, const float * s2, int n);
+#define oil_minimum_f64 ((_oil_type_minimum_f64)(*(void **)oil_function_class_ptr_minimum_f64))
extern OilFunctionClass *oil_function_class_ptr_mix_u8;
typedef void (*_oil_type_mix_u8)(uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n);
#define oil_mix_u8 ((_oil_type_mix_u8)(*(void **)oil_function_class_ptr_mix_u8))
Index: liboil/liboiltrampolines.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboiltrampolines.c,v
retrieving revision 1.22
diff -u -r1.22 liboiltrampolines.c
--- liboil/liboiltrampolines.c 23 Mar 2007 00:54:49 -0000 1.22
+++ liboil/liboiltrampolines.c 27 Mar 2007 20:16:24 -0000
@@ -951,6 +951,16 @@
((void (*)(uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n))(_oil_function_class_composite_in_over_argb_const_src.func))(i_n, s1_1, s2_n, n);
}
+#undef oil_composite_in_over_rgb
+void
+oil_composite_in_over_rgb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n)
+{
+ if (_oil_function_class_composite_in_over_rgb.func == NULL) {
+ oil_class_optimize (&_oil_function_class_composite_in_over_rgb);
+ }
+ ((void (*)(uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n))(_oil_function_class_composite_in_over_rgb.func))(i_n, s1_n, s2_n, n);
+}
+
#undef oil_composite_over_argb
void
oil_composite_over_argb (uint32_t * i_n, const uint32_t * s1_n, int n)
@@ -2411,26 +2421,6 @@
((void (*)(int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n))(_oil_function_class_mas8_add_s16.func))(d, s1, s2_np7, s3_8, s4_2, n);
}
-#undef oil_max_f32
-void
-oil_max_f32 (float * d, const float * s1, int n)
-{
- if (_oil_function_class_max_f32.func == NULL) {
- oil_class_optimize (&_oil_function_class_max_f32);
- }
- ((void (*)(float * d, const float * s1, int n))(_oil_function_class_max_f32.func))(d, s1, n);
-}
-
-#undef oil_max_f64
-void
-oil_max_f64 (double * d, const double * s1, int n)
-{
- if (_oil_function_class_max_f64.func == NULL) {
- oil_class_optimize (&_oil_function_class_max_f64);
- }
- ((void (*)(double * d, const double * s1, int n))(_oil_function_class_max_f64.func))(d, s1, n);
-}
-
#undef oil_maximum_f32
void
oil_maximum_f32 (float * d, const float * s1, const float * s2, int n)
@@ -2441,6 +2431,16 @@
((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_maximum_f32.func))(d, s1, s2, n);
}
+#undef oil_maximum_f64
+void
+oil_maximum_f64 (float * d, const float * s1, const float * s2, int n)
+{
+ if (_oil_function_class_maximum_f64.func == NULL) {
+ oil_class_optimize (&_oil_function_class_maximum_f64);
+ }
+ ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_maximum_f64.func))(d, s1, s2, n);
+}
+
#undef oil_md5
void
oil_md5 (uint32_t * i_4, const uint32_t * s_16)
@@ -2501,6 +2501,16 @@
((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_minimum_f32.func))(d, s1, s2, n);
}
+#undef oil_minimum_f64
+void
+oil_minimum_f64 (float * d, const float * s1, const float * s2, int n)
+{
+ if (_oil_function_class_minimum_f64.func == NULL) {
+ oil_class_optimize (&_oil_function_class_minimum_f64);
+ }
+ ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_minimum_f64.func))(d, s1, s2, n);
+}
+
#undef oil_mix_u8
void
oil_mix_u8 (uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n)
Index: liboil/c/composite.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/c/composite.c,v
retrieving revision 1.2
diff -u -r1.2 composite.c
--- liboil/c/composite.c 22 May 2006 22:31:47 -0000 1.2
+++ liboil/c/composite.c 27 Mar 2007 20:16:24 -0000
@@ -313,6 +313,61 @@
OIL_DEFINE_IMPL (composite_in_over_argb_fast, composite_in_over_argb);
static void
+composite_in_over_rgb_fast (uint32_t *dest, const uint32_t *src,
+ const uint8_t *mask, int n)
+{
+ for (; n > 0; n--) {
+ uint32_t d = *dest, s = *src++;
+ uint32_t s1, s2, d1, d2, sa;
+ uint8_t m = *mask++;
+
+ s1 = s & 0x00ff00ff;
+ /* fill the missing alpha byte */
+ s2 = (((s | (0xff<<24)) >> 8) & 0x00ff00ff);
+
+ /* in */
+ s1 *= m;
+ s1 += 0x00800080;
+ s1 += (s1 >> 8) & 0x00ff00ff;
+ s1 >>= 8;
+ s1 &= 0x00ff00ff;
+
+ s2 *= m;
+ s2 += 0x00800080;
+ s2 += (s2 >> 8) & 0x00ff00ff;
+ s2 >>= 8;
+ s2 &= 0x00ff00ff;
+
+ /* over */
+ sa = (~s2 >> 16) & 0xff;
+
+ d1 = d & 0x00ff00ff;
+ d1 *= sa;
+ d1 += 0x00800080;
+ d1 += (d1 >> 8) & 0x00ff00ff;
+ d1 >>= 8;
+ d1 &= 0x00ff00ff;
+ d1 += s1;
+ d1 |= 0x01000100 - ((d1 >> 8) & 0x00ff00ff);
+ d1 &= 0x00ff00ff;
+
+ d2 = (d >> 8) & 0x00ff00ff;
+ d2 *= sa;
+ d2 += 0x00800080;
+ d2 += (d2 >> 8) & 0x00ff00ff;
+ d2 >>= 8;
+ d2 &= 0x00ff00ff;
+ d2 += s2;
+ d2 |= 0x01000100 - ((d2 >> 8) & 0x00ff00ff);
+ d2 &= 0x00ff00ff;
+
+ *dest++ = d1 | (d2 << 8);
+ }
+}
+OIL_DEFINE_IMPL (composite_in_over_rgb_fast, composite_in_over_rgb);
+
+
+static void
composite_in_over_argb_const_src_fast (uint32_t *dest, const uint32_t *src,
const uint8_t *mask, int n)
{
Index: liboil/i386/composite_i386.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/i386/composite_i386.c,v
retrieving revision 1.4
diff -u -r1.4 composite_i386.c
--- liboil/i386/composite_i386.c 29 Jan 2006 02:55:37 -0000 1.4
+++ liboil/i386/composite_i386.c 27 Mar 2007 20:16:24 -0000
@@ -36,10 +36,12 @@
OIL_DECLARE_CLASS (composite_in_argb_const_src);
OIL_DECLARE_CLASS (composite_in_argb_const_mask);
OIL_DECLARE_CLASS (composite_over_argb);
+OIL_DECLARE_CLASS (composite_over_rgb);
OIL_DECLARE_CLASS (composite_over_argb_const_src);
OIL_DECLARE_CLASS (composite_add_argb);
OIL_DECLARE_CLASS (composite_add_argb_const_src);
OIL_DECLARE_CLASS (composite_in_over_argb);
+OIL_DECLARE_CLASS (composite_in_over_rgb);
OIL_DECLARE_CLASS (composite_in_over_argb_const_src);
OIL_DECLARE_CLASS (composite_in_over_argb_const_mask);
@@ -911,6 +913,49 @@
OIL_DEFINE_IMPL_FULL (composite_in_over_argb_mmx, composite_in_over_argb, OIL_IMPL_FLAG_MMX | OIL_IMPL_FLAG_MMXEXT);
static void
+composite_in_over_rgb_mmx (uint32_t *dest, uint32_t *src, uint8_t *mask, int n)
+{
+ __asm__ __volatile__ (
+ MMX_LOAD_CONSTANTS
+ "1:\n"
+ " movd (%2), %%mm0\n"
+ " punpcklbw %%mm7, %%mm0\n"
+ " pshufw $0x00, %%mm0, %%mm1\n"
+
+ " movl (%1), %%eax\n"
+ " or $0xff000000, %%eax\n"
+ " movd %%eax, %%mm2\n"
+ " punpcklbw %%mm7, %%mm2\n"
+
+ MMX_MULDIV255(mm2, mm1)
+
+ " movd (%0), %%mm0\n"
+ " punpcklbw %%mm7, %%mm0\n"
+
+ " pshufw $0xff, %%mm2, %%mm1\n"
+ " pxor %%mm5, %%mm1\n"
+
+ MMX_MULDIV255(mm0, mm1)
+
+ " paddw %%mm0, %%mm2\n"
+ " packuswb %%mm2, %%mm2\n"
+
+ " movd %%mm2, (%0)\n"
+ " addl $4, %0\n"
+ " addl $4, %1\n"
+ " addl $1, %2\n"
+ " decl %3\n"
+ " jnz 1b\n"
+ " emms\n"
+ :"+r" (dest), "+r" (src), "+r" (mask), "+r" (n)
+ :
+ :"eax");
+
+}
+OIL_DEFINE_IMPL_FULL (composite_in_over_rgb_mmx, composite_in_over_rgb, OIL_IMPL_FLAG_MMX | OIL_IMPL_FLAG_MMXEXT);
+
+
+static void
composite_in_over_argb_const_src_mmx (uint32_t *dest, uint32_t *src, uint8_t *mask, int n)
{
__asm__ __volatile__ (
Index: liboil/ref/composite.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/ref/composite.c,v
retrieving revision 1.7
diff -u -r1.7 composite.c
--- liboil/ref/composite.c 20 Dec 2005 01:28:18 -0000 1.7
+++ liboil/ref/composite.c 27 Mar 2007 20:16:24 -0000
@@ -180,6 +180,20 @@
OIL_DEFINE_CLASS_FULL (composite_in_over_argb,
"uint32_t *i_n, uint32_t *s1_n, uint8_t *s2_n, int n",
composite_test);
+
+/**
+ * oil_composite_in_over_rgb:
+ * @i_n: DEST
+ * @s1_n: SRC
+ * @s2_n: MASK
+ * @n: number of elements
+ *
+ * Performs the compositing operation DEST = (SRC IN MASK) OVER DEST.
+ */
+OIL_DEFINE_CLASS_FULL (composite_in_over_rgb,
+ "uint32_t *i_n, uint32_t *s1_n, uint8_t *s2_n, int n",
+ composite_test);
+
/**
* oil_composite_in_over_argb_const_src:
* @i_n: DEST
@@ -378,6 +392,31 @@
OIL_DEFINE_IMPL_REF (composite_in_over_argb_ref, composite_in_over_argb);
static void
+composite_in_over_rgb_ref (uint32_t *dest, const uint32_t *src, const uint8_t *mask, int n)
+{
+ int i;
+ uint8_t a;
+ uint32_t color;
+
+ for(i=0;i<n;i++){
+ color = oil_argb(
+ COMPOSITE_IN(0xff, mask[i]),
+ COMPOSITE_IN(oil_argb_R(src[i]), mask[i]),
+ COMPOSITE_IN(oil_argb_G(src[i]), mask[i]),
+ COMPOSITE_IN(oil_argb_B(src[i]), mask[i]));
+ a = oil_argb_A(color);
+ dest[i] = oil_argb(
+ COMPOSITE_OVER(oil_argb_A(dest[i]),oil_argb_A(color),a),
+ COMPOSITE_OVER(oil_argb_R(dest[i]),oil_argb_R(color),a),
+ COMPOSITE_OVER(oil_argb_G(dest[i]),oil_argb_G(color),a),
+ COMPOSITE_OVER(oil_argb_B(dest[i]),oil_argb_B(color),a));
+ }
+
+}
+OIL_DEFINE_IMPL_REF (composite_in_over_rgb_ref, composite_in_over_rgb);
+
+
+static void
composite_in_over_argb_const_src_ref (uint32_t *dest, const uint32_t *src, const uint8_t *mask, int n)
{
int i;
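
For readers following the arithmetic in composite_in_over_rgb_fast in
the patch above, here is a Python model of the same steps: two 8-bit
channels are kept in one integer (bits 0-7 and 16-23) so that each
multiply and rounded divide by 255 handles two channels at once. The
helper names are mine; only the constants and the order of operations
follow the C code.

```python
MASK = 0x00ff00ff

def muldiv255_pair(pair, m):
    """Scale two channels (bits 0-7 and 16-23) by m/255, rounded."""
    pair *= m
    pair += 0x00800080
    pair += (pair >> 8) & MASK
    return (pair >> 8) & MASK

def saturating_add_pair(a, b):
    """Add two channel pairs, clamping each channel to 0xff."""
    t = a + b
    t |= 0x01000100 - ((t >> 8) & MASK)
    return t & MASK

def in_over_rgb(dest, src, mask):
    """dest = (src IN mask) OVER dest, treating src alpha as 0xff."""
    s1 = src & MASK                          # red and blue
    s2 = ((src | 0xff000000) >> 8) & MASK    # forced alpha and green
    s1 = muldiv255_pair(s1, mask)
    s2 = muldiv255_pair(s2, mask)
    sa = (~s2 >> 16) & 0xff                  # 255 - masked source alpha
    d1 = saturating_add_pair(muldiv255_pair(dest & MASK, sa), s1)
    d2 = saturating_add_pair(muldiv255_pair((dest >> 8) & MASK, sa), s2)
    return d1 | (d2 << 8)
```

Each 16-bit lane stays below 65536 throughout (255 * 255 + 128 + 254 is
65407), so the two channels never carry into each other.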