[cairo] rewriting libpixman

Jeff Muizelaar jeff at infidigm.net
Wed Mar 28 09:44:46 PDT 2007


I have put up a copy of cairo that has a partially rewritten libpixman
in the pixman-new branch of my cairo tree.

The main change is that all of the hand-written special cases have been
replaced with machine-generated special cases. As a result, no regular
compositing operation falls back to the compositeGeneral case any
more. In addition, the Python script allows handwritten
substitutions to be made. In this case, I've added substitutions using
liboil for the operations that it supports. These substitutions cover
a lot of the common cases and account for much of the speedup seen below.
I've also included a patch against liboil that adds another useful
operation for cairo.

Implementation
--------------
The code currently uses a very large table of function pointers to look
up the compositing kernel needed. I'm a little uncomfortable with this
approach; however, the alternatives aren't great either. At worst, it
should still make it possible to determine the best-case performance
of some of the less often used compositing operators.

The basic premise behind the Python script/compiler is to take a tree
like the following and create the corresponding C code.

          ast = op_loop(
                    op_pack(
                        op_over(            
                            op_unpack(source_in),       
                            op_unpack(dest_in)          
                        ),                      
                        dest_out                
                    ),
                    dest_out,           
                    inputs              
                )

Likewise, here is the corresponding tree when a mask is being used.

          ast = op_loop(
                    op_pack(
                        op_over(            
                            op_in(                      
                                op_unpack(source_in),           
                                op_unpack(mask_in)
                            ),
                            op_unpack(dest_in)    
                        ),
                        dest_out
                    ),
                    dest_out,
                    inputs
                )
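
To make the idea concrete, here is a minimal sketch of how trees like
the two above could be turned into C source. It is not the code from the
snapshot: every function body below, the string-based representation,
and the C helper names it emits (unpack, pack, over, in) are assumptions
made purely for illustration.

```python
# Hypothetical tree-to-C generator. Each op_* function returns a piece of
# C source text; none of this is taken from the real generator.

def op_unpack(var):
    # Expand a packed pixel into a per-channel representation.
    return "unpack(%s)" % var

def op_in(src, mask):
    # Multiply the source by the mask coverage.
    return "in(%s, %s)" % (src, mask)

def op_over(src, dest):
    # Porter-Duff OVER of src on top of dest.
    return "over(%s, %s)" % (src, dest)

def op_pack(expr, out):
    # Repack the result and store it through the output pointer.
    return "*%s = pack(%s);" % (out, expr)

def op_loop(body, out, inputs):
    # Wrap the per-pixel statement in a loop that advances every pointer.
    pointers = [out] + list(inputs)
    advance = " ".join("%s++;" % p for p in pointers)
    return "for (; n > 0; n--) { %s %s }" % (body, advance)

# Assumed C-level names for the operands.
source_in, mask_in, dest_in, dest_out = "*src", "*mask", "*dest", "dest"

plain = op_loop(
    op_pack(op_over(op_unpack(source_in), op_unpack(dest_in)), dest_out),
    dest_out,
    ["src"],
)

masked = op_loop(
    op_pack(
        op_over(
            op_in(op_unpack(source_in), op_unpack(mask_in)),
            op_unpack(dest_in),
        ),
        dest_out,
    ),
    dest_out,
    ["src", "mask"],
)

print(plain)
print(masked)
```

The only difference between the two generated loops is the extra in()
node wrapped around the source and the extra mask pointer being
advanced, which is what makes it cheap to stamp out every variant.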

This is done for all the permutations of (operator, src_format,
dest_format, mask_format, solid_src) with an entry in the dispatch table
for each.
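
A rough sketch of that enumeration, with handwritten substitutions
taking priority over generated kernels, might look like the following.
The operator and format lists are deliberately tiny, and the naming
scheme is only modelled on the ARGB32_over_ARGB32_in_A8 name used in the
results below. Which liboil function replaces which table entry is also
an assumption here; only the liboil function names themselves are real.

```python
import itertools

# Illustrative subsets only; the real lists are longer.
OPERATORS = ["over", "source", "add"]
FORMATS = ["ARGB32", "RGB24", "A8"]
MASK_FORMATS = [None, "A8"]

# Hypothetical handwritten substitutions, keyed exactly like the table:
# (operator, src_format, dest_format, mask_format, solid_src).
SUBSTITUTIONS = {
    ("over", "ARGB32", "ARGB32", "A8", False): "oil_composite_in_over_argb",
    ("over", "ARGB32", "ARGB32", None, False): "oil_composite_over_argb",
}

def generated_name(op, src, dest, mask, solid_src):
    # Assumed naming scheme for machine-generated kernels.
    name = "%s_%s_%s" % (src, op, dest)
    if mask is not None:
        name += "_in_%s" % mask
    if solid_src:
        name += "_solid"
    return name

def build_table():
    # One entry per permutation; a substitution wins if one exists.
    table = {}
    for key in itertools.product(
        OPERATORS, FORMATS, FORMATS, MASK_FORMATS, [False, True]
    ):
        table[key] = SUBSTITUTIONS.get(key, generated_name(*key))
    return table

table = build_table()
print(len(table))
print(table[("over", "ARGB32", "ARGB32", "A8", False)])
```

Even this cut-down version produces 3 * 3 * 3 * 2 * 2 = 108 entries,
which gives a feel for why the real table is very large.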

Currently the generated code is messy, as are the details of the
generator, but the basic idea is there. A snapshot of the
generator/compiler is at
http://people.freedesktop.org/~jrmuizel/libcomposite-preview.tar.bz2

Results
-------
Some tests have sped up significantly (up to 4.5x). glitz-test with
software rendering goes from approximately 20fps to 25fps. However,
there are some regressions as well. The most notable one that shows up
with cairo-perf is long-lines-uncropped. This test performs the
operation ARGB32_over_ARGB32_in_A8, which does have a liboil substitute.
This substitute is usually much faster. However, the cairo equivalent
has a special case for when both the SRC and MASK pixels are opaque.
This special case allows the current cairo code to outperform the liboil
code when the special case applies about 85% of the time or more.
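
The reason the special case is valid can be checked with a few lines of
per-channel arithmetic. This is only a sketch: the function names are
made up, pixels are modelled as (a, r, g, b) tuples of premultiplied
8-bit values, and the rounding follows the add-0x80-and-fold trick used
in the C code of the liboil patch at the end of this mail.

```python
def muldiv255(a, b):
    # Rounded a * b / 255 for 8-bit values.
    t = a * b + 0x80
    return (t + (t >> 8)) >> 8

def in_over(src, mask, dest):
    # DEST = (SRC IN MASK) OVER DEST, channel by channel.
    s = tuple(muldiv255(c, mask) for c in src)
    inv_alpha = 255 - s[0]
    return tuple(
        min(255, sc + muldiv255(dc, inv_alpha)) for sc, dc in zip(s, dest)
    )

def in_over_with_shortcut(src, mask, dest):
    # Opaque source under an opaque mask simply replaces the destination.
    if mask == 0xff and src[0] == 0xff:
        return src
    return in_over(src, mask, dest)

src = (255, 10, 200, 37)
dest = (128, 5, 6, 7)
print(in_over(src, 0xff, dest) == src)
print(in_over_with_shortcut(src, 0x40, dest) == in_over(src, 0x40, dest))
```

Multiplying by 255 leaves every channel unchanged and the inverse alpha
becomes 0, so the full computation collapses to a plain copy of SRC.
The shortcut skips several multiplies per pixel, at the price of a
branch on every pixel.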

My current feeling is that this should be dealt with at a level
above libpixman. It seems bad to ask the compositor to do OVER IN when
more than 85% of the time it could just be doing SRC. However, I have no
idea how practical it is to actually do this.
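
One possible shape for such a higher-level split, purely as a thought
experiment, is to cut each mask scanline into runs of fully opaque and
not fully opaque coverage before calling the compositor. The helper
below is hypothetical and says nothing about whether the bookkeeping
would pay for itself in practice.

```python
from itertools import groupby

def split_runs(mask):
    # Yield (start, length, is_opaque) for each maximal run in a scanline
    # of 8-bit mask values.
    pos = 0
    for is_opaque, group in groupby(mask, key=lambda m: m == 0xff):
        length = sum(1 for _ in group)
        yield pos, length, is_opaque
        pos += length

scanline = bytes([0x00, 0xff, 0xff, 0x80, 0xff])
for start, length, is_opaque in split_runs(scanline):
    # With an opaque source, opaque runs could go to SRC and the rest
    # to OVER IN.
    print(start, length, "SRC" if is_opaque else "OVER IN")
```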

cairo-perf-diff with the smaller changes removed:

Speedups
========
image-rgba      paint_solid_rgba_over-256    1.16 0.55% ->   0.25 2.07%:  4.54x speedup
███▌
image-rgb       paint_solid_rgba_over-256    1.15 0.79% ->   0.26 0.53%:  4.45x speedup
███▌
image-rgba      paint_solid_rgba_over-512    4.67 0.86% ->   1.19 1.41%:  3.94x speedup
███
image-rgb       paint_solid_rgba_over-512    4.67 0.83% ->   1.20 1.19%:  3.91x speedup
██▉
image-rgba      paint_image_rgba_over-256    0.81 1.18% ->   0.29 2.69%:  2.83x speedup
█▉
image-rgba    paint_similar_rgba_over-256    0.81 0.92% ->   0.29 1.93%:  2.82x speedup
█▉
image-rgb     paint_similar_rgba_over-256    0.80 1.35% ->   0.30 1.68%:  2.67x speedup
█▋
image-rgb       paint_image_rgba_over-256    0.81 1.80% ->   0.32 1.12%:  2.55x speedup
█▌
image-rgba      paint_image_rgba_over-512    3.29 0.18% ->   1.38 2.49%:  2.40x speedup
█▍
image-rgba    paint_similar_rgba_over-512    3.28 1.23% ->   1.40 1.89%:  2.35x speedup
█▍
image-rgb     paint_similar_rgba_over-512    3.29 1.23% ->   1.43 1.46%:  2.30x speedup
█▎
image-rgb       paint_image_rgba_over-512    3.29 0.23% ->   1.47 0.32%:  2.25x speedup
█▎
image-rgb        fill_solid_rgba_over-256    1.48 0.26% ->   0.76 0.49%:  1.94x speedup
▉
image-rgba       fill_solid_rgba_over-256    1.44 1.37% ->   0.76 0.74%:  1.89x speedup
▉
image-rgba       paint_solid_rgb_over-256    0.09 0.19% ->   0.05 0.20%:  1.62x speedup
▋
image-rgba     paint_solid_rgb_source-256    0.09 0.49% ->   0.05 0.02%:  1.62x speedup
▋
image-rgba    paint_solid_rgba_source-256    0.09 1.14% ->   0.05 0.10%:  1.60x speedup
▋
image-rgb        paint_solid_rgb_over-256    0.08 0.45% ->   0.05 1.81%:  1.51x speedup
▌
image-rgb     paint_solid_rgba_source-256    0.08 0.15% ->   0.06 1.87%:  1.51x speedup
▌
image-rgba       fill_solid_rgba_over-128    0.47 0.91% ->   0.31 0.19%:  1.51x speedup
▌
image-rgb        fill_solid_rgba_over-128    0.47 0.23% ->   0.32 0.15%:  1.50x speedup
▌
image-rgb      paint_solid_rgb_source-256    0.08 0.98% ->   0.05 0.34%:  1.49x speedup
▌
image-rgba    text_radial_rgba_source-128    3.07 0.14% ->   2.19 1.30%:  1.40x speedup
▍
image-rgba     text_radial_rgb_source-128    3.00 0.13% ->   2.17 0.28%:  1.38x speedup
▍
image-rgba     text_radial_rgb_source-256   12.06 0.11% ->   8.78 0.66%:  1.37x speedup
▍
image-rgb      stroke_solid_rgba_over-256    2.93 1.10% ->   2.15 0.82%:  1.36x speedup
▍
image-rgba   paint_similar_rgb_source-256    0.26 0.70% ->   0.19 5.21%:  1.36x speedup
▍
image-rgba     stroke_solid_rgba_over-256    2.92 1.34% ->   2.17 1.69%:  1.35x speedup
▍
image-rgba    text_radial_rgba_source-256   12.12 0.28% ->   9.05 0.48%:  1.34x speedup
▍
image-rgba     paint_similar_rgb_over-256    0.25 2.48% ->   0.20 4.01%:  1.30x speedup
▎
image-rgba    text_linear_rgba_source-256   10.37 0.48% ->   8.20 0.71%:  1.26x speedup
▎
image-rgba       paint_image_rgb_over-256    0.26 1.26% ->   0.20 4.81%:  1.26x speedup
▎
image-rgba     paint_image_rgb_source-256    0.25 0.85% ->   0.20 5.25%:  1.26x speedup
▎
image-rgba     text_linear_rgb_source-256   10.39 0.57% ->   8.29 0.22%:  1.25x speedup
▎
image-rgba    text_linear_rgba_source-128    2.62 1.99% ->   2.09 1.18%:  1.25x speedup
▎
image-rgba     text_linear_rgb_source-128    2.62 0.38% ->   2.10 0.39%:  1.25x speedup
▎
image-rgb     text_radial_rgba_source-128    3.05 0.79% ->   2.47 0.08%:  1.24x speedup
▎
image-rgb      stroke_solid_rgba_over-128    1.27 1.00% ->   1.03 0.45%:  1.23x speedup
▎
image-rgb      text_radial_rgb_source-128    3.00 0.28% ->   2.46 1.23%:  1.22x speedup
▎
image-rgb      text_radial_rgb_source-256   12.05 0.53% ->   9.89 0.26%:  1.22x speedup
▎
image-rgba     fill_solid_rgba_source-256    1.99 1.65% ->   1.64 0.32%:  1.21x speedup
▎
image-rgba    fill_linear_rgba_source-256    3.04 1.45% ->   2.52 2.55%:  1.21x speedup
▎
image-rgba     stroke_solid_rgba_over-128    1.25 0.35% ->   1.04 0.93%:  1.21x speedup
▎
image-rgba      fill_solid_rgb_source-256    1.98 1.39% ->   1.65 1.51%:  1.20x speedup
▎
image-rgb        fill_solid_rgba_over-64     0.24 0.37% ->   0.20 0.25%:  1.20x speedup

Slowdowns
=========
image-rgba              subimage_copy-64     0.00 1.78% ->   0.00 0.31%:  1.46x slowdown
▌
image-rgb     paint_image_rgba_source-256    0.19 5.22% ->   0.26 2.14%:  1.35x slowdown
▍
image-rgb   paint_similar_rgba_source-256    0.19 0.63% ->   0.25 2.53%:  1.35x slowdown
▍
image-rgba       long-lines-uncropped-100    4.61 0.85% ->   6.06 0.44%:  1.32x slowdown
▍
image-rgb        long-lines-uncropped-100    4.61 0.80% ->   6.06 0.47%:  1.31x slowdown
▍
image-rgba        fill_solid_rgb_over-256    0.58 0.74% ->   0.76 1.23%:  1.30x slowdown
▎
image-rgb         fill_solid_rgb_over-256    0.59 2.15% ->   0.76 1.13%:  1.29x slowdown
▎
image-rgb         fill_image_rgb_over-256    0.69 1.17% ->   0.87 0.59%:  1.26x slowdown
▎
image-rgb       fill_similar_rgb_over-256    0.69 0.67% ->   0.87 1.47%:  1.26x slowdown
▎
image-rgba              subimage_copy-128    0.00 1.97% ->   0.00 1.20%:  1.25x slowdown
▎
image-rgb        paint_image_rgb_over-256    0.22 1.76% ->   0.27 2.77%:  1.24x slowdown
▎
image-rgba      fill_similar_rgb_over-256    0.69 2.05% ->   0.85 1.61%:  1.23x slowdown
▎
image-rgba        fill_image_rgb_over-256    0.70 1.28% ->   0.86 0.88%:  1.23x slowdown
▎
image-rgb         fill_solid_rgb_over-128    0.26 0.44% ->   0.32 0.20%:  1.22x slowdown
▎
image-rgb          long-lines-cropped-100    4.06 0.65% ->   4.93 0.66%:  1.22x slowdown
▎
image-rgb      paint_image_rgb_source-256    0.22 2.63% ->   0.26 3.32%:  1.21x slowdown
▎
image-rgba         long-lines-cropped-100    4.07 0.43% ->   4.94 0.75%:  1.21x slowdown
▎
image-rgba        fill_solid_rgb_over-128    0.26 0.76% ->   0.31 0.22%:  1.21x slowdown
▎
image-rgb               subimage_copy-64     0.00 1.58% ->   0.01 0.45%:  1.20x slowdown
▎
image-rgba              subimage_copy-512    0.00 0.93% ->   0.00 1.86%:  1.20x slowdown
▎
image-rgba          mosaic_fill_lines-800   97.33 0.07% -> 115.13 0.03%:  1.18x slowdown
▏
image-rgb           mosaic_fill_lines-800   97.38 0.06% -> 114.97 0.04%:  1.18x slowdown
▏
image-rgb      paint_similar_rgb_over-256    0.22 1.28% ->   0.25 3.59%:  1.18x slowdown
▏
image-rgb    paint_similar_rgb_source-256    0.21 0.69% ->   0.25 4.23%:  1.18x slowdown
▏
image-rgba             unaligned_clip-100    0.05 0.57% ->   0.07 0.82%:  1.16x slowdown
▏
image-rgb              unaligned_clip-100    0.06 1.49% ->   0.06 0.77%:  1.16x slowdown
▏
image-rgb    fill_similar_rgba_source-256    1.89 0.46% ->   2.17 0.45%:  1.15x slowdown
▏
image-rgb       stroke_solid_rgb_over-256    1.89 3.32% ->   2.16 0.38%:  1.15x slowdown
▏
image-rgb      fill_image_rgba_source-256    1.90 1.05% ->   2.17 0.35%:  1.14x slowdown
▏
image-rgba              subimage_copy-256    0.00 2.47% ->   0.00 2.15%:  1.14x slowdown
▏
image-rgb     fill_similar_rgb_source-256    1.91 1.52% ->   2.17 0.71%:  1.14x slowdown
▏
image-rgba              subimage_copy-32     0.00 2.92% ->   0.00 8.91%:  1.13x slowdown
▏
image-rgb          box-outline-stroke-100    0.01 1.69% ->   0.01 0.83%:  1.12x slowdown
▏
image-rgb       fill_image_rgb_source-256    1.90 2.40% ->   2.13 1.87%:  1.12x slowdown
▏
image-rgba      stroke_image_rgb_over-256    2.07 2.52% ->   2.31 0.49%:  1.11x slowdown
▏
image-rgba         mosaic_fill_curves-800  169.25 0.06% -> 187.61 0.04%:  1.11x slowdown
▏
image-rgba    stroke_similar_rgb_over-256    2.09 1.34% ->   2.32 0.36%:  1.11x slowdown
▏
image-rgb          mosaic_fill_curves-800  169.35 0.23% -> 187.45 0.04%:  1.11x slowdown
▏
image-rgb     stroke_similar_rgb_over-256    2.09 0.88% ->   2.31 0.24%:  1.11x slowdown


Index: liboil/liboilclasses.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilclasses.h,v
retrieving revision 1.24
diff -u -r1.24 liboilclasses.h
--- liboil/liboilclasses.h	23 Mar 2007 00:54:49 -0000	1.24
+++ liboil/liboilclasses.h	27 Mar 2007 20:16:24 -0000
@@ -128,6 +128,7 @@
 OIL_DECLARE_CLASS(composite_in_over_argb);
 OIL_DECLARE_CLASS(composite_in_over_argb_const_mask);
 OIL_DECLARE_CLASS(composite_in_over_argb_const_src);
+OIL_DECLARE_CLASS(composite_in_over_rgb);
 OIL_DECLARE_CLASS(composite_over_argb);
 OIL_DECLARE_CLASS(composite_over_argb_const_src);
 OIL_DECLARE_CLASS(composite_over_u8);
@@ -274,15 +275,15 @@
 OIL_DECLARE_CLASS(mas4_add_s16);
 OIL_DECLARE_CLASS(mas8_across_add_s16);
 OIL_DECLARE_CLASS(mas8_add_s16);
-OIL_DECLARE_CLASS(max_f32);
-OIL_DECLARE_CLASS(max_f64);
 OIL_DECLARE_CLASS(maximum_f32);
+OIL_DECLARE_CLASS(maximum_f64);
 OIL_DECLARE_CLASS(md5);
 OIL_DECLARE_CLASS(mdct12_f64);
 OIL_DECLARE_CLASS(mdct36_f64);
 OIL_DECLARE_CLASS(merge_linear_argb);
 OIL_DECLARE_CLASS(merge_linear_u8);
 OIL_DECLARE_CLASS(minimum_f32);
+OIL_DECLARE_CLASS(minimum_f64);
 OIL_DECLARE_CLASS(mix_u8);
 OIL_DECLARE_CLASS(mt19937);
 OIL_DECLARE_CLASS(mult8x8_s16);
Index: liboil/liboilfuncs-04.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilfuncs-04.h,v
retrieving revision 1.16
diff -u -r1.16 liboilfuncs-04.h
--- liboil/liboilfuncs-04.h	23 Mar 2007 00:54:49 -0000	1.16
+++ liboil/liboilfuncs-04.h	27 Mar 2007 20:16:24 -0000
@@ -128,6 +128,7 @@
 void oil_composite_in_over_argb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
 void oil_composite_in_over_argb_const_mask (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_1, int n);
 void oil_composite_in_over_argb_const_src (uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n);
+void oil_composite_in_over_rgb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
 void oil_composite_over_argb (uint32_t * i_n, const uint32_t * s1_n, int n);
 void oil_composite_over_argb_const_src (uint32_t * i_n, const uint32_t * s1_1, int n);
 void oil_composite_over_u8 (uint8_t * i_n, const uint8_t * s1_n, int n);
@@ -274,15 +275,15 @@
 void oil_mas4_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_np3, const int16_t * s3_4, const int16_t * s4_2, int n);
 void oil_mas8_across_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_nx8, int sstr2, const int16_t * s3_8, const int16_t * s4_2, int n);
 void oil_mas8_add_s16 (int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n);
-void oil_max_f32 (float * d, const float * s1, int n);
-void oil_max_f64 (double * d, const double * s1, int n);
 void oil_maximum_f32 (float * d, const float * s1, const float * s2, int n);
+void oil_maximum_f64 (float * d, const float * s1, const float * s2, int n);
 void oil_md5 (uint32_t * i_4, const uint32_t * s_16);
 void oil_mdct12_f64 (double * d_6, const double * s_12);
 void oil_mdct36_f64 (double * d_18, const double * s_36);
 void oil_merge_linear_argb (uint32_t * d_n, const uint32_t * s_n, const uint32_t * s2_n, const uint32_t * s3_1, int n);
 void oil_merge_linear_u8 (uint8_t * d_n, const uint8_t * s_n, const uint8_t * s2_n, const uint32_t * s3_1, int n);
 void oil_minimum_f32 (float * d, const float * s1, const float * s2, int n);
+void oil_minimum_f64 (float * d, const float * s1, const float * s2, int n);
 void oil_mix_u8 (uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n);
 void oil_mt19937 (uint32_t * d_624, uint32_t * i_624);
 void oil_mult8x8_s16 (int16_t * d_8x8, const int16_t * s1_8x8, const int16_t * s2_8x8, int ds, int ss1, int ss2);
Index: liboil/liboilfuncs.h
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboilfuncs.h,v
retrieving revision 1.49
diff -u -r1.49 liboilfuncs.h
--- liboil/liboilfuncs.h	23 Mar 2007 00:54:49 -0000	1.49
+++ liboil/liboilfuncs.h	27 Mar 2007 20:16:24 -0000
@@ -312,6 +312,9 @@
 extern OilFunctionClass *oil_function_class_ptr_composite_in_over_argb_const_src;
 typedef void (*_oil_type_composite_in_over_argb_const_src)(uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n);
 #define oil_composite_in_over_argb_const_src ((_oil_type_composite_in_over_argb_const_src)(*(void **)oil_function_class_ptr_composite_in_over_argb_const_src))
+extern OilFunctionClass *oil_function_class_ptr_composite_in_over_rgb;
+typedef void (*_oil_type_composite_in_over_rgb)(uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n);
+#define oil_composite_in_over_rgb ((_oil_type_composite_in_over_rgb)(*(void **)oil_function_class_ptr_composite_in_over_rgb))
 extern OilFunctionClass *oil_function_class_ptr_composite_over_argb;
 typedef void (*_oil_type_composite_over_argb)(uint32_t * i_n, const uint32_t * s1_n, int n);
 #define oil_composite_over_argb ((_oil_type_composite_over_argb)(*(void **)oil_function_class_ptr_composite_over_argb))
@@ -750,15 +753,12 @@
 extern OilFunctionClass *oil_function_class_ptr_mas8_add_s16;
 typedef void (*_oil_type_mas8_add_s16)(int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n);
 #define oil_mas8_add_s16 ((_oil_type_mas8_add_s16)(*(void **)oil_function_class_ptr_mas8_add_s16))
-extern OilFunctionClass *oil_function_class_ptr_max_f32;
-typedef void (*_oil_type_max_f32)(float * d, const float * s1, int n);
-#define oil_max_f32 ((_oil_type_max_f32)(*(void **)oil_function_class_ptr_max_f32))
-extern OilFunctionClass *oil_function_class_ptr_max_f64;
-typedef void (*_oil_type_max_f64)(double * d, const double * s1, int n);
-#define oil_max_f64 ((_oil_type_max_f64)(*(void **)oil_function_class_ptr_max_f64))
 extern OilFunctionClass *oil_function_class_ptr_maximum_f32;
 typedef void (*_oil_type_maximum_f32)(float * d, const float * s1, const float * s2, int n);
 #define oil_maximum_f32 ((_oil_type_maximum_f32)(*(void **)oil_function_class_ptr_maximum_f32))
+extern OilFunctionClass *oil_function_class_ptr_maximum_f64;
+typedef void (*_oil_type_maximum_f64)(float * d, const float * s1, const float * s2, int n);
+#define oil_maximum_f64 ((_oil_type_maximum_f64)(*(void **)oil_function_class_ptr_maximum_f64))
 extern OilFunctionClass *oil_function_class_ptr_md5;
 typedef void (*_oil_type_md5)(uint32_t * i_4, const uint32_t * s_16);
 #define oil_md5 ((_oil_type_md5)(*(void **)oil_function_class_ptr_md5))
@@ -777,6 +777,9 @@
 extern OilFunctionClass *oil_function_class_ptr_minimum_f32;
 typedef void (*_oil_type_minimum_f32)(float * d, const float * s1, const float * s2, int n);
 #define oil_minimum_f32 ((_oil_type_minimum_f32)(*(void **)oil_function_class_ptr_minimum_f32))
+extern OilFunctionClass *oil_function_class_ptr_minimum_f64;
+typedef void (*_oil_type_minimum_f64)(float * d, const float * s1, const float * s2, int n);
+#define oil_minimum_f64 ((_oil_type_minimum_f64)(*(void **)oil_function_class_ptr_minimum_f64))
 extern OilFunctionClass *oil_function_class_ptr_mix_u8;
 typedef void (*_oil_type_mix_u8)(uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n);
 #define oil_mix_u8 ((_oil_type_mix_u8)(*(void **)oil_function_class_ptr_mix_u8))
Index: liboil/liboiltrampolines.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/liboiltrampolines.c,v
retrieving revision 1.22
diff -u -r1.22 liboiltrampolines.c
--- liboil/liboiltrampolines.c	23 Mar 2007 00:54:49 -0000	1.22
+++ liboil/liboiltrampolines.c	27 Mar 2007 20:16:24 -0000
@@ -951,6 +951,16 @@
   ((void (*)(uint32_t * i_n, const uint32_t * s1_1, const uint8_t * s2_n, int n))(_oil_function_class_composite_in_over_argb_const_src.func))(i_n, s1_1, s2_n, n);
 }
 
+#undef oil_composite_in_over_rgb
+void
+oil_composite_in_over_rgb (uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n)
+{
+  if (_oil_function_class_composite_in_over_rgb.func == NULL) {
+    oil_class_optimize (&_oil_function_class_composite_in_over_rgb);
+  }
+  ((void (*)(uint32_t * i_n, const uint32_t * s1_n, const uint8_t * s2_n, int n))(_oil_function_class_composite_in_over_rgb.func))(i_n, s1_n, s2_n, n);
+}
+
 #undef oil_composite_over_argb
 void
 oil_composite_over_argb (uint32_t * i_n, const uint32_t * s1_n, int n)
@@ -2411,26 +2421,6 @@
   ((void (*)(int16_t * d, const int16_t * s1, const int16_t * s2_np7, const int16_t * s3_8, const int16_t * s4_2, int n))(_oil_function_class_mas8_add_s16.func))(d, s1, s2_np7, s3_8, s4_2, n);
 }
 
-#undef oil_max_f32
-void
-oil_max_f32 (float * d, const float * s1, int n)
-{
-  if (_oil_function_class_max_f32.func == NULL) {
-    oil_class_optimize (&_oil_function_class_max_f32);
-  }
-  ((void (*)(float * d, const float * s1, int n))(_oil_function_class_max_f32.func))(d, s1, n);
-}
-
-#undef oil_max_f64
-void
-oil_max_f64 (double * d, const double * s1, int n)
-{
-  if (_oil_function_class_max_f64.func == NULL) {
-    oil_class_optimize (&_oil_function_class_max_f64);
-  }
-  ((void (*)(double * d, const double * s1, int n))(_oil_function_class_max_f64.func))(d, s1, n);
-}
-
 #undef oil_maximum_f32
 void
 oil_maximum_f32 (float * d, const float * s1, const float * s2, int n)
@@ -2441,6 +2431,16 @@
   ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_maximum_f32.func))(d, s1, s2, n);
 }
 
+#undef oil_maximum_f64
+void
+oil_maximum_f64 (float * d, const float * s1, const float * s2, int n)
+{
+  if (_oil_function_class_maximum_f64.func == NULL) {
+    oil_class_optimize (&_oil_function_class_maximum_f64);
+  }
+  ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_maximum_f64.func))(d, s1, s2, n);
+}
+
 #undef oil_md5
 void
 oil_md5 (uint32_t * i_4, const uint32_t * s_16)
@@ -2501,6 +2501,16 @@
   ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_minimum_f32.func))(d, s1, s2, n);
 }
 
+#undef oil_minimum_f64
+void
+oil_minimum_f64 (float * d, const float * s1, const float * s2, int n)
+{
+  if (_oil_function_class_minimum_f64.func == NULL) {
+    oil_class_optimize (&_oil_function_class_minimum_f64);
+  }
+  ((void (*)(float * d, const float * s1, const float * s2, int n))(_oil_function_class_minimum_f64.func))(d, s1, s2, n);
+}
+
 #undef oil_mix_u8
 void
 oil_mix_u8 (uint8_t * dest, const uint8_t * src1, const uint8_t * src2, const uint8_t * src3, int n)
Index: liboil/c/composite.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/c/composite.c,v
retrieving revision 1.2
diff -u -r1.2 composite.c
--- liboil/c/composite.c	22 May 2006 22:31:47 -0000	1.2
+++ liboil/c/composite.c	27 Mar 2007 20:16:24 -0000
@@ -313,6 +313,61 @@
 OIL_DEFINE_IMPL (composite_in_over_argb_fast, composite_in_over_argb);
 
 static void
+composite_in_over_rgb_fast (uint32_t *dest, const uint32_t *src,
+    const uint8_t *mask, int n)
+{
+  for (; n > 0; n--) {
+    uint32_t d = *dest, s = *src++;
+    uint32_t s1, s2, d1, d2, sa;
+    uint8_t m = *mask++;
+
+    s1 = s & 0x00ff00ff;
+    /* fill the missing alpha byte */
+    s2 = (((s | (0xff<<24)) >> 8) & 0x00ff00ff);
+
+    /* in */
+    s1 *= m;
+    s1 += 0x00800080;
+    s1 += (s1 >> 8) & 0x00ff00ff;
+    s1 >>= 8;
+    s1 &= 0x00ff00ff;
+
+    s2 *= m;
+    s2 += 0x00800080;
+    s2 += (s2 >> 8) & 0x00ff00ff;
+    s2 >>= 8;
+    s2 &= 0x00ff00ff;
+
+    /* over */
+    sa = (~s2 >> 16) & 0xff;
+
+    d1 = d & 0x00ff00ff;
+    d1 *= sa;
+    d1 += 0x00800080;
+    d1 += (d1 >> 8) & 0x00ff00ff;
+    d1 >>= 8;
+    d1 &= 0x00ff00ff;
+    d1 += s1;
+    d1 |= 0x01000100 - ((d1 >> 8) & 0x00ff00ff);
+    d1 &= 0x00ff00ff;
+
+    d2 = (d >> 8) & 0x00ff00ff;
+    d2 *= sa;
+    d2 += 0x00800080;
+    d2 += (d2 >> 8) & 0x00ff00ff;
+    d2 >>= 8;
+    d2 &= 0x00ff00ff;
+    d2 += s2;
+    d2 |= 0x01000100 - ((d2 >> 8) & 0x00ff00ff);
+    d2 &= 0x00ff00ff;
+
+    *dest++ = d1 | (d2 << 8);
+  }
+}
+OIL_DEFINE_IMPL (composite_in_over_rgb_fast, composite_in_over_rgb);
+
+
+static void
 composite_in_over_argb_const_src_fast (uint32_t *dest, const uint32_t *src,
     const uint8_t *mask, int n)
 {
Index: liboil/i386/composite_i386.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/i386/composite_i386.c,v
retrieving revision 1.4
diff -u -r1.4 composite_i386.c
--- liboil/i386/composite_i386.c	29 Jan 2006 02:55:37 -0000	1.4
+++ liboil/i386/composite_i386.c	27 Mar 2007 20:16:24 -0000
@@ -36,10 +36,12 @@
 OIL_DECLARE_CLASS (composite_in_argb_const_src);
 OIL_DECLARE_CLASS (composite_in_argb_const_mask);
 OIL_DECLARE_CLASS (composite_over_argb);
+OIL_DECLARE_CLASS (composite_over_rgb);
 OIL_DECLARE_CLASS (composite_over_argb_const_src);
 OIL_DECLARE_CLASS (composite_add_argb);
 OIL_DECLARE_CLASS (composite_add_argb_const_src);
 OIL_DECLARE_CLASS (composite_in_over_argb);
+OIL_DECLARE_CLASS (composite_in_over_rgb);
 OIL_DECLARE_CLASS (composite_in_over_argb_const_src);
 OIL_DECLARE_CLASS (composite_in_over_argb_const_mask);
 
@@ -911,6 +913,49 @@
 OIL_DEFINE_IMPL_FULL (composite_in_over_argb_mmx, composite_in_over_argb, OIL_IMPL_FLAG_MMX | OIL_IMPL_FLAG_MMXEXT);
 
 static void
+composite_in_over_rgb_mmx (uint32_t *dest, uint32_t *src, uint8_t *mask, int n)
+{
+  __asm__ __volatile__ (
+      MMX_LOAD_CONSTANTS
+      "1:\n"
+      "  movd (%2), %%mm0\n"
+      "  punpcklbw %%mm7, %%mm0\n"
+      "  pshufw $0x00, %%mm0, %%mm1\n"
+
+      "  movl (%1), %%eax\n"
+      "  or $0xff000000, %%eax\n"
+      "  movd %%eax, %%mm2\n"
+      "  punpcklbw %%mm7, %%mm2\n"
+
+      MMX_MULDIV255(mm2, mm1)
+
+      "  movd (%0), %%mm0\n"
+      "  punpcklbw %%mm7, %%mm0\n"
+
+      "  pshufw $0xff, %%mm2, %%mm1\n"
+      "  pxor %%mm5, %%mm1\n"
+
+      MMX_MULDIV255(mm0, mm1)
+
+      "  paddw %%mm0, %%mm2\n"
+      "  packuswb %%mm2, %%mm2\n"
+
+      "  movd %%mm2, (%0)\n"
+      "  addl $4, %0\n"
+      "  addl $4, %1\n"
+      "  addl $1, %2\n"
+      "  decl %3\n"
+      "  jnz 1b\n"
+      "  emms\n"
+      :"+r" (dest), "+r" (src), "+r" (mask), "+r" (n)
+      :
+      :"eax");
+
+}
+OIL_DEFINE_IMPL_FULL (composite_in_over_rgb_mmx, composite_in_over_rgb, OIL_IMPL_FLAG_MMX | OIL_IMPL_FLAG_MMXEXT);
+
+
+static void
 composite_in_over_argb_const_src_mmx (uint32_t *dest, uint32_t *src, uint8_t *mask, int n)
 {
   __asm__ __volatile__ (
Index: liboil/ref/composite.c
===================================================================
RCS file: /cvs/liboil/liboil/liboil/ref/composite.c,v
retrieving revision 1.7
diff -u -r1.7 composite.c
--- liboil/ref/composite.c	20 Dec 2005 01:28:18 -0000	1.7
+++ liboil/ref/composite.c	27 Mar 2007 20:16:24 -0000
@@ -180,6 +180,20 @@
 OIL_DEFINE_CLASS_FULL (composite_in_over_argb,
     "uint32_t *i_n, uint32_t *s1_n, uint8_t *s2_n, int n",
     composite_test);
+
+/**
+ * oil_composite_in_over_rgb:
+ * @i_n: DEST
+ * @s1_n: SRC
+ * @s2_n: MASK
+ * @n: number of elements
+ *
+ * Performs the compositing operation DEST = (SRC IN MASK) OVER DEST.
+ */
+OIL_DEFINE_CLASS_FULL (composite_in_over_rgb,
+    "uint32_t *i_n, uint32_t *s1_n, uint8_t *s2_n, int n",
+    composite_test);
+
 /**
  * oil_composite_in_over_argb_const_src:
  * @i_n: DEST
@@ -378,6 +392,31 @@
 OIL_DEFINE_IMPL_REF (composite_in_over_argb_ref, composite_in_over_argb);
 
 static void
+composite_in_over_rgb_ref (uint32_t *dest, const uint32_t *src, const uint8_t *mask, int n)
+{
+  int i;
+  uint8_t a;
+  uint32_t color;
+
+  for(i=0;i<n;i++){
+    color = oil_argb(
+        COMPOSITE_IN(0xff, mask[i]),
+        COMPOSITE_IN(oil_argb_R(src[i]), mask[i]),
+        COMPOSITE_IN(oil_argb_G(src[i]), mask[i]),
+        COMPOSITE_IN(oil_argb_B(src[i]), mask[i]));
+    a = oil_argb_A(color);
+    dest[i] = oil_argb(
+        COMPOSITE_OVER(oil_argb_A(dest[i]),oil_argb_A(color),a),
+        COMPOSITE_OVER(oil_argb_R(dest[i]),oil_argb_R(color),a),
+        COMPOSITE_OVER(oil_argb_G(dest[i]),oil_argb_G(color),a),
+        COMPOSITE_OVER(oil_argb_B(dest[i]),oil_argb_B(color),a));
+  }
+
+}
+OIL_DEFINE_IMPL_REF (composite_in_over_rgb_ref, composite_in_over_rgb);
+
+
+static void
 composite_in_over_argb_const_src_ref (uint32_t *dest, const uint32_t *src, const uint8_t *mask, int n)
 {
   int i;

