[cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions
siarhei.siamashka at gmail.com
Wed Sep 15 04:21:23 PDT 2010
On Monday 13 September 2010 19:56:16 Georgi Beloev wrote:
> > It's not directly related to your patch. But I wonder if it makes sense
> > to also add a manual loop unrolling to the C variant of pixman_fill32
> > and the other similar functions in order to get better general
> > performance on most of SIMD-less simple processors such as MIPS32R2,
> > older ARMs and the others (who knows, maybe even opencores.org ones)?
> Yes, it is just simple loop unrolling. The code may also benefit from using
> "restrict" pointers to tell the compiler that it is safe to unroll the
> loops. Unfortunately, this is a C99 keyword and we are not compiling in
> C99 mode. Another useful optimization is adding prefetch-for-store
> instructions. However, in some cases these instructions can degrade
> performance rather than improve it.
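For illustration, here is a minimal sketch of the manual loop unrolling discussed above, in plain C. The name fill32_unrolled and the unroll factor of 4 are assumptions. This is not pixman's actual pixman_fill32 code.

```c
#include <stdint.h>

/* Hypothetical 32-bit fill loop, manually unrolled by 4.
 * Not pixman's real pixman_fill32 implementation. */
static void
fill32_unrolled (uint32_t *dst, int width, uint32_t value)
{
    /* Main loop: four stores per iteration, so the counter update and
     * the branch are paid once per four pixels on simple in-order CPUs. */
    while (width >= 4)
    {
        dst[0] = value;
        dst[1] = value;
        dst[2] = value;
        dst[3] = value;
        dst += 4;
        width -= 4;
    }

    /* Tail: the remaining 0..3 pixels. */
    while (width-- > 0)
        *dst++ = value;
}
```

The tail loop keeps the function correct for widths that are not a multiple of 4. Whether a factor of 4 is worth it on a given core is something to measure, not assume.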
If you are using gcc, it is still possible to use the __restrict keyword (with
two underscores) even when not compiling in C99 mode.
Unfortunately, the 'restrict' keyword only started to be somewhat useful in
gcc 4.4, and it got broken in gcc 4.5 (which otherwise seems to be a
good release): http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45176
It's more of a chicken-and-egg problem: until software developers start
using the 'restrict' keyword and actually verify whether it helps, gcc is
going to have poor support for it.
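A minimal sketch, assuming gcc or a C99 compiler: a hypothetical MY_RESTRICT macro selects 'restrict' in C99 mode, gcc's '__restrict' otherwise, and nothing on other compilers. The copy32 function is only an illustration. Calling it with overlapping buffers would be undefined behavior, which is exactly the promise the qualifier makes to the compiler.

```c
#include <stdint.h>

/* Hypothetical portability macro: C99 'restrict' when available,
 * gcc's '__restrict' extension in C89/gnu89 mode, nothing otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define MY_RESTRICT restrict
#elif defined(__GNUC__)
#  define MY_RESTRICT __restrict
#else
#  define MY_RESTRICT
#endif

/* Promising that dst and src never overlap lets the compiler keep
 * loaded values in registers and unroll or reorder the loop without
 * re-reading src after every store to dst. */
static void
copy32 (uint32_t *MY_RESTRICT dst, const uint32_t *MY_RESTRICT src,
        int width)
{
    int i; /* declared up front so the sketch also builds as C89 */

    for (i = 0; i < width; i++)
        dst[i] = src[i];
}
```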
The other useful thing for C code is probably gcc's __builtin_expect().
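__builtin_expect() is a real gcc builtin. The LIKELY/UNLIKELY wrapper macros and the count_partial_coverage function below are hypothetical, and the hint encodes an assumption (that mask values other than 0x00 and 0xFF are uncommon in glyph masks) rather than a measured fact.

```c
#include <stdint.h>

/* Hypothetical wrappers; fall back to plain expressions elsewhere. */
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect (!!(x), 1)
#  define UNLIKELY(x) __builtin_expect (!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Counts mask bytes that are neither 0x00 nor 0xFF, i.e. pixels that
 * would need a full blend. The UNLIKELY hint asks gcc to lay out the
 * code so that the assumed common case falls through without a taken
 * branch. */
static int
count_partial_coverage (const uint8_t *mask, int width)
{
    int i, n = 0;

    for (i = 0; i < width; i++)
    {
        uint8_t m = mask[i];

        if (UNLIKELY (m != 0x00 && m != 0xff))
            n++;
    }
    return n;
}
```

The hint only changes code layout, never the result, so a wrong guess costs speed but not correctness.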
> >> +// mips32r2_composite_over_n_8_8888_inner(uint32_t *dest, const uint32_t src,
> >> +//                                        const uint8_t *mask, int width)
> > <snip>
> > The over_n_8_8888 operation is primarily used for rendering text glyphs,
> > so it:
> > - typically works with small images, maybe having size somewhere
> >   around 10x20
> > - typically has a lot of 0x00 and 0xFF values in the mask
> > - quite often uses an opaque source
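The following C sketch shows how an over_n_8_8888 inner loop could exploit these properties: early-outs for 0x00 and 0xFF mask values, and a plain store when the source is opaque. It assumes a premultiplied a8r8g8b8 solid source and an a8 mask, as in pixman. The helper names are hypothetical, and the rounded divide-by-255 follows the same trick as pixman's MUL_UN8 macro.

```c
#include <stdint.h>

/* a * b / 255 with rounding, without an actual division. */
static uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* Multiply each of the four 8-bit channels of x by m / 255. */
static uint32_t
mul_un8x4 (uint32_t x, uint32_t m)
{
    return  mul_un8 (x & 0xff, m)
          | mul_un8 ((x >> 8) & 0xff, m) << 8
          | mul_un8 ((x >> 16) & 0xff, m) << 16
          | mul_un8 (x >> 24, m) << 24;
}

/* Premultiplied OVER: s + d * (255 - alpha(s)) / 255 per channel.
 * With premultiplied input no channel can exceed 255, so the four
 * channels can be added as one 32-bit value without carries. */
static uint32_t
over (uint32_t s, uint32_t d)
{
    return s + mul_un8x4 (d, 255 - (s >> 24));
}

/* Hypothetical C version of the inner loop with glyph-oriented
 * fast paths. */
static void
over_n_8_8888_inner (uint32_t *dest, uint32_t src,
                     const uint8_t *mask, int width)
{
    uint32_t srca = src >> 24;
    int i;

    for (i = 0; i < width; i++)
    {
        uint32_t m = mask[i];

        if (m == 0x00)
            continue;                       /* no coverage: dest unchanged */
        if (m == 0xff && srca == 0xff)
            dest[i] = src;                  /* opaque and fully covered */
        else if (m == 0xff)
            dest[i] = over (src, dest[i]);  /* skip the IN by mask */
        else
            dest[i] = over (mul_un8x4 (src, m), dest[i]);
    }
}
```

Only the last branch pays for the full per-channel multiply, which is the point when most mask bytes are 0x00 or 0xFF.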
> Very useful to know! Is there a place where things like that are
> summarized? I couldn't find any pixman documentation.
AFAIK it is not summarized anywhere. But generally the work-flow may be similar
to the following:
1. You find some function which shows up high in the oprofile report and is
still not optimized.
2. Try to optimize it.
3. Run profiling again to verify that the optimizations really provide the
expected improvement.
4. If the performance did not improve (much), then you start scratching your
head and trying to actually understand in detail what that particular function
is doing, what kind of input data it gets, how many loop iterations it runs, etc.
Step 4 may actually be done before step 2. :)
The fact that something is written in assembly does not automatically ensure
that it will actually run faster. So confirming the effect of optimizations
and getting some numbers is always nice.
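One way to get such numbers for a small routine is to check first that the optimized version produces exactly the same output as a reference, and only then time it. Below is a rough sketch using the standard clock() function. fill_ref, fill_unrolled and the helpers are hypothetical. A profiler such as oprofile on real workloads gives more trustworthy numbers than a synthetic loop like this one.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*fill_fn) (uint32_t *dst, int width, uint32_t value);

/* Hypothetical reference: the straightforward loop. */
static void
fill_ref (uint32_t *dst, int width, uint32_t value)
{
    int i;
    for (i = 0; i < width; i++)
        dst[i] = value;
}

/* Hypothetical candidate: unrolled by 4 with a tail loop. */
static void
fill_unrolled (uint32_t *dst, int width, uint32_t value)
{
    for (; width >= 4; width -= 4, dst += 4)
    {
        dst[0] = value; dst[1] = value;
        dst[2] = value; dst[3] = value;
    }
    for (; width > 0; width--)
        *dst++ = value;
}

/* Step 1: correctness. Runs both functions on zeroed buffers and
 * compares them entirely, so a stray write past 'width' also shows up
 * as a mismatch (unless both versions make the same mistake). */
static int
same_result (fill_fn a, fill_fn b, int width)
{
    uint32_t buf_a[64], buf_b[64];

    if (width < 0 || width > 63)
        return 0;
    memset (buf_a, 0, sizeof buf_a);
    memset (buf_b, 0, sizeof buf_b);
    a (buf_a, width, 0xdeadbeefu);
    b (buf_b, width, 0xdeadbeefu);
    return memcmp (buf_a, buf_b, sizeof buf_a) == 0;
}

/* Step 2: timing. CPU seconds for 'iterations' calls; choose a width
 * that matches the real workload (e.g. small glyph-sized spans). */
static double
time_fill (fill_fn fn, uint32_t *buf, int width, long iterations)
{
    clock_t start = clock ();
    long i;

    for (i = 0; i < iterations; i++)
        fn (buf, width, (uint32_t) i);
    return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```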
PS. There is also the add_8000_8000 operation, which typically accompanies
over_n_8_8888 when rendering fonts; it makes sense to optimize it too.
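For reference, the ADD operator on two a8 buffers is just a per-byte saturating addition. Here is a minimal sketch with a hypothetical function name, using a common branchless clamp instead of a compare-and-branch.

```c
#include <stdint.h>

/* Hypothetical plain-C ADD for a8 buffers: dst = min(dst + src, 255). */
static void
add_a8_saturate (uint8_t *dst, const uint8_t *src, int width)
{
    int i;

    for (i = 0; i < width; i++)
    {
        unsigned int t = (unsigned int) dst[i] + src[i];  /* 0..510 */

        /* If t > 255 then t >> 8 == 1 and 0u - 1 has all bits set,
         * so the OR saturates the low byte to 0xff; otherwise t is
         * left unchanged. */
        dst[i] = (uint8_t) (t | (0u - (t >> 8)));
    }
}
```

Because there is no data-dependent branch, the loop behaves the same regardless of how many bytes saturate. That is convenient for glyph masks, where many values are 0xFF.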