[cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

Mon Sep 13 09:21:13 PDT 2010

Hi Soeren,

Thanks for pushing the patch to the pixman git branch and for cross-posting on the pixman mailing list. I wasn't aware of this mailing list as it isn't mentioned on cairographics.org. Please find answers to your specific questions below.

On Sep 12, 2010, at 5:08 AM, Soeren Sandmann wrote:

> Georgi Beloev <gb at beloev.net> writes:
> 
>> From 118b1f5596f72be7fed85ba408ff2961b3308038 Mon Sep 17 00:00:00 2001
>> From: Georgi Beloev <gb at beloev.net>
>> Date: Wed, 8 Sep 2010 17:34:22 -0700
>> Subject: [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions.
>> 
>> The following functions were implemented for MIPS32R2:
>>  - pixman_fill32()
>>  - fast_composite_over_n_8_8888()
>> 
>> The following functions were implemented for MIPS DSP ASE:
>>  - combine_over_u()
>>  - fast_composite_over_n_8_8888()
>> 
>> Additionally, MIPS DSP ASE uses the MIPS32R2 pixman_fill32() function.
>> 
>> Use configure commands similar to the ones below to select the target
>> processor and, correspondingly, the target instruction set:
> 
> Thanks for the patch - new CPU specific fast paths are always
> welcome. As Dmitri mentioned, please CC
> 
>    pixman at lists.freedesktop.org 
> 
> on Pixman patches. I have pushed it to a git branch here:
> 
>    http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=mips&id=8beceb87436c222a9581d48154075db4cc7e8234
> 
> Some comments:
> 
> - Does "make check" pass? I'm a little suspicious about these:
> 
> +       addu            $t3, $t3, $t9   // can't overflow; rev2: addu_s.ph
> +       addu            $t4, $t4, $t9   // can't overflow; rev2: addu_s.ph
> 
>  because overflow definitely can happen if the source has color
>  components that are larger than the alpha channel (which is legal,
>  though unusual). But I don't know the DSP ASE instruction set, so I
>  could be wrong about this.

Yes, all tests pass. Registers $t3 and $t4 above contain packed 16-bit multiplication results. Each of these results is computed as the product of an 8-bit color component and an 8-bit alpha value. The maximum value of the product is 0xFF x 0xFF = 0xFE01. Adding 0x80 (from register $t9) gives us a maximum result of 0xFE81. Hence, there is no overflow from the 16 LSBs into the 16 MSBs of the result.

> - I think this needs to be split into (at least) two commits, one that
>  adds the mips32r2 part, and one that adds the DSP ASE 1 part. It
>  would probably also be useful to enable the combiner and the
>  n_8_8888 fast paths separately so that they can be bisected.

The DSP ASE version uses the MIPS32R2 version of pixman_fill32(). Initially the plan was to develop separate versions of each function for each target instruction set (and DSP ASE Rev 2 was also on the list) but the DSP ASE version of fill32 was eventually dropped. There are no MIPS processors that implement DSP ASE but are not MIPS32R2-compatible. It is convenient to be able to fall back to MIPS32R2 implementation if the DSP ASE code does not provide enough speedup to justify its use.

> - Is there a reason to not do runtime checking? I realize that most
>  people using MIPS will likely do so on an embedded system where they
>  know ahead of time what the CPU supports, but we do have runtime
>  checking for the other CPU specific implementations.

Yes, at the present time this isn't very easy to do on MIPS Linux -- checking if DSP ASE is implemented requires the use of privileged instructions. MIPS is aware of this problem and will push an update to the kernel that enables the use of AT_HWCAP for this purpose.

> 
> - This:
> 
>> pixman_implementation_t *
>> _pixman_choose_implementation (void)
>> {
>> @@ -593,6 +604,19 @@ _pixman_choose_implementation (void)
>> 	return _pixman_implementation_create_vmx ();
>> #endif
>> 
>> +#ifdef USE_MIPS32R2
>> +	pixman_implementation_t *imp = NULL;
>> +    if (pixman_have_mips32r2 ())
>> +		imp = _pixman_implementation_create_mips32r2 (imp);
>> +
>> +#ifdef USE_MIPS_DSPASE1
>> +	if (pixman_have_mips_dspase1 ())
>> +		imp = _pixman_implementation_create_mips_dspase1 (imp);
>> +#endif // USE_MIPS_DSPASE1
>> +
>> +	return imp;
>> +#endif //  USE_MIPS32R2
>> +
>>     return _pixman_implementation_create_fast_path ();
>> }
> 
>  doesn't look quite right to me. If both have_mips32r2() and
>  have_dspase1() return FALSE, then this will return a NULL
>  implementation. 

True, I'll have to fix it. Right now it is not possible to happen because the runtime checks are hardwired to TRUE but when they become "live" there will be a prboem.

>  Also, if it is intentional that DSPASE1 is only used when MIPS32R2
>  is used, then it seems that this:
> 
>> +pixman_implementation_t *
>> +_pixman_implementation_create_mips_dspase1 (pixman_implementation_t *delegate)
>> +{
>> +	if (delegate == NULL)
>> +		delegate = _pixman_implementation_create_fast_path ();
>> +
>> +    pixman_implementation_t *imp =
>> +		_pixman_implementation_create (delegate, mips_dspase1_fast_paths);
>> +
>> +	imp->combine_32[PIXMAN_OP_OVER] = mips_dspase1_combine_over_u;
>> +
>> +    return imp;
>> +}
> 
>  doesn't need to check for NULL. (Also, please avoid variable
>  declaration in the middle of code; that applies to the mips32r2
>  version too).
> 
> - There are some style issues (see CODING_STYLE)
> 
>  - Use /* */ comments instead of //
> 
>  - Indents are four spaces
> 
>  - Put a space before parentheses
> 
>  - Don't leave in commented-out code like this:
> 
>     //              b = _pixman_implementation_fill(imp->delegate, bits, stride, bpp, x, y, width, height, xor);
> 
> I'm hoping someone with more experience with MIPS than I have can give
> more detailed comments on the assembly.

Thanks for the all the comments! I'll update the code per your comments and submit another patch. Should I mail it just to the pixman list or CC both cairo and pixman?

Cheers,
--
Georgi Beloev
gb at beloev.net