[cairo] Pixman refactoring, ARM and Altivec implementations needed

Sat May 30 10:23:06 PDT 2009

On 30-05-09 17:14, Siarhei Siamashka wrote:
> On Saturday 30 May 2009 01:46:28 ext Koen Kooi wrote:
>> On 29-05-09 22:16, Siarhei Siamashka wrote:
>>> Would it be a good idea to compile pixman-arm-simd.c with '-march=armv6',
>>> ignoring CFLAGS completely?
>>
>> Ignoring CFLAGS would be bad, since you might want to force an ABI that
>> isn't in the default compiler spec (like armv6+ isn't in yours).
>
> Thanks for joining the discussion. I hope that you can provide some useful
> input regarding the subject.
>
> Yes, I also have some worries about potential ABI related issues, that
> was the part of my post which you decided was not worth quoting.
> We may have PIC and TLS stuff, EABI/OABI, etc.
>
>> Having said that, if armv6+ isn't in the compiler spec, how did armv6
>> simd get enabled in the first place?
>
> The problem is that armv6 simd is *not* getting enabled in this case.
>
> I'll try to make it a bit more clear. Basically in pixman all the cpu specific
> optimizations are isolated into their own source files and these source
> files are compiled with gcc flags that are different from the rest of the
> library. The gcc flags are otherwise the same, but have extra flags added
> like "-mmmx -Winline" (for MMX), "-mmmx -msse2 -Winline" (for
> SSE2), "-maltivec -mabi=altivec" (for AltiVec)

That would make sense for x86 and powerpc where there is practically
only one ABI in use.

> and "-mfpu=neon -mfloat-abi=softfp" (for ARM NEON).

But for arm that would be the wrong thing to do, there are way too many 
ABIs around for pixman to guess CFLAGS. If someone is crazy enough to 
want an OABI build, he has to patch out '-mfloat-abi=softfp' in pixman 
makefiles. If I am crazy enough to use gcc 4.4.0 or CSL 2009q1 I am able 
to use hardfp, but I have to patch out '-mfloat-abi=softfp' as well.

> Support for ARMv6 optimizations is different in pixman, no extra flags get
> added for compiling 'pixman-arm-simd.c' at all.

What ffmpeg does is test $CC $CFLAGS on a source file with neon
instructions to see whether the (cross)compiler wants neon or not. If you
have '-mcpu=arm1136jf-s' in CFLAGS, the test should fail and config.h
will have CONFIG_NEON (or whatever it's called) disabled. Same goes for
armv4 which would miss the 'bx' instruction (I sadly still have
strongarm platforms to care for, but gcc 4.4.x should support EABI on that).

> The responsibility of configure script is to check if the compiler can support
> these extra cpu extensions. So if you have a very old toolchain (not
> supporting NEON for example), a small test snippet of code will fail to
> compile in configure script and the support for these extensions will not be
> compiled in.

That sounds sane enough :)

> But if your compiler is recent enough, support for NEON can be
> compiled as part of pixman even if your main target is some older cpu core.
> NEON optimizations will be only used if NEON support is detected at runtime.

As I said above the test should fail *if* you have '-mcpu=arm1136jf-s' 
in CFLAGS. If you're using a toolchain with NEON capabilities and an 
armv6+ in the default spec, you really, *really* need to be passing the 
correct -mcpu or -march (-mtune is not enough).
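
To illustrate the runtime half of this (the quoted "NEON optimizations will
be only used if NEON support is detected at runtime"), here is a minimal
sketch of one way such a check could be done on Linux, by looking for a
"neon" token on the "Features" line of /proc/cpuinfo. This is an assumption
for illustration only: the thread does not say how pixman actually detects
NEON, and the function names features_has() and cpu_has_neon() are
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * whitespace-separated token after the ':' in a cpuinfo-style line. */
int features_has(const char *line, const char *name)
{
    size_t n = strlen(name);
    const char *p = strchr(line, ':');

    if (p == NULL)
        return 0;
    p++;
    while (*p != '\0') {
        size_t len;

        /* skip separators between tokens */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        len = strcspn(p, " \t\n");
        if (len == n && strncmp(p, name, n) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Hypothetical runtime check (assumes Linux and an ARM-style
 * "Features" line); returns 0 when the file or the line is missing. */
int cpu_has_neon(void)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0 &&
            features_has(line, "neon")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A library would call something like cpu_has_neon() once at startup and only
then pick the NEON code paths. This is separate from the compile-time
question above: the runtime check cannot help if the NEON source file was
never built.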

> NEON optimizations currently imply that ARMv6 optimizations are supported too
> and contain references to ARMv6 code. So the problematic configuration is one
> where either you artificially disable ARMv6 optimizations but enable NEON, or
> this configuration is selected automatically (having a toolchain tuned for
> armv4/armv5te/whatever else in gcc specs).
>
> Things to try may be:
> 1. try adding something like "-mcpu=arm1136jf-s" to gcc flags (it seems to
> be able to override -march in current versions of gcc).

I'm unsure how that would interact with non-inline-asm code in those
files. If there isn't any C code it should be safe. Still ugly though.

> 2. add some hack to 'pixman-arm-simd.c' like the line "asm(".arch armv6")" to
> the very beginning of it, and use the same hack in configure code snippet
> naturally.

I have no idea what asm(".arch armv6") would do, I'm a buildsystem 
person, not a coder, so no comment on this option :)
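
For readers in the same position, a rough sketch of what option 2 amounts
to: a file-scope asm statement is passed through to the assembler, and
".arch" is a GNU as directive that selects the ARM architecture the
assembler will accept instructions for. The idea is that an ARMv6
instruction in inline assembly then assembles even when CFLAGS target an
older core. Whether this reliably overrides what gcc itself emits is not
established in this thread, so treat it as an assumption. The helper name
add8x4_saturate() and the HAVE_ARMV6_ASM macro are hypothetical, and the
plain C fallback is only there so the sketch builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
/* The hack under discussion: tell the assembler to accept ARMv6
 * instructions regardless of the -march/-mcpu used for this file. */
__asm__(".arch armv6");
#define HAVE_ARMV6_ASM 1
#endif

/* Hypothetical helper: add four packed unsigned bytes with saturation. */
uint32_t add8x4_saturate(uint32_t a, uint32_t b)
{
#ifdef HAVE_ARMV6_ASM
    uint32_t r;

    /* uqadd8 is an ARMv6 SIMD instruction working on ordinary registers */
    __asm__("uqadd8 %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    /* portable fallback: saturate each byte lane separately */
    uint32_t r = 0;
    int i;

    for (i = 0; i < 4; i++) {
        uint32_t s = ((a >> (8 * i)) & 0xff) + ((b >> (8 * i)) & 0xff);

        if (s > 0xff)
            s = 0xff;
        r |= s << (8 * i);
    }
    return r;
#endif
}
```

For example, add8x4_saturate(0x10F0FF01, 0x20200102) gives 0x30FFFF03: the
two middle bytes overflow and clamp to 0xFF. Note that the directive only
changes what the assembler accepts; the compiler still generates the
surrounding C code for whatever CFLAGS say.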

> 3. Implement ARMv6 optimizations fully in assembly and just use gas without
> having to deal with gcc inline assembly woes

I think that's what ffmpeg is doing, but please check that yourself, 
you're an ffmpeg hacker :)

> Actually 3. may not be such a bad idea. The problem with ARMv6 is that it uses
> standard ARM registers for data and can easily run out of them, the number of
> available registers for inline assembly is unpredictable. Some of my ARMv6
> optimizations use inline assembly, but are in fact complete assembly
> implementations (using 'naked' attribute) to have full control over register
> allocation. This is kind of error prone, because I have seen a version of gcc
> which can miscompile 'naked' functions when building code
> with -fno-omit-frame-pointer option (it just screws up 'naked' functions
> by inserting function prologue/epilogue instruction sequences).

I think it would make armcc (RVCT) and tms470 (the TI compiler for arm) 
a lot happier.

>  ARMv6 is the only problematic one, because all the other media extensions have
> their own separate registers and inline assembly is just fine for them.

I suspect that's because ARM ABIs are such a mess compared to x86 where 
there's only one ABI to rule them all.

> (Constructive) feedback is very much welcome.
>
> That said, I'm more interested in NEON optimizations at the moment.

Me too :)

> Additionally, I myself always compile pixman with cortex-a8 optimizations for
> the whole library.

Same here. I do wonder how this all will work out for cortex-a9, where 
ARM decided to make NEON a bit slower to make room for a full VFPv3. 
Hopefully someone at ARM with access to real A9 silicon could speak up?

regards,

Koen