[Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

Siarhei Siamashka siarhei.siamashka at gmail.com
Sat Aug 21 10:48:51 PDT 2010


On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote:
> Hi Siarhei Siamashka,
> 
> Here is a new patch, can you review it? Thank you!

Sure, thanks for the updated patch. Some comments follow.

> From 9783651899a2763d7fcca2960fc354bd1f541980 Mon Sep 17 00:00:00 2001
> From: root <root at d501.localdomain>

A minor nitpick here. The 'From:' header seems to be wrong, it does not look 
like a real patch author name.

> +dnl Check for SSSE3
> +
> +if test "x$SSSE3_CFLAGS" = "x" ; then
> +   if test "x$SUNCC" = "xyes"; then
> +      # SSSE3 is enabled by default in the Sun Studio 64-bit environment

Is it a valid statement with the regards to Sun Studio? Looks like it is a 
direct copy/paste from SSE2. Can we be sure that there is nothing redundant or 
just plain wrong here? 

> +#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 
3))
> +#   if !defined(__amd64__) && !defined(__x86_64__)

Is this 'defined(__amd64__)' check needed here? Looks like it is again a 
copy/paste from SSE2 code, but SSE2 is guaranteed to be supported for x86-64 
systems.

> +#      error "Need GCC >= 4.3 for SSSE3 intrinsics on x86"
> +#   endif
> +#endif


> diff --git a/pixman/pixman-access-ssse3_x86-64.S b/pixman/pixman-access-
ssse3_x86-64.S
> new file mode 100644
> index 0000000..fec93df
> --- /dev/null
> +++ b/pixman/pixman-access-ssse3_x86-64.S

Copyright notice seems to be still missing in the newly added source files.

> +	cmp	%dil, %sil
> +	jb	L(bk_write)
> +	add	%rdx, %rsi
> +	add	%rdx, %rdi
> +	BRANCH_TO_JMPTBL_ENTRY (L(table_48bytes_fwd), %rdx, 4)
> +L(bk_write):
> +	BRANCH_TO_JMPTBL_ENTRY (L(table_48_bytes_bwd), %rdx, 4)

Would it make sense to also purge this 'bk_write' and 'table_48_bytes_bwd' 
stuff?

> diff --git a/pixman/pixman-cpu.c b/pixman/pixman-cpu.c
> index e4fb1e4..ab2d69a 100644

[...]
 
> +#else         /* end dt_cpu32() */
> +
> +static unsigned int
> +detect_cpu_features (void)
> +{
> +    unsigned int features = 0;
> +    unsigned int result = 0;
> +    unsigned int result_c = 0;

This looks like a duplication of the code to make a new 64-bit version of 
'detect_cpu_features' function. Could we assume that all 64-bit processors 
support CPUID instruction (so that the check for CPUID support can be omitted), 
and also support MMX and SSE2? In this case SSSE3 check will be just a tiny 
block of inline assembly using CPUID instruction.

> diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
> index 3dd7967..0cfd554 100644
> --- a/pixman/pixman-sse2.c
> +++ b/pixman/pixman-sse2.c
> @@ -3520,6 +3520,56 @@ sse2_composite_over_8888_n_8888 
(pixman_implementation_t *imp,
>  /*---------------------------------------------------------------------
>   * composite_over_8888_n_8888
>   */
> +#if defined(USE_SSSE3) && !defined(_MSC_VER)
> +
> +extern void *composite_src_x888_8888_ssse3_fast_path( uint32_t * dest,  
uint32_t *src,  int32_t count);
> +extern pixman_bool_t choose_ssse3_fast_patch;
> +static void
> +ssse3_composite_src_x888_8888 (pixman_implementation_t *imp,

The addition of 'ssse3_composite_src_x888_8888' into 'pixman-sse2.c' does not 
look like a good idea. Adding a new 'pixman-ssse3.c' file for SSSE3 
optimizations would have been a better choice.

Especially considering the following part of code:

> +#if defined(USE_SSSE3) && !defined(_MSC_VER)
> +    PIXMAN_STD_FAST_PATH (SRC, x8r8g8b8, null, a8r8g8b8, 
ssse3_composite_src_x888_8888),
> +    PIXMAN_STD_FAST_PATH (SRC, x8b8g8r8, null, a8b8g8r8, 
ssse3_composite_src_x888_8888),
> +#else
>      PIXMAN_STD_FAST_PATH (SRC, x8r8g8b8, null, a8r8g8b8, 
sse2_composite_src_x888_8888),
>      PIXMAN_STD_FAST_PATH (SRC, x8b8g8r8, null, a8b8g8r8, 
sse2_composite_src_x888_8888),
> +#endif

By using "#if defined(USE_SSSE3)" here, you effectively disable 
'sse2_composite_src_x888_8888' fast path for the processors which do not have 
SSSE3 support. If the runtime detection of SSSE3 fails, the code fallbacks to C 
implementation and we have no chance to benefit from SSE2 optimizations.

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/pixman/attachments/20100821/d9495579/attachment.pgp>


More information about the Pixman mailing list