profile/ivi/pixman.git
12 years agoUse AC_LANG_SOURCE for DSPr2 configure program
Matt Turner [Wed, 14 Mar 2012 20:48:00 +0000 (16:48 -0400)]
Use AC_LANG_SOURCE for DSPr2 configure program

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoJust include xmmintrin.h on MSVC as well
Chun-wei Fan [Fri, 9 Mar 2012 07:54:06 +0000 (15:54 +0800)]
Just include xmmintrin.h on MSVC as well

The xmmintrin.h as shipped with recent Visual C++ (2003+) provides
_mm_shuffle_pi16 and _mm_mulhi_pu16, so including that header
will do for using these functions, and MSVC does not like the GCC-specific
implementations of _mm_shuffle_pi16 and _mm_mulhi_pu16 that are
currently in the code.

_MM_SHUFFLE is declared in the same way in MSVC's xmmintrin.h, so don't
re-define it here to avoid a compilation warning.

12 years agoFix a false-negative in MMX check
Jeremy Huddleston [Wed, 14 Mar 2012 17:26:18 +0000 (10:26 -0700)]
Fix a false-negative in MMX check

Silence warnings that could make -Werror give a false negative
Use signed char to avoid cases where int8_t isn't declared

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agoMIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.
Nemanja Lukic [Sun, 11 Mar 2012 17:52:25 +0000 (18:52 +0100)]
MIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.

Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:   8.32  L2:   7.65  M:  6.38 ( 51.08%)  HT:  5.78  VT:  5.74  R:  5.84  RT:  4.39 (  37Kops/s)
     over_n_8888_0565_ca =  L1:   7.40  L2:   6.95  M:  6.16 ( 41.06%)  HT:  5.72  VT:  5.52  R:  5.63  RT:  4.28 (  36Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  138.223  139.070   0.33%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  132.763  132.939   0.06%    5/6

Optimized:

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:  19.35  L2:  23.84  M: 13.68 (109.39%)  HT: 11.39  VT: 11.19  R: 11.27  RT:  6.90 (  47Kops/s)
     over_n_8888_0565_ca =  L1:  18.68  L2:  17.00  M: 12.56 ( 83.70%)  HT: 10.72  VT: 10.45  R: 10.43  RT:  5.79 (  43Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  130.400  131.720   0.46%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  125.830  126.604   0.34%    6/6

12 years agoExpand TLS support beyond __thread to __declspec(thread)
Jeremy Huddleston [Thu, 8 Mar 2012 17:41:34 +0000 (09:41 -0800)]
Expand TLS support beyond __thread to __declspec(thread)

This code was pretty much copied from a similar commit that I made to
xorg-server in April.

cf: xorg/xserver: bb4d145bd25e2aee988b100ecf1105ea3b6a40b8

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agoDisable MMX when incompatible clang is being used.
Jeremy Huddleston [Thu, 8 Mar 2012 17:41:32 +0000 (09:41 -0800)]
Disable MMX when incompatible clang is being used.

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agoSilence a warning about unused pixman_have_mmx
Jeremy Huddleston [Thu, 8 Mar 2012 17:41:33 +0000 (09:41 -0800)]
Silence a warning about unused pixman_have_mmx

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agoRevert "Disable MMX when Clang is being used."
Jeremy Huddleston [Thu, 8 Mar 2012 17:41:31 +0000 (09:41 -0800)]
Revert "Disable MMX when Clang is being used."

This reverts commit 5eb4c12a79b3017ec6cc22ab756f53f225731533.

12 years agoPost-release version bump to 0.25.3
Søren Sandmann Pedersen [Thu, 8 Mar 2012 15:11:20 +0000 (10:11 -0500)]
Post-release version bump to 0.25.3

12 years agoPre-release version bump to 0.25.2
Søren Sandmann Pedersen [Thu, 8 Mar 2012 14:33:16 +0000 (09:33 -0500)]
Pre-release version bump to 0.25.2

12 years agommx: Squash a warning by making the argument to ldl_u() const
Søren Sandmann Pedersen [Thu, 8 Mar 2012 14:29:46 +0000 (09:29 -0500)]
mmx: Squash a warning by making the argument to ldl_u() const

12 years agoJust use xmmintrin.h when building with Solaris Studio compilers
Alan Coopersmith [Sat, 25 Feb 2012 02:02:56 +0000 (18:02 -0800)]
Just use xmmintrin.h when building with Solaris Studio compilers

Since the Solaris Studio compilers don't have a mode where MMX
instructions are available and SSE instructions are not, we can
just use the <xmmintrin.h> header directly.

Fixes build failure due to Studio not supporting the __gnu_inline__
or __artificial__ attributes.

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Matt Turner <mattst88@gmail.com>
12 years agoMIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.
Nemanja Lukic [Wed, 29 Feb 2012 11:04:33 +0000 (12:04 +0100)]
MIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.

Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
              src_n_0565 =  L1: 238.14  L2: 233.15  M: 57.88 ( 77.23%)  HT: 53.22  VT: 49.99  R: 47.73  RT: 24.79 (  91Kops/s)
              src_n_8888 =  L1: 190.19  L2: 187.57  M: 28.94 ( 77.23%)  HT: 27.91  VT: 27.33  R: 26.64  RT: 14.68 (  77Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image         gnome-system-monitor  268.460  269.712   0.22%    6/6

Optimized:

lowlevel-blt-bench:
              src_n_0565 =  L1:1081.39  L2: 258.22  M:189.59 (252.91%)  HT: 60.23  VT: 55.01  R: 53.44  RT: 23.68 (  89Kops/s)
              src_n_8888 =  L1: 653.46  L2: 113.55  M:135.26 (360.86%)  HT: 38.99  VT: 37.38  R: 34.95  RT: 18.67 (  84Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image         gnome-system-monitor  246.565  246.706   0.04%    6/6

12 years agopixman-access.c: Remove some unused macros
Søren Sandmann Pedersen [Thu, 1 Mar 2012 07:24:54 +0000 (02:24 -0500)]
pixman-access.c: Remove some unused macros

The macros related to palette entries:

RGB15_TO_ENTRY,
RGB24_TO_ENTRY,
RGB24_TO_ENTRY_Y

are not used anywhere.

12 years agopixman-accessors.h: Delete unused macros
Søren Sandmann Pedersen [Wed, 29 Feb 2012 09:44:46 +0000 (04:44 -0500)]
pixman-accessors.h: Delete unused macros

The MEMCPY_WRAPPED and ACCESS macros are not used anymore.

12 years agoMove fetching for solid bits images to pixman-noop.c
Søren Sandmann Pedersen [Sun, 26 Feb 2012 22:35:20 +0000 (17:35 -0500)]
Move fetching for solid bits images to pixman-noop.c

This should be a bit faster because it can reuse the scanline on each iteration.

12 years agolowlevel-blt-bench: add in_8_8 and in_n_8_8
Matt Turner [Sat, 25 Feb 2012 01:11:11 +0000 (20:11 -0500)]
lowlevel-blt-bench: add in_8_8 and in_n_8_8

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoDisable implementations mentioned in the PIXMAN_DISABLE environment variable.
Søren Sandmann Pedersen [Wed, 26 Jan 2011 18:16:09 +0000 (13:16 -0500)]
Disable implementations mentioned in the PIXMAN_DISABLE environment variable.

With this, it becomes possible to do

     PIXMAN_DISABLE="sse2 mmx" some_app

which will run some_app without SSE2 and MMX enabled. This is useful
for benchmarking, testing and narrowing down bugs.

The current list of implementations that can be disabled:

    fast
    mmx
    sse2
    arm-simd
    arm-iwmmxt
    arm-neon
    mips-dspr2
    vmx

The general and noop implementations can't be disabled because pixman
depends on those being available for correct operation.
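
The check described above can be sketched as follows. This is only an illustration, not pixman's actual code: the helper names name_in_list() and implementation_disabled() are hypothetical, and the sketch assumes the variable holds a space-separated list of names, as in the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
 * space-separated word in `list`, 0 otherwise. */
int
name_in_list (const char *list, const char *name)
{
    size_t len = strlen (name);
    const char *p = list;

    if (p == NULL || len == 0)
        return 0;

    while (*p != '\0')
    {
        const char *end;

        /* Skip separators, then find the end of the current word. */
        while (*p == ' ')
            p++;
        end = p;
        while (*end != '\0' && *end != ' ')
            end++;

        /* Compare whole words only, so "arm" does not match "arm-neon". */
        if ((size_t) (end - p) == len && memcmp (p, name, len) == 0)
            return 1;
        p = end;
    }
    return 0;
}

/* Hypothetical wrapper: consults the PIXMAN_DISABLE environment variable. */
int
implementation_disabled (const char *name)
{
    return name_in_list (getenv ("PIXMAN_DISABLE"), name);
}
```

With PIXMAN_DISABLE="sse2 mmx", implementation_disabled ("mmx") would return 1 and implementation_disabled ("fast") would return 0.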

Reviewed-by: Matt Turner <mattst88@gmail.com>
12 years agoMIPS: DSPr2: Added fast-paths for SRC operation.
Nemanja Lukic [Wed, 22 Feb 2012 13:23:48 +0000 (14:23 +0100)]
MIPS: DSPr2: Added fast-paths for SRC operation.

The following fast-path functions are implemented (routines 4, 5 and 6
use the same fast-memcpy routine):
    1. src_x888_8888
    2. src_8888_0565
    3. src_0565_8888
    4. src_0565_0565
    5. src_8888_8888
    6. src_0888_0888

Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
        src_x888_8888 =  L1: 199.35  L2:  96.54  M: 18.87 (100.68%)  HT: 17.12  VT: 16.24  R: 15.43  RT:  9.33 (  61Kops/s)
        src_8888_0565 =  L1:  71.22  L2:  51.95  M: 24.19 ( 96.17%)  HT: 20.71  VT: 19.92  R: 18.15  RT:  9.92 (  63Kops/s)
        src_0565_8888 =  L1:  38.82  L2:  36.22  M: 18.60 ( 73.95%)  HT: 14.47  VT: 13.19  R: 12.97  RT:  6.61 (  49Kops/s)
        src_0565_0565 =  L1: 286.05  L2: 155.02  M: 37.68 (100.54%)  HT: 31.08  VT: 28.07  R: 26.26  RT: 11.93 (  68Kops/s)
        src_8888_8888 =  L1: 454.32  L2: 139.15  M: 19.30 (102.98%)  HT: 17.73  VT: 16.08  R: 16.62  RT: 10.45 (  64Kops/s)
        src_0888_0888 =  L1: 190.47  L2: 106.14  M: 25.26 (101.08%)  HT: 21.88  VT: 20.32  R: 18.83  RT: 10.10 (  63Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image            firefox-asteroids  421.215  421.325   0.01%    4/6
[  1]    image         firefox-planet-gnome  647.708  648.486   0.13%    6/6
[  2]    image         gnome-system-monitor  276.073  277.506   0.38%    6/6
[  3]    image           gnome-terminal-vim  263.866  265.229   0.39%    6/6
[  4]    image                      poppler  123.576  124.003   0.15%    6/6

Optimized (with these optimizations):

lowlevel-blt-bench:
        src_x888_8888 =  L1: 369.50  L2:  99.37  M: 27.19 (145.07%)  HT: 20.24  VT: 19.48  R: 19.00  RT: 10.22 (  63Kops/s)
        src_8888_0565 =  L1: 105.65  L2:  67.87  M: 25.41 (101.00%)  HT: 20.78  VT: 19.84  R: 18.52  RT:  9.81 (  63Kops/s)
        src_0565_8888 =  L1:  77.10  L2:  63.04  M: 23.37 ( 92.90%)  HT: 20.29  VT: 19.37  R: 18.14  RT: 10.02 (  63Kops/s)
        src_0565_0565 =  L1: 519.02  L2: 241.32  M: 62.35 (166.34%)  HT: 33.74  VT: 27.63  R: 26.12  RT: 11.70 (  67Kops/s)
        src_8888_8888 =  L1: 390.48  L2: 113.99  M: 30.32 (161.77%)  HT: 19.55  VT: 17.05  R: 17.13  RT: 10.19 (  63Kops/s)
        src_0888_0888 =  L1: 349.74  L2: 156.68  M: 40.68 (162.78%)  HT: 25.58  VT: 20.57  R: 20.20  RT:  9.96 (  63Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image            firefox-asteroids  400.050  400.308   0.04%    6/6
[  1]    image         firefox-planet-gnome  628.978  629.364   0.07%    6/6
[  2]    image         gnome-system-monitor  270.247  270.313   0.03%    6/6
[  3]    image           gnome-terminal-vim  256.413  257.641   0.21%    6/6
[  4]    image                      poppler  119.540  120.023   0.21%    6/6

12 years agoMIPS: DSPr2: Basic infrastructure for MIPS architecture
Nemanja Lukic [Wed, 22 Feb 2012 13:23:47 +0000 (14:23 +0100)]
MIPS: DSPr2: Basic infrastructure for MIPS architecture

MIPS DSP instruction set extensions

12 years agolowlevel-blt: add over_x888_n_8888
Matt Turner [Sat, 25 Feb 2012 01:02:55 +0000 (20:02 -0500)]
lowlevel-blt: add over_x888_n_8888

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agolowlevel-blt: add over_8888_8888
Matt Turner [Sat, 25 Feb 2012 00:58:09 +0000 (19:58 -0500)]
lowlevel-blt: add over_8888_8888

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoDisable MMX when Clang is being used.
Søren Sandmann Pedersen [Thu, 23 Feb 2012 23:36:04 +0000 (18:36 -0500)]
Disable MMX when Clang is being used.

There are several issues with the Clang compiler and pixman-mmx.c:

- When not optimizing, it doesn't seem to recognize that an argument
  to an __always_inline__ function is compile-time constant. This
  results in this error being produced:

      fatal error: error in backend: Invalid operand for inline asm
              constraint 'K'!

- This inline assembly:

      asm ("pmulhuw %1, %0\n\t"
          : "+y" (__A)
          : "y" (__B)
      );

  results in

      fatal error: error in backend: Unsupported asm: input constraint
              with a matching output constraint of incompatible type!

So disable MMX when the compiler is Clang.

12 years agommx: make load8888 take a pointer to data instead of the data itself
Matt Turner [Wed, 22 Feb 2012 04:33:02 +0000 (23:33 -0500)]
mmx: make load8888 take a pointer to data instead of the data itself

Allows us to tune how we load data into the vector registers.

Signed-off-by: Matt Turner <mattst88@gmail.com>
And squashed in:

mmx: define and use load8888u function

For unaligned loads.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agommx: make store8888 take uint32_t *dest as argument
Matt Turner [Wed, 22 Feb 2012 00:29:59 +0000 (19:29 -0500)]
mmx: make store8888 take uint32_t *dest as argument

Allows us to tune how we store data from the vector registers.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoUpdate .gitignore with more demos and tests
Matt Turner [Wed, 22 Feb 2012 21:32:21 +0000 (16:32 -0500)]
Update .gitignore with more demos and tests

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agommx: Delete unused function in_over_full_src_alpha()
Søren Sandmann Pedersen [Wed, 22 Feb 2012 00:30:04 +0000 (19:30 -0500)]
mmx: Delete unused function in_over_full_src_alpha()

Also a few minor formatting fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
12 years agommx: Enable over_x888_8_8888() for x86 as well
Søren Sandmann Pedersen [Wed, 22 Feb 2012 00:23:33 +0000 (19:23 -0500)]
mmx: Enable over_x888_8_8888() for x86 as well

It used to be slower than the generic code (with the gcc that was
current in 2007), but that doesn't seem to be the case anymore:

over_x888_8_8888 =  L1:  22.97  L2:  22.88  M: 22.27 (  5.29%)  HT: 18.30  VT: 15.81  R: 15.54  RT: 10.35 ( 131Kops/s)
over_x888_8_8888 =  L1:  53.56  L2:  53.20  M: 50.50 ( 11.99%)  HT: 38.60  VT: 31.19  R: 29.00  RT: 17.37 ( 208Kops/s)

Reviewed-by: Matt Turner <mattst88@gmail.com>
12 years agommx: fix typo in pix_add_mul on MSVC
Matt Turner [Tue, 21 Feb 2012 21:28:37 +0000 (16:28 -0500)]
mmx: fix typo in pix_add_mul on MSVC

Typo introduced in commit a075a870.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agommx: Use _mm_shuffle_pi16
Matt Turner [Sun, 19 Feb 2012 23:10:03 +0000 (18:10 -0500)]
mmx: Use _mm_shuffle_pi16

The pshufw x86 instruction is part of Extended 3DNow! and SSE1. The
equivalent ARM wshufh instruction was available from the first iwMMXt
instruction set.

This instruction is already used in the SSE2 code.

Reduces code size by ~9%.

amd64
  text    data     bss     dec     hex filename
 29925    2240       0   32165    7da5 .libs/libpixman_mmx_la-pixman-mmx.o
 27237    2240       0   29477    7325 .libs/libpixman_mmx_la-pixman-mmx.o

x86
  text    data     bss     dec     hex filename
 27677    1792       0   29469    731d .libs/libpixman_mmx_la-pixman-mmx.o
 24959    1792       0   26751    687f .libs/libpixman_mmx_la-pixman-mmx.o

arm
  text    data     bss     dec     hex filename
 30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
 27384    1792       0   29176    71f8 .libs/libpixman_iwmmxt_la-pixman-mmx.o

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agommx: Use _mm_mulhi_pu16
Matt Turner [Sun, 19 Feb 2012 06:32:31 +0000 (01:32 -0500)]
mmx: Use _mm_mulhi_pu16

The pmulhuw x86 instruction is part of Extended 3DNow! and SSE1. The
equivalent ARM wmuluh instruction was available from the first iwMMXt
instruction set.

This instruction is already used in the SSE2 code.

Reduces code size by ~5%.

amd64
  text    data     bss     dec     hex filename
 31325    2240       0   33565    831d .libs/libpixman_mmx_la-pixman-mmx.o
 29925    2240       0   32165    7da5 .libs/libpixman_mmx_la-pixman-mmx.o

x86
  text    data     bss     dec     hex filename
 29165    1792       0   30957    78ed .libs/libpixman_mmx_la-pixman-mmx.o
 27677    1792       0   29469    731d .libs/libpixman_mmx_la-pixman-mmx.o

arm
  text    data     bss     dec     hex filename
 31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
 30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agommx: enable over_x888_8_8888 on ARM/iwMMXt
Matt Turner [Tue, 21 Feb 2012 00:05:45 +0000 (00:05 +0000)]
mmx: enable over_x888_8_8888 on ARM/iwMMXt

before: over_x888_8_8888 =  L1:   7.63  L2:   7.72  M:  6.44 ( 19.17%)  HT: 6.24  VT:  6.11  R:  5.87  RT:  4.61 (  51Kops/s)
after : over_x888_8_8888 =  L1:  11.88  L2:  11.11  M:  8.70 ( 26.01%)  HT: 8.15  VT:  8.07  R:  7.76  RT:  5.62 (  61Kops/s)

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoautoconf: use #error instead of error
Matt Turner [Mon, 20 Feb 2012 23:36:24 +0000 (18:36 -0500)]
autoconf: use #error instead of error

We'd rather see the actual #error message than a syntax error in
config.log.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoConvert while (w) to if (w) when possible
Matt Turner [Fri, 17 Feb 2012 23:17:49 +0000 (18:17 -0500)]
Convert while (w) to if (w) when possible

Missed in commit 57fd8c37.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoMake sure to run AC_SUBST IWMMXT_CFLAGS
Matt Turner [Wed, 15 Feb 2012 23:16:42 +0000 (18:16 -0500)]
Make sure to run AC_SUBST IWMMXT_CFLAGS

Allows you to compile without -flax-vector-conversions in your CFLAGS,
though -march=iwmmxt2 is still necessary since specifying some other
-march= value will override it, and disable iwmmxt.

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoconfigure.ac: Add an --enable-libpng option
Jeremy Huddleston [Sat, 11 Feb 2012 09:04:13 +0000 (01:04 -0800)]
configure.ac: Add an --enable-libpng option

Now there is a way to not link against libpng even if it's available.

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agoUse AC_LANG_SOURCE for iwMMXt configure program
Matt Turner [Sun, 12 Feb 2012 04:21:45 +0000 (23:21 -0500)]
Use AC_LANG_SOURCE for iwMMXt configure program

Signed-off-by: Matt Turner <mattst88@gmail.com>
12 years agoRevert "Reject trapezoids where top (bottom) is above (below) the edges"
Søren Sandmann Pedersen [Wed, 25 Jan 2012 19:03:05 +0000 (14:03 -0500)]
Revert "Reject trapezoids where top (bottom) is above (below) the edges"

Cairo 1.10 will sometimes generate trapezoids like this, so we can't
consider them invalid. Fixes bug 45009, reported by Michael Biebl.

This reverts commit 2437ae80e5066dec9fe52f56b016bf136d7cea06.

12 years agoiOS Runtime Detection Support For ARM NEON
Bobby Salazar [Thu, 26 Jan 2012 18:19:18 +0000 (13:19 -0500)]
iOS Runtime Detection Support For ARM NEON

This patch adds runtime detection support for the ARM NEON fast paths
for code compiled with the iOS SDK.

12 years agotest: Port composite test over to use new pixel_checker_t object.
Søren Sandmann Pedersen [Tue, 20 Dec 2011 00:31:25 +0000 (19:31 -0500)]
test: Port composite test over to use new pixel_checker_t object.

Also make some tweaks to the way the errors are printed.

12 years agotest: Add a new "pixel_checker_t" object.
Søren Sandmann Pedersen [Mon, 19 Dec 2011 22:31:06 +0000 (17:31 -0500)]
test: Add a new "pixel_checker_t" object.

Add a new pixel_checker_t object to test/utils.[ch]. This object
should be initialized with a format and can then be used to check
whether a given "real" pixel in that format is close enough to a
"perfect" pixel given as a double precision ARGB struct.

The acceptable deviation is calculated as follows. Each channel of the
perfect pixel has 0.004 subtracted from it and is then converted to
the format. The resulting value is the minimum value that will be
accepted. Similarly, to compute the maximum value, the channel has
0.004 added to it and is then converted to the given format. Checking
a pixel is then a matter of splitting it into channels and checking
that each is within the computed bounds.

The value of 0.004 was chosen because it is the minimum one that will
make the existing composite test pass (see next commit). A problem
with this value is that it causes 0xFE to be acceptable when the
correct value is 1.0, and 0x01 to be acceptable when the correct value
is 0. It would be better if, when the result is exactly 0 or exactly
1, an a8r8g8b8 pixel were required to produce exactly 0x00 or 0xff to
preserve full black and full white. A deviation value of 0.003 would
produce this, but currently this would cause tests with operators that
involve divisions to fail.
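
The bounds computation can be sketched as below. This is a simplified illustration, not the code in test/utils.c: the function names are hypothetical, the clamping to [0, 1] is an assumption of the sketch, and the conversion assumes the rounding that the round_color() commit in this log describes (multiply by (1 << width), then subtract one if the result overflowed).

```c
#include <stdint.h>

/* Convert a channel value in [0, 1] to an integer `width` bits wide
 * (width assumed to be between 1 and 31). */
uint32_t
channel_to_int (double v, int width)
{
    uint32_t r;

    /* Clamping is an assumption of this sketch. */
    if (v < 0.0)
        v = 0.0;
    if (v > 1.0)
        v = 1.0;

    r = (uint32_t) (v * (double) (1u << width));
    r -= r >> width;    /* only nonzero when v was exactly 1.0 */
    return r;
}

/* Returns 1 if `real` lies within the bounds derived from the perfect
 * channel value and the allowed deviation, 0 otherwise. */
int
channel_acceptable (double perfect, uint32_t real, int width, double deviation)
{
    uint32_t lo = channel_to_int (perfect - deviation, width);
    uint32_t hi = channel_to_int (perfect + deviation, width);

    return real >= lo && real <= hi;
}
```

Under these assumptions, an 8-bit channel with a perfect value of 1.0 accepts 0xFE at a deviation of 0.004 (0.996 * 256 truncates to 254) but not at 0.003 (0.997 * 256 truncates to 255), which matches the behaviour described above.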

12 years agoRename color_correct() to round_color()
Søren Sandmann Pedersen [Tue, 20 Dec 2011 00:53:28 +0000 (19:53 -0500)]
Rename color_correct() to round_color()

And do the rounding from float to int in the same way cairo does: by
multiplying with (1 << width), then subtracting one when the input was 1.0.

12 years agoMove the color_correct() function from composite.c to utils.c
Søren Sandmann Pedersen [Thu, 22 Dec 2011 23:15:02 +0000 (18:15 -0500)]
Move the color_correct() function from composite.c to utils.c

12 years agoGet rid of delegates for combiners
Søren Sandmann Pedersen [Sun, 8 Jan 2012 15:32:47 +0000 (10:32 -0500)]
Get rid of delegates for combiners

Add a new function _pixman_implementation_lookup_combiner() that will
find a usable combiner given an operator and information about whether
the combiner should apply component alpha and whether it should be 64
bit.

In pixman-general.c use this function to look up a combiner up front
instead of walking the delegate chain for every scanline.

12 years agotest/alphamap.c: Make dst and orig_dst more independent of each other
Søren Sandmann Pedersen [Sat, 7 Jan 2012 22:11:45 +0000 (17:11 -0500)]
test/alphamap.c: Make dst and orig_dst more independent of each other

When making the copy of the destination, do so separately for the
image and the alpha map. This ensures that the alpha channel of the
alpha map will be different from the alpha channel of the actual
image.

Previously, orig_dst would be copied onto dst along with its alpha
map, which meant that the alpha map of orig_dst would become the new
alpha channel of *both* dst and dst's alpha map. This meant that the
test didn't actually check that the alpha map's alpha channel was
fetched.

12 years agoFix bugs with alpha maps
Søren Sandmann Pedersen [Sat, 7 Jan 2012 21:48:00 +0000 (16:48 -0500)]
Fix bugs with alpha maps

The alpha channel from the alpha map must be inserted as the new alpha
channel when a scanline is fetched from an image. Previously the alpha
map would overwrite the buffer instead. This wasn't caught by the
alpha map test because it would only verify that the resulting alpha
channel was correct, and not pay attention to incorrect color
channels.

12 years agotest: In the alphamap test, also test that we get the right red value
Søren Sandmann Pedersen [Sat, 7 Jan 2012 19:32:08 +0000 (14:32 -0500)]
test: In the alphamap test, also test that we get the right red value

There is a bug where the red channel of the alpha map of the
destination image is used instead of the red channel of the
destination image itself.

12 years agoMake mmx code compatible with Solaris Studio 12.3 compilers
Alan Coopersmith [Sat, 24 Dec 2011 00:32:57 +0000 (16:32 -0800)]
Make mmx code compatible with Solaris Studio 12.3 compilers

Rearranged some of the existing gcc & Intel compiler checks to allow
easier sharing of common cases among the compilers.

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
12 years agoFix rounding for DIV_UNc()
Søren Sandmann Pedersen [Tue, 20 Dec 2011 11:32:26 +0000 (06:32 -0500)]
Fix rounding for DIV_UNc()

We need to compute floor (a/b * 255 + 0.5), not floor (a / b * 255),
so add b/2 to the numerator in the DIV_UNc() macro.
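
A minimal sketch of the difference, for 8-bit components. The function names are hypothetical and this is not the DIV_UNc() macro itself; it assumes 0 < b and a <= b so that the result fits in 8 bits.

```c
#include <stdint.h>

/* Truncating division: computes floor (a / b * 255). */
uint8_t
div_un8_truncating (uint8_t a, uint8_t b)
{
    return (uint8_t) ((a * 255) / b);
}

/* Rounding division: adding b / 2 to the numerator before dividing
 * computes floor (a / b * 255 + 0.5) in integer arithmetic. */
uint8_t
div_un8_rounding (uint8_t a, uint8_t b)
{
    return (uint8_t) ((a * 255 + b / 2) / b);
}
```

For a = 1 and b = 2 the exact quotient is 127.5, so the truncating version gives 127 while the rounding version gives 128.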

12 years agoReject trapezoids where top (bottom) is above (below) the edges
Søren Sandmann Pedersen [Thu, 22 Dec 2011 16:37:26 +0000 (11:37 -0500)]
Reject trapezoids where top (bottom) is above (below) the edges

When a trapezoid has a top/bottom that is above/below the left/right
edges, degenerate trapezoids become possible. For example the edge
could be very short and close to horizontal. If the bottom edge is far
below the bottom point of such a short edge, the result is that the
lower right corner of the trapezoid will be extremely far to the left.

This kind of trapezoid causes overflows in the rasterization code, so
change pixman_trapezoid_valid() to reject them.

12 years agoIn MUL_UNc() cast to comp2_t
Søren Sandmann Pedersen [Tue, 20 Dec 2011 11:34:41 +0000 (06:34 -0500)]
In MUL_UNc() cast to comp2_t

Otherwise, when comp1_t is 16 bits wide, we can end up with a signed
integer overflow.

12 years agoFix a bunch of signed overflow issues
Søren Sandmann Pedersen [Wed, 21 Dec 2011 13:19:05 +0000 (08:19 -0500)]
Fix a bunch of signed overflow issues

In pixman-fast-path.c: (1 << 31) - 1 causes a signed overflow, so
change to (1U << n) - 1.

In pixman-image.c: The check for whether m10 == -m01 will overflow
when -m01 == INT_MIN. Instead just check whether the variables are 1
and -1.

In pixman-utils.c: When the depth of the topmost channel is 0, we can
end up shifting by 32.

In blitters-test.c: Replicating the mask would end up shifting more
than 32.

In region-contains-test.c: Computing the average of two large integers
could overflow. Instead add half the difference between them to the
first integer.

In stress-test.c: Masking the value in fake_reader() would sometimes
shift by 32. Instead just use the most significant bits instead of
the least significant.

All these issues were found by the IOC tool:

    http://embed.cs.utah.edu/ioc/
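
Two of the patterns above can be sketched in isolation. The function names are hypothetical and the code is not taken from pixman; midpoint() additionally assumes lo <= hi, which is an assumption of this sketch rather than something the commit states.

```c
#include <stdint.h>

/* Mask with the low n bits set, for n in 0..31. Shifting an unsigned 1
 * avoids the signed overflow that (1 << 31) - 1 would cause. */
uint32_t
low_bits_mask (unsigned int n)
{
    return (1U << n) - 1;
}

/* Average of two integers without computing lo + hi, which could
 * overflow. Half the difference is added to the first integer; the
 * difference is taken in unsigned arithmetic so it cannot overflow. */
int32_t
midpoint (int32_t lo, int32_t hi)
{
    uint32_t half = ((uint32_t) hi - (uint32_t) lo) / 2;

    return lo + (int32_t) half;
}
```

For example, midpoint (INT32_MAX - 2, INT32_MAX) yields INT32_MAX - 1, whereas (lo + hi) / 2 would overflow in the addition.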

12 years agoAdd missing cast in _pixman_edge_multi_init()
Søren Sandmann Pedersen [Sun, 18 Dec 2011 14:54:47 +0000 (09:54 -0500)]
Add missing cast in _pixman_edge_multi_init()

nx and e->dy are both 32 bit quantities, so a cast is needed to make
sure their product is 64 bit before subtracting it from a 64 bit
quantity.
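
The effect of the cast can be illustrated with a small sketch. The function name and parameters are hypothetical and do not reproduce the expression in _pixman_edge_multi_init().

```c
#include <stdint.h>

/* Without the cast, nx * dy would be evaluated in 32-bit arithmetic and
 * could overflow before being widened. Casting one operand first makes
 * the multiplication itself 64 bit. */
int64_t
subtract_product (int64_t total, int32_t nx, int32_t dy)
{
    return total - (int64_t) nx * dy;
}
```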

12 years agoFix some signed overflow bugs
Søren Sandmann Pedersen [Sun, 18 Dec 2011 13:16:45 +0000 (08:16 -0500)]
Fix some signed overflow bugs

In the macros for the PDF blend modes, two comp1_t variables are
multiplied together and then used as if the result were a
comp4_t. When comp1_t is a uint8_t, this is fine because they are
promoted to int, and the product of two uint8_ts fits in an
int. However, when comp1_t is uint16, the product does not necessarily
fit in an int, so casts are necessary.
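
A minimal sketch of the promotion issue, assuming a platform where int is 32 bits wide. The function name is hypothetical and the code is not one of the blend-mode macros.

```c
#include <stdint.h>

/* Two uint16_t operands are promoted to (signed) int before the
 * multiplication. 0xFFFF * 0xFFFF is 4294836225, which does not fit in
 * a 32-bit int, so the uncast product would be a signed overflow.
 * Casting one operand to a wider unsigned type first avoids that. */
uint32_t
mul16_widened (uint16_t a, uint16_t b)
{
    return (uint32_t) a * b;
}
```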

Fix for bug 43906, reported by Siarhei Siamashka.

12 years agopixman-image.c: Fix typo in pixman_image_set_transform()
Søren Sandmann Pedersen [Thu, 5 Jan 2012 15:37:51 +0000 (10:37 -0500)]
pixman-image.c: Fix typo in pixman_image_set_transform()

A parenthesis was misplaced so that the size argument to memcmp() was
always 0. The bug is harmless except that the flags might be
unnecessarily recomputed in some cases.
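
The kind of typo described can be sketched as follows. The struct and function names are hypothetical stand-ins, not pixman's types; the buggy form appears only in a comment.

```c
#include <string.h>

/* Hypothetical stand-in for a 3x3 transform. */
typedef struct
{
    int matrix[3][3];
} sketch_transform_t;

/* Buggy form, with the closing parenthesis misplaced:
 *
 *     memcmp (a, b, sizeof (sketch_transform_t) == 0)
 *
 * The size argument is then the comparison (sizeof (...) == 0), which
 * is 0, so memcmp() compares nothing and always returns 0.
 *
 * Fixed form: compare the full struct, then test the result. */
int
transforms_equal (const sketch_transform_t *a, const sketch_transform_t *b)
{
    return memcmp (a, b, sizeof (sketch_transform_t)) == 0;
}
```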

A bug reporting this in Mozilla's fork was discovered here:

    https://bugzilla.mozilla.org/show_bug.cgi?id=710992

12 years agoautogen.sh: Support GNOME Build API
Colin Walters [Wed, 4 Jan 2012 13:06:05 +0000 (08:06 -0500)]
autogen.sh: Support GNOME Build API

http://people.gnome.org/~walters/docs/build-api.txt

12 years agogradient-walker: For NONE repeats, when x < 0 or x > 1, set both colors to 0
Søren Sandmann Pedersen [Sun, 18 Dec 2011 12:29:59 +0000 (07:29 -0500)]
gradient-walker: For NONE repeats, when x < 0 or x > 1, set both colors to 0

ec7c9c2b6865b48b8bd14e4 introduced a bug where NONE gradients would be
misrendered, causing the area outside the gradient to be treated as a
(very) long fade to transparent. The problem was that a check for
positions outside the gradient was dropped in favor of relying on
the sentinels.

Aside from misrendering, this also caused a signed integer overflow
when the code would compute a stepper size based on MIN_INT32.

This patch fixes the issue by reinstating a check for these cases
and setting both the right and left colors to transparent black.

12 years agoModify gradient-test to show a bug in NONE processing
Søren Sandmann Pedersen [Wed, 21 Dec 2011 10:19:00 +0000 (05:19 -0500)]
Modify gradient-test to show a bug in NONE processing

This patch modifies demos/gradient-test to display a bug in gradients
with a repeat mode of NONE. With the current gradient code, the left
side will be a solid red (actually an extremely long fade from solid
red to transparent) instead of a sharp transition from red to green.

12 years agoregion: Add pixman_region{,32}_clear() functions.
Søren Sandmann Pedersen [Fri, 9 Dec 2011 08:59:04 +0000 (03:59 -0500)]
region: Add pixman_region{,32}_clear() functions.

These functions simply reset the region to empty. They are equivalent
to

      pixman_region_fini (&region);
      pixman_region_init (&region);

12 years agoAndroid Runtime Detection Support For ARM NEON
Bobby Salazar [Tue, 13 Dec 2011 07:03:16 +0000 (02:03 -0500)]
Android Runtime Detection Support For ARM NEON

This patch adds runtime detection support for the ARM NEON fast paths
for code compiled with the Android NDK. This is the only code change
needed to enable the ARM NEON pixman fast paths for the ever growing
Android platform (200 million+ smartphones, tablets, etc.). Just make
sure to #define USE_ARM_NEON in your makefile.

12 years agoDon't use non-POSIX test
Naohiro Aota [Thu, 24 Nov 2011 12:12:15 +0000 (13:12 +0100)]
Don't use non-POSIX test

test "$test_CFLAGS" == "" &&         \

may cause an error on some POSIX shells and uses a style which is not
consistent with the other tests in configure.ac

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=42588 and
https://bugs.gentoo.org/show_bug.cgi?id=387087

13 years agotest: Produce autotools-looking report in the win32 build system
Andrea Canciani [Tue, 8 Nov 2011 21:00:46 +0000 (22:00 +0100)]
test: Produce autotools-looking report in the win32 build system

Tweak the commands used to run the tests on win32 to make the output
look mostly like that produced by the autotools test system.

In addition to this, make sure that the exit status of the test target
is success (0) if and only if no failure occurred.

13 years agodemos: Consistently use G_N_ELEMENTS()
Andrea Canciani [Thu, 3 Nov 2011 10:07:25 +0000 (11:07 +0100)]
demos: Consistently use G_N_ELEMENTS()

Instead of open-coding G_N_ELEMENTS(), just use it.

13 years agotest: Reuse the ARRAY_LENGTH() macro
Andrea Canciani [Thu, 3 Nov 2011 09:53:10 +0000 (10:53 +0100)]
test: Reuse the ARRAY_LENGTH() macro

It is provided by utils.h, there is no need to redefine it.
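
For illustration, a macro of this kind is commonly written as below. The definition is an assumption of this sketch and may differ from the one in utils.h; the array and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed definition: number of elements in a true array (not a pointer). */
#define ARRAY_LENGTH(A) (sizeof (A) / sizeof ((A)[0]))

static const int sketch_values[] = { 3, 1, 4, 1, 5 };

/* Sums the array without hard-coding its length. */
int
sum_sketch_values (void)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < ARRAY_LENGTH (sketch_values); i++)
        sum += sketch_values[i];

    return sum;
}
```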

13 years agoUse the ARRAY_LENGTH() macro when possible
Andrea Canciani [Thu, 3 Nov 2011 09:51:27 +0000 (10:51 +0100)]
Use the ARRAY_LENGTH() macro when possible

This patch has been generated by the following Coccinelle semantic patch:

// Use the ARRAY_LENGTH() macro when possible
//
// Replace open-coded array length computations with the
// ARRAY_LENGTH() macro

@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(T))
+ ARRAY_LENGTH (E)

13 years agotest: Cleanup includes
Andrea Canciani [Thu, 3 Nov 2011 09:40:24 +0000 (10:40 +0100)]
test: Cleanup includes

All the tests are linked to libutil, hence it makes sense to always
include utils.h and reuse what it provides (config.h inclusion, access
to private pixman APIs, ARRAY_LENGTH, ...).

13 years agoRemove useless checks for NULL before freeing
Andrea Canciani [Thu, 3 Nov 2011 09:21:41 +0000 (10:21 +0100)]
Remove useless checks for NULL before freeing

This patch has been generated by the following Coccinelle semantic patch:

// Remove useless checks for NULL before freeing
//
// free (NULL) is a no-op, so there is no need to avoid it

@@
expression E;
@@
+ free (E);
+ E = NULL;
- if (unlikely (E != NULL)) {
-   free(E);
(
-   E = NULL;
|
-   E = 0;
)
   ...
- }

@@
expression E;
@@
+ free (E);
- if (unlikely (E != NULL)) {
-   free (E);
- }
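
As a rough illustration of what the first rule produces, here is a minimal sketch. The `buffer_t` type and `buffer_clear()` function are hypothetical and do not appear in pixman; the only fact relied upon is that the C standard defines free (NULL) as a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of a heap allocation. */
typedef struct
{
    int *data;
} buffer_t;

static void
buffer_clear (buffer_t *b)
{
    assert (b != NULL);

    /* No "if (b->data != NULL)" guard: free (NULL) does nothing. */
    free (b->data);
    b->data = NULL;
}
```

Because the pointer is reset to NULL, calling `buffer_clear()` twice in a row is harmless.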

13 years agoPost-release version bump to 0.25.1
Søren Sandmann Pedersen [Sun, 6 Nov 2011 21:36:01 +0000 (16:36 -0500)]
Post-release version bump to 0.25.1

13 years agoPre-release version bump to 0.24.0
Søren Sandmann Pedersen [Sun, 6 Nov 2011 21:10:33 +0000 (16:10 -0500)]
Pre-release version bump to 0.24.0

13 years agoChange MMX ldq_u to return __m64 instead of forcing all callers to cast
Alan Coopersmith [Sun, 30 Oct 2011 16:12:06 +0000 (09:12 -0700)]
Change MMX ldq_u to return __m64 instead of forcing all callers to cast

Sun/Oracle Studio compilers allow the pointers to be cast, but not the
non-pointer forms, causing pixman compiles to fail with many errors of:
"pixman-mmx.c", line 1411: invalid cast expression

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
13 years agoAdd definitions of INT64_MIN and INT64_MAX
Jeff Muizelaar [Wed, 2 Nov 2011 22:49:58 +0000 (18:49 -0400)]
Add definitions of INT64_MIN and INT64_MAX

13 years agoPost-release version bump to 0.23.9
Søren Sandmann Pedersen [Sat, 29 Oct 2011 09:51:44 +0000 (05:51 -0400)]
Post-release version bump to 0.23.9

13 years agoPre-release version bump to 0.23.8
Søren Sandmann Pedersen [Sat, 29 Oct 2011 09:33:44 +0000 (05:33 -0400)]
Pre-release version bump to 0.23.8

13 years agoFix use of uninitialized fields reported by valgrind
Søren Sandmann Pedersen [Tue, 25 Oct 2011 12:45:34 +0000 (08:45 -0400)]
Fix use of uninitialized fields reported by valgrind

In pixman-noop.c and pixman-sse2.c, we are accessing
image->bits.width/height without first making sure the image is a bits
image. The warning is harmless because we never act on this
information without checking that the image is a8r8g8b8, but valgrind
does warn about it.

In pixman-noop.c, just reorder the clauses in the if statement; in
pixman-sse2.c require images to have the FAST_PATH_BITS_IMAGE flag
set.

13 years agoMerge branch 'gradients'
Søren Sandmann Pedersen [Thu, 20 Oct 2011 13:13:12 +0000 (09:13 -0400)]
Merge branch 'gradients'

13 years agoARM: NEON: Fix assembly typo error in src_n_8_8888
Taekyun Kim [Tue, 18 Oct 2011 12:50:18 +0000 (21:50 +0900)]
ARM: NEON: Fix assembly typo error in src_n_8_8888

Binutils 2.21 does not complain about a missing comma between the ARM
register and the alignment specifier in vld/vst instructions, but the
same code causes a build error with binutils 2.20.

13 years agoARM: NEON: Standard fast path src_n_8_8
Taekyun Kim [Mon, 26 Sep 2011 09:33:27 +0000 (18:33 +0900)]
ARM: NEON: Standard fast path src_n_8_8

Performance numbers of before/after on cortex-a8 @ 1GHz

- before
L1:  28.05  L2:  28.26  M: 26.97 (  4.48%)  HT: 19.79  VT: 19.14  R: 17.61  RT:  9.88 ( 101Kops/s)

- after
L1:1430.28  L2:1252.10  M:421.93 ( 75.48%)  HT:170.16  VT:138.03  R:145.86  RT: 35.51 ( 255Kops/s)

13 years agoARM: NEON: Standard fast path src_n_8_8888
Taekyun Kim [Mon, 26 Sep 2011 08:03:54 +0000 (17:03 +0900)]
ARM: NEON: Standard fast path src_n_8_8888

Performance numbers of before/after on cortex-a8 @ 1GHz

- before
L1:  32.39  L2:  31.79  M: 30.84 ( 13.77%)  HT: 21.58  VT: 19.75  R: 18.83  RT: 10.46 ( 106Kops/s)

- after
L1: 516.25  L2: 372.00  M:193.49 ( 85.59%)  HT:136.93  VT:109.10  R:104.48  RT: 34.77 ( 253Kops/s)

13 years agoARM: NEON: Instruction scheduling of bilinear over_8888_8_8888
Taekyun Kim [Mon, 26 Sep 2011 10:04:53 +0000 (19:04 +0900)]
ARM: NEON: Instruction scheduling of bilinear over_8888_8_8888

Instructions are reordered to eliminate pipeline stalls and get
better memory access.

Performance of before/after on cortex-a8 @ 1GHz

<< 2000 x 2000 with scale factor close to 1.x >>
before : 40.53 Mpix/s
after  : 50.76 Mpix/s

13 years agoARM: NEON: Instruction scheduling of bilinear over_8888_8888
Taekyun Kim [Wed, 21 Sep 2011 06:52:13 +0000 (15:52 +0900)]
ARM: NEON: Instruction scheduling of bilinear over_8888_8888

Instructions are reordered to eliminate pipeline stalls and get
better memory access.

Performance of before/after on cortex-a8 @ 1GHz

<< 2000 x 2000 with scale factor close to 1.x >>
before : 50.43 Mpix/s
after  : 61.09 Mpix/s

13 years agoARM: NEON: Replace old bilinear scanline generator with new template
Taekyun Kim [Thu, 22 Sep 2011 15:03:22 +0000 (00:03 +0900)]
ARM: NEON: Replace old bilinear scanline generator with new template

Bilinear scanline functions in pixman-arm-neon-asm-bilinear.S can
be replaced with new template just by wrapping existing macros.

13 years agoARM: NEON: Bilinear macro template for instruction scheduling
Taekyun Kim [Tue, 20 Sep 2011 12:32:35 +0000 (21:32 +0900)]
ARM: NEON: Bilinear macro template for instruction scheduling

This macro template takes 6 code blocks.

1. process_last_pixel
2. process_two_pixels
3. process_four_pixels
4. process_pixblock_head
5. process_pixblock_tail
6. process_pixblock_tail_head

process_last_pixel does not need to update the horizontal weight; the
template does that. The two- and four-pixel code blocks must update
the horizontal weight themselves. The head/tail/tail_head blocks make
up the unrolled core loop. You can apply instruction scheduling to
the tail_head block.

You can also specify the size of the pixel block. Supported sizes are
4 and 8. To use a mask, pass the BILINEAR_FLAG_USE_MASK flag to the
template; the MASK register is then available. When using the d8~d15
registers, pass BILINEAR_FLAG_USE_ALL_NEON_REGS to make sure they are
properly saved on the stack and later restored.

13 years agoARM: NEON: Some cleanup of bilinear scanline functions
Taekyun Kim [Tue, 20 Sep 2011 10:46:25 +0000 (19:46 +0900)]
ARM: NEON: Some cleanup of bilinear scanline functions

Use STRIDE, and perform the initial horizontal weight update before
entering the interpolation loop. Add cache preload for mask and dst.

13 years agoSimplify gradient_walker_reset()
Søren Sandmann Pedersen [Fri, 14 Oct 2011 13:04:48 +0000 (09:04 -0400)]
Simplify gradient_walker_reset()

The code that searches for the closest color stop to the given
position is duplicated across the various repeat modes. Replace the
switch with two if/else constructions, and put the search code between
them.

13 years agoUse sentinels instead of special casing first and last stops
Søren Sandmann Pedersen [Fri, 14 Oct 2011 13:02:14 +0000 (09:02 -0400)]
Use sentinels instead of special casing first and last stops

When storing the gradient stops internally, allocate two more stops,
one before the beginning of the stop list and one after the
end. Initialize those stops based on the repeat property of the
gradient.

This allows gradient_walker_reset() to be simplified because it can
now simply pick the two closest stops to the position without special
casing the first and last stops.
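
The sentinel idea can be sketched as below. This is a simplified illustration under stated assumptions, not the pixman implementation: `stop_t`, `stops_with_sentinels()` and `closest_stops()` are hypothetical names, positions are plain ints rather than fixed point, and only a pad-style repeat (sentinels copy the nearest real color) is shown.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical, simplified color stop. */
typedef struct
{
    int      x;
    unsigned color;
} stop_t;

/* Copy n stops into an array of n + 2 entries with one sentinel on each
 * side. Returns a pointer to the first real stop, so index -1 and index n
 * are the sentinels. Returns NULL on allocation failure. */
static stop_t *
stops_with_sentinels (const stop_t *in, int n)
{
    stop_t *all;
    int i;

    assert (n > 0);

    all = malloc ((n + 2) * sizeof (stop_t));
    if (!all)
        return NULL;

    all[0].x = INT_MIN;               /* pad-style: repeat the first color */
    all[0].color = in[0].color;
    for (i = 0; i < n; i++)
        all[i + 1] = in[i];
    all[n + 1].x = INT_MAX;           /* pad-style: repeat the last color */
    all[n + 1].color = in[n - 1].color;

    return all + 1;
}

/* Pick the two stops around pos. The right sentinel ends the loop and the
 * left sentinel makes stops[i - 1] valid, so no special cases are needed. */
static void
closest_stops (const stop_t *stops, int pos,
               const stop_t **left, const stop_t **right)
{
    int i = 0;

    assert (pos < INT_MAX);

    while (stops[i].x <= pos)
        i++;

    *left = &stops[i - 1];
    *right = &stops[i];
}
```

The caller owns the allocation and must free the pointer one element before the returned one.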

13 years agogradient walker: Correct types and fix formatting
Søren Sandmann Pedersen [Fri, 14 Oct 2011 11:42:00 +0000 (07:42 -0400)]
gradient walker: Correct types and fix formatting

The type of pos in gradient_walker_reset() and gradient_walker_pixel()
is pixman_fixed_48_16_t and not pixman_fixed_32_32. The types of the
positions in the walker struct are pixman_fixed_t and not int32_t, and
need_reset is a boolean, not an integer. The spread field should be
called repeat and have the type pixman_repeat_t.

Also fix some formatting issues, make gradient_walker_reset() static,
and delete the pointless PIXMAN_GRADIENT_WALKER_NEED_RESET() macro.

13 years agoAdd stable release / development snapshot to draft release notes
Søren Sandmann Pedersen [Tue, 11 Oct 2011 20:12:24 +0000 (16:12 -0400)]
Add stable release / development snapshot to draft release notes

This will hopefully serve as a reminder to me that I should put this
information in the release notes.

13 years agoPost-release version bump to 0.23.7
Søren Sandmann Pedersen [Tue, 11 Oct 2011 10:10:39 +0000 (06:10 -0400)]
Post-release version bump to 0.23.7

13 years agoPre-release version bump to 0.23.6
Søren Sandmann Pedersen [Tue, 11 Oct 2011 10:00:51 +0000 (06:00 -0400)]
Pre-release version bump to 0.23.6

13 years agoSimple repeat: Extend too short source scanlines into temporary buffer
Taekyun Kim [Thu, 22 Sep 2011 09:42:38 +0000 (18:42 +0900)]
Simple repeat: Extend too short source scanlines into temporary buffer

Scanlines that are too short can cause repeat-handling overhead.
Because optimized pixman composite functions usually process a bunch
of pixels in a single loop iteration, it might be beneficial to
pre-extend source scanlines. The temporary buffers will usually reside
in cache, so accessing them should be quite efficient.
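
A minimal sketch of the pre-extension step, assuming 32-bit pixels. The function name, its parameters and the choice to keep the extended width a whole multiple of the source width are illustrative assumptions, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Tile a short source scanline into tmp until it is at least min_width
 * pixels wide or another copy would no longer fit in tmp_capacity pixels.
 * Returns the extended width, always a whole multiple of src_width so the
 * repeat phase is preserved. */
static int
extend_scanline (const uint32_t *src, int src_width,
                 uint32_t *tmp, int tmp_capacity, int min_width)
{
    int width = 0;

    assert (src_width > 0 && src_width <= tmp_capacity);

    while (width < min_width && width + src_width <= tmp_capacity)
    {
        memcpy (tmp + width, src, src_width * sizeof (uint32_t));
        width += src_width;
    }

    return width;
}
```

The compositing function can then treat `tmp` as a wider source and do fewer, longer passes per destination scanline.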

13 years agoSimple repeat fast path
Taekyun Kim [Mon, 29 Aug 2011 12:44:36 +0000 (21:44 +0900)]
Simple repeat fast path

We can implement simple repeat by stitching existing fast path
functions. First look up the COVER_CLIP function for the given input,
and then stitch horizontally using that function.

13 years agoMove _pixman_lookup_composite_function() to pixman-utils.c
Taekyun Kim [Thu, 22 Sep 2011 07:33:02 +0000 (16:33 +0900)]
Move _pixman_lookup_composite_function() to pixman-utils.c

13 years agoAdd src, mask, and dest flags to the composite args struct.
Søren Sandmann Pedersen [Mon, 27 Jun 2011 21:17:04 +0000 (21:17 +0000)]
Add src, mask, and dest flags to the composite args struct.

These flags are useful in the various compositing routines, and the
flags stored in the image structs are missing some bits of information
that can only be computed when pixman_image_composite() is called.

13 years agoAdd new fast path flag FAST_PATH_BITS_IMAGE
Taekyun Kim [Thu, 22 Sep 2011 07:26:55 +0000 (16:26 +0900)]
Add new fast path flag FAST_PATH_BITS_IMAGE

This fast path flag indicates that the image is a bits image.

13 years agoinit/fini functions for pixman_image_t
Taekyun Kim [Thu, 22 Sep 2011 07:20:03 +0000 (16:20 +0900)]
init/fini functions for pixman_image_t

A pixman_image_t itself can live on the stack or on the heap, so
separating init/fini from create/unref is useful when we want to use
a pixman_image_t on the stack or in other memory.
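
The split between init/fini and create/unref can be sketched as follows. The `demo_image_t` type and all function names are hypothetical stand-ins; the real pixman functions and their signatures are not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified image object. */
typedef struct
{
    int ref_count;
    int width, height;
} demo_image_t;

/* init/fini work on storage the caller provides (stack, heap, pool). */
static void
demo_image_init (demo_image_t *img, int width, int height)
{
    assert (img != NULL);
    img->ref_count = 1;
    img->width = width;
    img->height = height;
}

static void
demo_image_fini (demo_image_t *img)
{
    /* Release what the image owns, but never the image storage itself. */
    img->width = img->height = 0;
}

/* create/unref add heap allocation and reference counting on top. */
static demo_image_t *
demo_image_create (int width, int height)
{
    demo_image_t *img = malloc (sizeof (demo_image_t));

    if (img)
        demo_image_init (img, width, height);

    return img;
}

/* Returns 1 if the image was destroyed, 0 if references remain. */
static int
demo_image_unref (demo_image_t *img)
{
    if (--img->ref_count == 0)
    {
        demo_image_fini (img);
        free (img);
        return 1;
    }

    return 0;
}
```

A temporary image on the stack then needs only `demo_image_init()` and `demo_image_fini()`, with no malloc/free pair.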

13 years agosse2: Bilinear scaled over_8888_8_8888
Taekyun Kim [Wed, 7 Sep 2011 14:00:29 +0000 (23:00 +0900)]
sse2: Bilinear scaled over_8888_8_8888

13 years agosse2: Bilinear scaled over_8888_8888
Taekyun Kim [Wed, 7 Sep 2011 13:57:29 +0000 (22:57 +0900)]
sse2: Bilinear scaled over_8888_8888

13 years agosse2: Macros for assembling bilinear interpolation code fractions
Taekyun Kim [Wed, 7 Sep 2011 13:51:46 +0000 (22:51 +0900)]
sse2: Macros for assembling bilinear interpolation code fractions

Primitive bilinear interpolation code is reusable to implement other
bilinear functions.

BILINEAR_DECLARE_VARIABLES
- Declare variables needed to interpolate src pixels.

BILINEAR_INTERPOLATE_ONE_PIXEL
- Interpolate one pixel and advance to next pixel

BILINEAR_SKIP_ONE_PIXEL
- Skip interpolation and just advance to next pixel
  This is useful for skipping zero mask

13 years agoCorrect the minimum gcc version needed for iwmmxt
Matt Turner [Thu, 6 Oct 2011 21:56:09 +0000 (17:56 -0400)]
Correct the minimum gcc version needed for iwmmxt

Spotted by Søren Sandmann.

Signed-off-by: Matt Turner <mattst88@gmail.com>
13 years agoMake sure iwMMXt is only detected on ARM
Matt Turner [Thu, 6 Oct 2011 02:54:36 +0000 (22:54 -0400)]
Make sure iwMMXt is only detected on ARM

iwMMXt is incorrectly detected on x86 and amd64. This happens because
the test uses standard _mm_* intrinsic functions which it compiles with
-march=iwmmxt, but when the user has set CFLAGS=-march=k8 for instance,
no error is generated from -march=iwmmxt, even though it's not a valid
flag on x86/amd64. Passing CFLAGS=-march=native does not override the
-march=iwmmxt flag though, which is why it wasn't noticed before.

So, just #error out in the test if the __arm__ preprocessor macro
isn't defined.

Fixes https://bugs.gentoo.org/show_bug.cgi?id=385179

Signed-off-by: Matt Turner <mattst88@gmail.com>