Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR...
author liuhongt <hongtao.liu@intel.com>
Thu, 22 Apr 2021 07:33:16 +0000 (15:33 +0800)
committer liuhongt <hongtao.liu@intel.com>
Thu, 13 May 2021 00:41:36 +0000 (08:41 +0800)
commit 94de7e225c1fda079052c3f0725c926437d56c94
tree 83382d842e2bd611018be236226148e65bb08aed
parent 0ff3a0f2b9d5cbea70d134cda2e74b674f8be9c9
Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]

If the second operand of __builtin_shuffle is a constant zero vector and
the mask keeps the low half of the first operand while taking the upper
half from the zero vector, the shuffle can be optimized to movq/vmovaps.

i.e.
foo128:
-       vxorps  %xmm1, %xmm1, %xmm1
-       vmovlhps        %xmm1, %xmm0, %xmm0
+       vmovq   %xmm0, %xmm0

 foo256:
-       vxorps  %xmm1, %xmm1, %xmm1
-       vshuff32x4      $0, %ymm1, %ymm0, %ymm0
+       vmovaps %xmm0, %xmm0

 foo512:
-       vxorps  %xmm1, %xmm1, %xmm1
-       vshuff32x4      $68, %zmm1, %zmm0, %zmm0
+       vmovaps %ymm0, %ymm0
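
For illustration, a source function of the kind that produces the foo128
sequence above might look like the sketch below. This is not the committed
testcase: the typedef names, the element type and the exact mask are
assumptions chosen to match the "keep the low 64 bits, zero the upper 64
bits" pattern, and __builtin_shuffle is a GCC extension.

```c
/* Hypothetical reproducer, not taken from the testsuite files in this commit. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Keep elements 0 and 1 of x and fill elements 2 and 3 with zeros.
   Mask indices 0..3 select from the first operand, 4..7 from the second. */
v4sf
foo128 (v4sf x)
{
  const v4sf zero = { 0.0f, 0.0f, 0.0f, 0.0f };
  const v4si mask = { 0, 1, 4, 5 };
  return __builtin_shuffle (x, zero, mask);
}
```

Compiling this with something like `gcc -O2 -mavx -S` and comparing the
output before and after the patch should show whether the xor/shuffle pair
is replaced by a single move; the exact instructions depend on the compiler
version and flags.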

gcc/ChangeLog:

PR target/94680
* config/i386/sse.md (ssedoublevecmode): Add attribute for
V64QI/V32HI/V16SI/V4DI.
(ssehalfvecmode): Add attribute for V2DI/V2DF.
(*vec_concatv4si_0): Extend to VI124_128.
(*vec_concat<mode>_0): New pre-reload splitter.
* config/i386/predicates.md (movq_parallel): New predicate.

gcc/testsuite/ChangeLog:

PR target/94680
* gcc.target/i386/avx-pr94680.c: New test.
* gcc.target/i386/avx512f-pr94680.c: New test.
* gcc.target/i386/sse2-pr94680.c: New test.
gcc/config/i386/predicates.md
gcc/config/i386/sse.md
gcc/testsuite/gcc.target/i386/avx-pr94680.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/avx512f-pr94680.c [new file with mode: 0644]
gcc/testsuite/gcc.target/i386/sse2-pr94680.c [new file with mode: 0644]