Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]
If the second operand of __builtin_shuffle is a constant zero vector and
the mask keeps the low half of the first operand while taking the upper
half from the zero vector, the shuffle can be optimized to movq/vmovaps,
i.e.
foo128:
- vxorps %xmm1, %xmm1, %xmm1
- vmovlhps %xmm1, %xmm0, %xmm0
+ vmovq %xmm0, %xmm0
foo256:
- vxorps %xmm1, %xmm1, %xmm1
- vshuff32x4 $0, %ymm1, %ymm0, %ymm0
+ vmovaps %xmm0, %xmm0
foo512:
- vxorps %xmm1, %xmm1, %xmm1
- vshuff32x4 $68, %zmm1, %zmm0, %zmm0
+ vmovaps %ymm0, %ymm0
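For illustration, a minimal sketch of source that could produce the foo128
sequence above. The function names, the v4sf/v4si typedefs and the mask
{0, 1, 4, 5} are assumptions chosen for this example, not copied from the
new tests. It relies on the GNU C vector extensions, so it needs GCC.

```c
/* Sketch only: requires GCC (vector_size and __builtin_shuffle are GNU
   extensions).  All names below are hypothetical.  */
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Keep the low two elements of X and zero the upper two.  Mask indices
   0..3 select from the first operand, 4..7 from the second (zero)
   operand.  */
v4sf
zero_upper_half (v4sf x)
{
  return __builtin_shuffle (x, (v4sf) { 0, 0, 0, 0 }, (v4si) { 0, 1, 4, 5 });
}

/* Check the semantics at run time; this says nothing about which
   instructions the compiler actually emits.  */
void
self_check (void)
{
  v4sf r = zero_upper_half ((v4sf) { 1.0f, 2.0f, 3.0f, 4.0f });
  assert (r[0] == 1.0f && r[1] == 2.0f);
  assert (r[2] == 0.0f && r[3] == 0.0f);
}
```

The 256-bit and 512-bit cases follow the same pattern with wider vector
types and a mask whose upper half indexes into the zero operand.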
gcc/ChangeLog:
PR target/94680
* config/i386/sse.md (ssedoublevecmode): Add attribute for
V64QI/V32HI/V16SI/V4DI.
(ssehalfvecmode): Add attribute for V2DI/V2DF.
(*vec_concatv4si_0): Extend to VI124_128.
(*vec_concat<mode>_0): New pre-reload splitter.
* config/i386/predicates.md (movq_parallel): New predicate.
gcc/testsuite/ChangeLog:
PR target/94680
* gcc.target/i386/avx-pr94680.c: New test.
* gcc.target/i386/avx512f-pr94680.c: New test.
* gcc.target/i386/sse2-pr94680.c: New test.