Optimize vpermtiw/b to vpunpcklqdq for certain cases.
Assembly Optimization like:
- vmovq %xmm0, %xmm2
- vmovdqa .LC0(%rip), %xmm0
vmovq %xmm1, %xmm1
- vpermi2w %xmm1, %xmm2, %xmm0
+ vmovq %xmm0, %xmm0
+ vpunpcklqdq %xmm1, %xmm0, %xmm0
...
-.LC0:
- .value 0
- .value 1
- .value 2
- .value 3
- .value 8
- .value 9
- .value 10
- .value 11
gcc/ChangeLog:
PR target/105033
* config/i386/sse.md (*vec_concatv4si): Extend to ..
(*vec_concat<mode>): .. V16QI and V8HImode.
(*vec_concatv16qi_permt2): New pre_reload define_insn_and_split.
(*vec_concatv8hi_permt2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr105033.c: New test.