[X86][AVX] Fold concat(ps*lq(x,32),ps*lq(y,32)) -> shuffle(concat(x,y),zero) (PR46621)
author Simon Pilgrim <llvm-dev@redking.me.uk>
Wed, 12 May 2021 16:34:11 +0000 (17:34 +0100)
committer Simon Pilgrim <llvm-dev@redking.me.uk>
Wed, 12 May 2021 17:04:40 +0000 (18:04 +0100)
commit fb1d61b7257ccd5ba0c96bcea78d6516384ce5b6
tree c8636c199269d734ee6ab0aa765768642bb2d24f
parent ca5d0a7310bfb21730ac6dd735e06502e7e45099

On AVX1 targets we can handle v4i64 logical shifts by 32 bits as a pair of v8f32 shuffles with zero.
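
The equivalence behind this fold can be checked with a small standalone model. This is only a sketch, not the code in X86ISelLowering.cpp: the type aliases, helper names and mask constants below are hypothetical, and the model uses 32-bit integer lanes in place of v8f32 because only the bit patterns matter. It assumes the little-endian lane order of x86 (low half of each 64-bit element first) and treats shuffle indices 0-7 as selecting from the source and 8-15 as selecting from an all-zero vector.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the v4i64 and v8i32/v8f32 vector types.
using V4I64 = std::array<std::uint64_t, 4>;
using V8I32 = std::array<std::uint32_t, 8>;
using Mask8 = std::array<int, 8>;

// Split each 64-bit element into (low, high) 32-bit lanes, x86 lane order.
V8I32 toLanes(const V4I64 &V) {
  V8I32 Out{};
  for (std::size_t I = 0; I < 4; ++I) {
    Out[2 * I] = static_cast<std::uint32_t>(V[I]);
    Out[2 * I + 1] = static_cast<std::uint32_t>(V[I] >> 32);
  }
  return Out;
}

// Rebuild the 64-bit elements from (low, high) lane pairs.
V4I64 fromLanes(const V8I32 &L) {
  V4I64 Out{};
  for (std::size_t I = 0; I < 4; ++I)
    Out[I] = static_cast<std::uint64_t>(L[2 * I]) |
             (static_cast<std::uint64_t>(L[2 * I + 1]) << 32);
  return Out;
}

// Two-input shuffle: indices 0-7 read the source, 8-15 read the zero vector.
V8I32 shuffleWithZero(const V8I32 &Src, const Mask8 &Mask) {
  V8I32 Out{};
  for (std::size_t I = 0; I < 8; ++I)
    Out[I] = Mask[I] < 8 ? Src[static_cast<std::size_t>(Mask[I])] : 0u;
  return Out;
}

// Illustrative masks: a left shift by 32 moves each low lane into the high
// position; a logical right shift by 32 moves each high lane into the low one.
constexpr Mask8 ShlMask = {8, 0, 8, 2, 8, 4, 8, 6};
constexpr Mask8 LshrMask = {1, 8, 3, 8, 5, 8, 7, 8};

V4I64 shlBy32(const V4I64 &V) {
  return fromLanes(shuffleWithZero(toLanes(V), ShlMask));
}

V4I64 lshrBy32(const V4I64 &V) {
  return fromLanes(shuffleWithZero(toLanes(V), LshrMask));
}
```

In this model every 64-bit element of `shlBy32(V)` equals `V[i] << 32` and every element of `lshrBy32(V)` equals `V[i] >> 32`, which is why a 32-bit logical shift of 64-bit elements can be expressed purely as lane movement plus zeroing.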

I was hoping to put this in LowerScalarImmediateShift, but performing the fold that early causes regressions where other instructions were resplitting the subvectors.
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/vec_int_to_fp.ll
llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
llvm/test/CodeGen/X86/vector-shift-shl-256.ll