author Simon Pilgrim <llvm-dev@redking.me.uk>
Sun, 29 Jan 2023 11:03:41 +0000
committer Simon Pilgrim <llvm-dev@redking.me.uk>
Sun, 29 Jan 2023 11:03:47 +0000
commit 37bc62ed0a24303aa572155009358b8937ab8b4c
tree 209725c20c43bb93bd194c7b9cda91aad2c0af89
parent 1d7961fd1a36f0955423362932e1591e7d26ba9d
[X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef elements and don't just copy the repeated mask

lowerShuffleAsLanePermuteAndRepeatedMask expands a shuffle from shuffle(x,y,mask) to shuffle(shuffle(x,y,lanemask1),shuffle(x,y,lanemask2),repeatedinlanemask)

However, we weren't making use of the fact that elements of the original mask might be undef. Instead of applying the entire repeatedinlanemask to every lane, we can leave an element of the final mask undef in any lane where the original mask never demanded it.
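
As a rough illustration of the idea (not the code from X86ISelLowering.cpp), the sketch below builds the final shuffle mask from the original mask and the repeated in-lane mask. The helper name buildFinalMask is hypothetical. It assumes -1 marks an undef element, that a repeated mask index below the lane size selects from the first lane-permuted operand, and that an index at or above the lane size selects from the second. Only the undef positions of the original mask are consulted.

```cpp
#include <vector>

// Hypothetical helper for illustration only; this is not the LLVM implementation.
// origMask:     the original full-width shuffle mask, -1 == undef.
// repeatedMask: the per-lane mask, one entry per lane element, -1 == undef.
//               Entries in [0, laneSize) pick from the first operand,
//               entries in [laneSize, 2 * laneSize) pick from the second
//               (assumed convention).
std::vector<int> buildFinalMask(const std::vector<int> &origMask,
                                const std::vector<int> &repeatedMask) {
  const int numElts = static_cast<int>(origMask.size());
  const int laneSize = static_cast<int>(repeatedMask.size());
  std::vector<int> finalMask(numElts, -1);

  for (int i = 0; i < numElts; ++i) {
    // The original shuffle never demanded this element: keep it undef
    // rather than copying the repeated mask entry into this lane.
    if (origMask[i] < 0)
      continue;

    const int r = repeatedMask[i % laneSize];
    if (r < 0)
      continue;

    // Rebase the in-lane index onto the lane that element i lives in.
    const int laneBase = (i / laneSize) * laneSize;
    finalMask[i] = (r < laneSize) ? laneBase + r
                                  : numElts + laneBase + (r - laneSize);
  }
  return finalMask;
}
```

For an 8-element shuffle with two 4-element lanes, calling buildFinalMask({0, -1, 2, 3, -1, -1, 6, 7}, {0, 5, 2, 7}) leaves elements 1, 4 and 5 undef, whereas copying the repeated mask into every lane would fill them in. Leaving them undef gives later shuffle lowering more freedom to pick a cheaper instruction.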

Yet another improvement addressing regressions from D127115

Differential Revision: https://reviews.llvm.org/D142536
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll
llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll
llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll