[X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one...
author Craig Topper <craig.topper@intel.com>
Mon, 19 Aug 2019 18:15:50 +0000 (18:15 +0000)
committer Craig Topper <craig.topper@intel.com>
Mon, 19 Aug 2019 18:15:50 +0000 (18:15 +0000)
commit a0d92c72620c49aa36b1738a272a2715f7909a6a
tree 17423b1397d57b7dee76e2b5f5ec93536b512103
parent a8abe1f82899847e29c4f1d66c32fad17dacb62f
[X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.

The motivating case is the set of changes in vector-reduce-add.ll,
where we were doing extra work in the scalar domain instead of
shuffling. There may be a one-use check that needs to be looked into
there, but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.
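
As a rough illustration of the broadcast guard described above, here is a
minimal standalone sketch. It is not the code from X86ISelLowering.cpp. It
assumes the usual LLVM convention that -1 marks an undef mask lane, and it
assumes that "aren't really broadcasting" refers to masks that reference at
most one defined lane. The names Mask4, countDefinedLanes, isSplatMask and
shouldTryBroadcast are hypothetical and exist only for this example.

```cpp
#include <array>

// Hypothetical stand-in for a 4-lane, single-input shuffle mask.
// Assumption: -1 marks an undef lane; 0..3 select an input element.
using Mask4 = std::array<int, 4>;

// Hypothetical helper (not an LLVM API): number of lanes that name a
// real element of the input vector.
inline int countDefinedLanes(const Mask4 &Mask) {
  int Count = 0;
  for (int M : Mask)
    if (M >= 0 && M < 4)
      ++Count;
  return Count;
}

// Hypothetical helper: true when there is at least one defined lane and
// all defined lanes read the same input element.
inline bool isSplatMask(const Mask4 &Mask) {
  int Splat = -1;
  for (int M : Mask) {
    if (M < 0)
      continue; // Undef lanes match anything.
    if (Splat >= 0 && M != Splat)
      return false;
    Splat = M;
  }
  return Splat >= 0;
}

// Sketch of the guard: only consider a broadcast when the splat is used
// by more than one defined lane. A mask such as {1, -1, -1, -1} copies a
// single element and does not really broadcast anything.
inline bool shouldTryBroadcast(const Mask4 &Mask) {
  return isSplatMask(Mask) && countDefinedLanes(Mask) > 1;
}
```

Under these assumptions, a mask like {2, -1, 2, -1} would still be a
broadcast candidate, while {1, -1, -1, -1} would fall through to the
other lowering strategies.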

Differential Revision: https://reviews.llvm.org/D66071

llvm-svn: 369287
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avg.ll
llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
llvm/test/CodeGen/X86/insertelement-shuffle.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
llvm/test/CodeGen/X86/sse41.ll
llvm/test/CodeGen/X86/vector-reduce-add.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
llvm/test/CodeGen/X86/vector-shuffle-combining.ll