[x86] narrow a shuffle that doesn't use or set any high elements
author Sanjay Patel <spatel@rotateright.com>
Fri, 25 Jan 2019 15:37:42 +0000 (15:37 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Fri, 25 Jan 2019 15:37:42 +0000 (15:37 +0000)
commit 21aa6ddc1413667516f7b5cc7a6013a9593dd404
tree 5beab445e0415b28a875f746bc4f3dae2aa9f64f
parent b120127001339c4cd36b9e3d3dc9731c857fbce6

This isn't the final fix for our reduction/horizontal codegen, but it takes care
of a lot of the problems. After we narrow the shuffle, existing combines for
insert/extract and binops kick in, and we end up with cheaper 128-bit ops.
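
To illustrate the condition in the title, here is a minimal standalone sketch of
the mask check. It is not the code added to X86ISelLowering.cpp:
`narrowShuffleMask` is a hypothetical helper, and it assumes the usual
shufflevector mask convention (-1 is undef, indices 0..N-1 select from the first
operand, N..2N-1 from the second). It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper for illustration only, not the LLVM implementation.
// Returns the half-width mask if the wide shuffle neither sets a high result
// element nor reads a high source element; otherwise returns std::nullopt.
std::optional<std::vector<int>>
narrowShuffleMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  if (N < 2 || N % 2 != 0)
    return std::nullopt;
  const int Half = N / 2;

  // The shuffle must not set any high result element (all must be undef).
  for (int I = Half; I < N; ++I)
    if (Mask[I] != -1)
      return std::nullopt;

  std::vector<int> Narrow(Half, -1);
  for (int I = 0; I < Half; ++I) {
    const int M = Mask[I];
    if (M == -1)
      continue;                  // undef stays undef
    if (M >= 0 && M < Half)
      Narrow[I] = M;             // low half of operand 0, index unchanged
    else if (M >= N && M < N + Half)
      Narrow[I] = M - N + Half;  // low half of operand 1, rebased to Half
    else
      return std::nullopt;       // reads a high source element
  }
  return Narrow;
}
```

Under these assumptions, an 8-element mask <1,u,u,u,u,u,u,u> (a typical late
reduction step) narrows to <1,u,u,u>, while <4,u,u,u,u,u,u,u> is rejected
because it reads a high source element.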

The avg and mul reduction tests show an existing shuffle lowering hole for
AVX2/AVX512. I think in its most minimal form this is:
https://bugs.llvm.org/show_bug.cgi?id=40434
...but we might need multiple fixes to get it right.

Differential Revision: https://reviews.llvm.org/D57156

llvm-svn: 352209
20 files changed:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avg.ll
llvm/test/CodeGen/X86/avx512-hadd-hsub.ll
llvm/test/CodeGen/X86/madd.ll
llvm/test/CodeGen/X86/min-legal-vector-width.ll
llvm/test/CodeGen/X86/sad.ll
llvm/test/CodeGen/X86/vector-compare-all_of.ll
llvm/test/CodeGen/X86/vector-compare-any_of.ll
llvm/test/CodeGen/X86/vector-reduce-add-widen.ll
llvm/test/CodeGen/X86/vector-reduce-add.ll
llvm/test/CodeGen/X86/vector-reduce-and-widen.ll
llvm/test/CodeGen/X86/vector-reduce-and.ll
llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
llvm/test/CodeGen/X86/vector-reduce-mul-widen.ll
llvm/test/CodeGen/X86/vector-reduce-mul.ll
llvm/test/CodeGen/X86/vector-reduce-or-widen.ll
llvm/test/CodeGen/X86/vector-reduce-or.ll
llvm/test/CodeGen/X86/vector-reduce-xor-widen.ll
llvm/test/CodeGen/X86/vector-reduce-xor.ll