[x86] split 256-bit store of concatenated vectors
author: Sanjay Patel <spatel@rotateright.com>
Tue, 4 Jun 2019 16:40:04 +0000 (16:40 +0000)
committer: Sanjay Patel <spatel@rotateright.com>
Tue, 4 Jun 2019 16:40:04 +0000 (16:40 +0000)
commit: 606eb2367f9f0bef2d1e0182bbb2bf4effb1711e
tree: 07ad29ff737cfeb198014fa795f057a9150954e6
parent: f15e3d856fddd3ecf80fdbb798be64d0c4bc6de4

This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as the pile of existing test diffs shows, this is actually a widespread problem that
affects any AVX or later target. Apart from a couple of oddballs, I think these are all
improvements for the reasons stated in the code comment: we do not want to use YMM registers
unnecessarily (to avoid vzeroupper and frequency throttling), and some cores split 256-bit
stores anyway.
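As a rough illustration of the transform itself, here is a hypothetical toy model in plain C++. It treats a 256-bit store of CONCAT_VECTORS(Lo, Hi) as one 32-byte write and splits it into two 16-byte (XMM-sized) stores, with Lo at the base address and Hi at base + 16 (x86 is little-endian, so the low half lands at the lower address). This is not LLVM's SelectionDAG API. The names `V128`, `V256`, `Store128`, `concat`, `splitConcatStore`, `storeWide`, and `storeSplit` are invented for this sketch. What it checks is that the two narrow stores leave memory byte-for-byte identical to the single wide store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy types (not LLVM API): a 128-bit vector is 16 bytes,
// a 256-bit vector is 32 bytes.
using V128 = std::array<std::uint8_t, 16>;
using V256 = std::array<std::uint8_t, 32>;

// One XMM-sized (16-byte) store of Val at byte Offset.
struct Store128 {
  V128 Val;
  std::size_t Offset;
};

// Models CONCAT_VECTORS(Lo, Hi): Lo occupies the low 16 bytes.
V256 concat(const V128 &Lo, const V128 &Hi) {
  V256 R{};
  std::memcpy(R.data(), Lo.data(), Lo.size());
  std::memcpy(R.data() + Lo.size(), Hi.data(), Hi.size());
  return R;
}

// The split: instead of one 32-byte store of concat(Lo, Hi) at Base,
// emit Lo at Base and Hi at Base + 16.
std::vector<Store128> splitConcatStore(const V128 &Lo, const V128 &Hi,
                                       std::size_t Base) {
  return {Store128{Lo, Base}, Store128{Hi, Base + 16}};
}

// Before: a single YMM-sized store of the concatenated value.
// Mem must hold at least Base + 32 bytes.
void storeWide(std::vector<std::uint8_t> &Mem, std::size_t Base,
               const V128 &Lo, const V128 &Hi) {
  V256 Wide = concat(Lo, Hi);
  std::memcpy(Mem.data() + Base, Wide.data(), Wide.size());
}

// After: two XMM-sized stores; no 256-bit value is ever formed.
// Mem must hold at least Base + 32 bytes.
void storeSplit(std::vector<std::uint8_t> &Mem, std::size_t Base,
                const V128 &Lo, const V128 &Hi) {
  for (const Store128 &St : splitConcatStore(Lo, Hi, Base))
    std::memcpy(Mem.data() + St.Offset, St.Val.data(), St.Val.size());
}
```

In this model the two versions write the same bytes. The choice between them can therefore rest purely on cost: avoiding YMM usage (and with it vzeroupper and frequency throttling), and the fact that some cores split 256-bit stores anyway.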

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but fixing that alone won't solve the problem completely. It is also the reason I'm
proposing this as a lowering rather than a combine: if we try this earlier, we will loop
infinitely, fighting the merge code.
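The ordering argument can be sketched with a deliberately simplified toy model (C++17). It assumes a single combine loop that applies rules until nothing changes. A hypothetical `mergeRule` stands in for what MergeConsecutiveStores() does to two adjacent 128-bit stores, and a hypothetical `splitRule` stands in for this patch's split. All names are invented here, and the real SelectionDAG pipeline, which runs combines more than once, is not modeled. If both rules sit in the same loop, the state flips back and forth and never reaches a fixed point. If the split runs only once, after the loop settles (as a lowering would), the process terminates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy state for one 32-byte region of memory.
enum class Form { TwoXmmStores, OneYmmConcatStore };

// Stand-in for MergeConsecutiveStores(): two adjacent 128-bit stores
// become one 256-bit store of a concatenation.
bool mergeRule(Form &F) {
  if (F != Form::TwoXmmStores)
    return false;
  F = Form::OneYmmConcatStore;
  return true;
}

// The split, written as if it were a combine rule.
bool splitRule(Form &F) {
  if (F != Form::OneYmmConcatStore)
    return false;
  F = Form::TwoXmmStores;
  return true;
}

// Applies the rules left to right (first rule that fires wins) until none
// fires. Returns the number of changing iterations, or MaxIters if the
// state never settles.
template <typename... Rules>
std::size_t runToFixedPoint(Form &F, std::size_t MaxIters, Rules... R) {
  for (std::size_t I = 0; I < MaxIters; ++I) {
    bool Changed = (R(F) || ...);
    if (!Changed)
      return I;
  }
  return MaxIters;
}

// Split as a lowering: combine to a fixed point first, then split once.
// In this toy model the split's output is never fed back to mergeRule.
Form combineThenLower(Form F) {
  runToFixedPoint(F, 100, mergeRule);
  splitRule(F);
  return F;
}
```

Calling `runToFixedPoint(F, 100, mergeRule, splitRule)` hits the iteration cap. Calling `combineThenLower` finishes with the two narrow stores.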

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 362524
25 files changed:
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/avg.ll
llvm/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
llvm/test/CodeGen/X86/avx-intrinsics-x86.ll
llvm/test/CodeGen/X86/avx512-trunc-widen.ll
llvm/test/CodeGen/X86/avx512-trunc.ll
llvm/test/CodeGen/X86/nontemporal-2.ll
llvm/test/CodeGen/X86/oddsubvector.ll
llvm/test/CodeGen/X86/pmovsx-inreg.ll
llvm/test/CodeGen/X86/shrink_vmul-widen.ll
llvm/test/CodeGen/X86/shrink_vmul.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-512-widen.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
llvm/test/CodeGen/X86/subvector-broadcast.ll
llvm/test/CodeGen/X86/vec_fptrunc.ll
llvm/test/CodeGen/X86/vec_saddo.ll
llvm/test/CodeGen/X86/vec_smulo.ll
llvm/test/CodeGen/X86/vec_ssubo.ll
llvm/test/CodeGen/X86/vec_uaddo.ll
llvm/test/CodeGen/X86/vec_umulo.ll
llvm/test/CodeGen/X86/vec_usubo.ll
llvm/test/CodeGen/X86/vector-gep.ll
llvm/test/CodeGen/X86/vector-trunc-widen.ll
llvm/test/CodeGen/X86/vector-trunc.ll
llvm/test/CodeGen/X86/x86-interleaved-access.ll