[x86] fold vperm2x128 to concat of 128-bit high half vectors
author Sanjay Patel <spatel@rotateright.com>
Wed, 22 Jan 2020 19:48:28 +0000 (14:48 -0500)
committer Sanjay Patel <spatel@rotateright.com>
Wed, 22 Jan 2020 20:35:50 +0000 (15:35 -0500)
commit 363d27c871f44c45bb70a8adfb0ad93a0bf2e04d
tree 89261880880c5b7a02421d95dc767905f541d354
parent 38c68047b04184fefadcd38e759d9526039cce86

vperm (ins ?, X, C), (ins ?, Y, C), 0x31 --> concat X, Y
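
To see why the fold above is sound, here is a minimal standalone sketch that models a 256-bit vector as two 128-bit halves. It is not the code from X86ISelLowering.cpp; the helper names (selectHalf, vperm2x128, insertHigh, concat, foldHoldsForSample) are hypothetical and exist only for illustration. It assumes C is the index of the high 128-bit half, and it relies on the documented immediate encoding: bits [1:0] pick the low result half, bits [5:4] pick the high result half, and bits 3 and 7 zero the corresponding half.

```cpp
#include <array>
#include <cstdint>

using Half = std::array<std::uint32_t, 4>; // one 128-bit half (4 x i32)
using Vec256 = std::array<Half, 2>;        // [0] = low half, [1] = high half

// Pick one 128-bit half according to a 4-bit control field of the immediate.
static Half selectHalf(const Vec256 &a, const Vec256 &b, unsigned ctrl) {
  if (ctrl & 0x8)
    return Half{}; // zeroing bit set
  switch (ctrl & 0x3) {
  case 0: return a[0]; // low half of first source
  case 1: return a[1]; // high half of first source
  case 2: return b[0]; // low half of second source
  default: return b[1]; // high half of second source
  }
}

// Model of vperm2f128/vperm2i128 on the two-half representation.
static Vec256 vperm2x128(const Vec256 &a, const Vec256 &b, unsigned imm) {
  Vec256 r{};
  r[0] = selectHalf(a, b, imm & 0xF);
  r[1] = selectHalf(a, b, (imm >> 4) & 0xF);
  return r;
}

// Model of insert_subvector into the high half (assumed meaning of C).
static Vec256 insertHigh(const Vec256 &base, const Half &sub) {
  Vec256 r = base;
  r[1] = sub;
  return r;
}

// Model of concat X, Y: X becomes the low half, Y the high half.
static Vec256 concat(const Half &lo, const Half &hi) {
  Vec256 r{};
  r[0] = lo;
  r[1] = hi;
  return r;
}

// Checks the fold on one sample; the base vectors stand in for the "?" operands.
bool foldHoldsForSample() {
  const Half x{1, 2, 3, 4};
  const Half y{5, 6, 7, 8};
  Vec256 junkA{};
  junkA[0] = Half{90, 91, 92, 93};
  junkA[1] = Half{94, 95, 96, 97};
  Vec256 junkB{};
  junkB[0] = Half{70, 71, 72, 73};
  junkB[1] = Half{74, 75, 76, 77};
  const Vec256 shuffled =
      vperm2x128(insertHigh(junkA, x), insertHigh(junkB, y), 0x31);
  return shuffled == concat(x, y);
}
```

With 0x31, the low control field is 1 (high half of the first source) and the high control field is 3 (high half of the second source), so the contents of the "?" base vectors never reach the result. Calling foldHoldsForSample() from a test or from main is enough to exercise the model.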

This is another shuffle problem seen with PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

We have this small crack between legalization, lowering, combining, and
demanded-elements simplification that allows forming a vperm2f128 of
high halves with AVX1 when we could do better by peeking through the
insert_subvector nodes. AFAICT, it requires IR as shown in the diffs
(vectors much larger than legal vectors) to avoid all of the usual folds.

Another option would be to prevent forming the 256-bit vperm in lowering.

Differential Revision: https://reviews.llvm.org/D73197
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/x86-interleaved-access.ll