[DAGCombiner] improve throughput of shift+logic+shift
author Sanjay Patel <spatel@rotateright.com>
Sun, 1 Sep 2019 18:38:15 +0000 (18:38 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Sun, 1 Sep 2019 18:38:15 +0000 (18:38 +0000)
commit c88220836768732a9cb1fb66f042f64d1107a8d6
tree a64d0afb76a376ec390690c0d267298148566c76
parent c98fc5a7934831450c318db589798250b8e98b87

The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn
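
The message does not spell out the fold itself. Assuming it is the reassociation
  shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)
with the same shift opcode on both sides and C0+C1 less than the bit width, the
9 patterns can be brute-force checked on 8-bit values. This is only a sketch to
illustrate the identity, not the DAGCombiner code; all names below (shl, lshr,
ashr, countMismatches, ...) are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

using ShiftFn = uint8_t (*)(uint8_t, unsigned);
using LogicFn = uint8_t (*)(uint8_t, uint8_t);

// 8-bit models of the three shift opcodes (assumed semantics: shl, lshr, ashr).
static uint8_t shl(uint8_t v, unsigned c) {
  assert(c < 8 && "shift amount must be less than the bit width");
  return static_cast<uint8_t>(v << c);
}
static uint8_t lshr(uint8_t v, unsigned c) {
  assert(c < 8 && "shift amount must be less than the bit width");
  return static_cast<uint8_t>(v >> c);
}
static uint8_t ashr(uint8_t v, unsigned c) {
  assert(c < 8 && "shift amount must be less than the bit width");
  // Fill the vacated high bits with copies of the sign bit, done by hand so
  // the result does not depend on how the compiler shifts negative values.
  unsigned fill = (v & 0x80u) ? 0xFFu : 0u;
  return static_cast<uint8_t>((v >> c) | (fill << (8 - c)));
}

// The three bitwise logic opcodes.
static uint8_t bitAnd(uint8_t a, uint8_t b) { return a & b; }
static uint8_t bitOr(uint8_t a, uint8_t b) { return a | b; }
static uint8_t bitXor(uint8_t a, uint8_t b) { return a ^ b; }

// Counts inputs where the original and the reassociated form disagree,
// over all 3 x 3 opcode pairs, all x and y, and all C0, C1 with C0+C1 < 8.
unsigned long countMismatches() {
  const ShiftFn shifts[] = {shl, lshr, ashr};
  const LogicFn logics[] = {bitAnd, bitOr, bitXor};
  unsigned long mismatches = 0;
  for (ShiftFn shift : shifts)
    for (LogicFn logic : logics)
      for (unsigned c0 = 0; c0 < 8; ++c0)
        for (unsigned c1 = 0; c0 + c1 < 8; ++c1)
          for (unsigned x = 0; x < 256; ++x)
            for (unsigned y = 0; y < 256; ++y) {
              uint8_t xv = static_cast<uint8_t>(x);
              uint8_t yv = static_cast<uint8_t>(y);
              // Original: two shifts in series on the X path.
              uint8_t before = shift(logic(shift(xv, c0), yv), c1);
              // Reassociated: the two shifts are independent of each other.
              uint8_t after = logic(shift(xv, c0 + c1), shift(yv, c1));
              if (before != after)
                ++mismatches;
            }
  return mismatches;
}
```

Calling countMismatches() from a main() should return 0. In the reassociated
form the two shifts no longer depend on each other, which is presumably where
the throughput gain comes from.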

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/test/CodeGen/AArch64/bitfield-insert.ll
llvm/test/CodeGen/AArch64/shift-logic.ll
llvm/test/CodeGen/Thumb2/2010-03-15-AsmCCClobber.ll
llvm/test/CodeGen/X86/shift-logic.ll