[VectorCombine] new IR transform pass for partial vector ops
authorSanjay Patel <spatel@rotateright.com>
Sun, 9 Feb 2020 15:04:41 +0000 (10:04 -0500)
committerSanjay Patel <spatel@rotateright.com>
Sun, 9 Feb 2020 15:04:41 +0000 (10:04 -0500)
commita17f03bd93939cf30bfbb829321437bd0aaa4ef0
treeccd62b944e3e912f97f68bfe07859855652261af
parent74857b4260ec9db8d688c2d377a5f370efc150b4
[VectorCombine] new IR transform pass for partial vector ops

We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
19 files changed:
llvm/include/llvm/InitializePasses.h
llvm/include/llvm/LinkAllPasses.h
llvm/include/llvm/Transforms/Vectorize.h
llvm/include/llvm/Transforms/Vectorize/VectorCombine.h [new file with mode: 0644]
llvm/lib/Passes/PassBuilder.cpp
llvm/lib/Passes/PassRegistry.def
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
llvm/lib/Transforms/Vectorize/CMakeLists.txt
llvm/lib/Transforms/Vectorize/VectorCombine.cpp [new file with mode: 0644]
llvm/lib/Transforms/Vectorize/Vectorize.cpp
llvm/test/Other/new-pm-defaults.ll
llvm/test/Other/new-pm-thinlto-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
llvm/test/Other/opt-O2-pipeline.ll
llvm/test/Other/opt-O3-pipeline.ll
llvm/test/Other/opt-Os-pipeline.ll
llvm/test/Transforms/VectorCombine/X86/extract-cmp.ll [new file with mode: 0644]
llvm/test/Transforms/VectorCombine/X86/lit.local.cfg [new file with mode: 0644]