[AArch64] Cost-model vector splat LD1Rs to avoid unprofitable SLP vectorisation
author: Sjoerd Meijer <smeijer@nvidia.com>, Mon, 13 Mar 2023 13:05:34 +0000 (13:05 +0000)
committer: Sjoerd Meijer <smeijer@nvidia.com>, Mon, 13 Mar 2023 14:52:09 +0000 (14:52 +0000)
commit: 775451b66a4c726f1c3925f74be8cefcb308e4f8
tree: 73d3a929c76fc67568e842e25397389c26b64213
parent: 71e2d7106fe58ff2dcc6886c80ea0839a129d3c0

This slightly increases the cost of InsertElement instructions that are part
of a vector splat sequence, i.e. a load, an InsertElement and a shuffle (load +
dup), which is selected as a single LD1R. LD1R is a high-latency instruction,
and this slight cost increase avoids SLP vectorisation in a couple of cases
where it isn't profitable.
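The pattern check behind this change can be sketched with a toy IR model. Everything below is hypothetical and is not the code in AArch64TargetTransformInfo.cpp or LLVM's API. That includes the `Inst` struct, `isLoadSplatInsert`, `insertElementCost` and the `splatPenalty` parameter. The sketch only shows the idea: charge a little extra for an InsertElement whose scalar operand is a load and whose result feeds a broadcast shuffle.

```cpp
#include <vector>

// Toy IR model for illustration only; these types are hypothetical and are
// not LLVM's Instruction / InsertElementInst / ShuffleVectorInst classes.
enum class Opcode { Load, InsertElement, ShuffleVector, Other };

struct Inst {
  Opcode op = Opcode::Other;
  // First operand: the inserted scalar for an InsertElement, the source
  // vector for a ShuffleVector.
  const Inst *operand = nullptr;
  // For a ShuffleVector: true if every mask element is 0 (a broadcast).
  bool broadcastMask = false;
  std::vector<const Inst *> users;
};

// True if `ins` is the InsertElement of a load + InsertElement + broadcast
// shuffle sequence, i.e. the splat pattern that can become one LD1R.
bool isLoadSplatInsert(const Inst &ins) {
  if (ins.op != Opcode::InsertElement || ins.operand == nullptr ||
      ins.operand->op != Opcode::Load)
    return false;
  for (const Inst *user : ins.users)
    if (user->op == Opcode::ShuffleVector && user->broadcastMask)
      return true;
  return false;
}

// Hypothetical cost hook: the normal InsertElement cost, plus a small
// penalty when the insert is part of a splat load, so the LD1R latency is
// visible to the SLP vectoriser's profitability check.
int insertElementCost(const Inst &ins, int baseCost, int splatPenalty) {
  return isLoadSplatInsert(ins) ? baseCost + splatPenalty : baseCost;
}
```

The penalty is a parameter because the change is only meant to be a slight increase. The SLP vectoriser weighs the summed cost of the vector code against the scalar code, so even a small per-instruction bump can tip a marginal case towards staying scalar.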

Fixes: https://github.com/llvm/llvm-project/issues/61047

Differential Revision: https://reviews.llvm.org/D145578
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll
llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll