[AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1
author David Sherwood <david.sherwood@arm.com>
Tue, 25 Apr 2023 08:46:41 +0000 (08:46 +0000)
committer David Sherwood <david.sherwood@arm.com>
Thu, 18 May 2023 10:35:57 +0000 (10:35 +0000)
commit c7dbe326dff81273eabe339fe69cd7bef947619c
tree 600d5c957ee03fb2dd93f83e8c69a1ae4252aed9
parent 01efcec6dbd1431d2ac112f537d5639a9eab18b2

This patch enables tail-folding of simple loops by default
when targeting the neoverse-v1 CPU. Here a simple loop is one
that has no recurrences or reductions and is not reversed.
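
As a rough illustration of what tail-folding means for such a loop,
the sketch below contrasts a vector body followed by a scalar
remainder loop with a single predicated vector body. It is a scalar
simulation only, not the code the vectorizer emits: the fixed width
VF, the function names and the boolean lane mask are all assumptions
made for illustration (SVE vectors are scalable and use hardware
predicate registers).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed vector width; real SVE widths are scalable.
constexpr std::size_t VF = 4;

// Without tail-folding: full vector iterations, then a scalar remainder loop.
std::vector<int> addWithScalarTail(const std::vector<int> &a,
                                   const std::vector<int> &b) {
  assert(a.size() == b.size());
  std::vector<int> out(a.size());
  std::size_t i = 0;
  // "Vector" body: only runs while a full group of VF elements remains.
  for (; i + VF <= a.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      out[i + lane] = a[i + lane] + b[i + lane];
  // Scalar epilogue handles the leftover elements one at a time.
  for (; i < a.size(); ++i)
    out[i] = a[i] + b[i];
  return out;
}

// With tail-folding: one loop, where a per-lane mask disables the lanes
// that would run past the end of the arrays in the final iteration.
std::vector<int> addTailFolded(const std::vector<int> &a,
                               const std::vector<int> &b) {
  assert(a.size() == b.size());
  std::vector<int> out(a.size());
  for (std::size_t i = 0; i < a.size(); i += VF) {
    // Simulated predicate: lane is active only if its index is in bounds.
    std::array<bool, VF> active{};
    for (std::size_t lane = 0; lane < VF; ++lane)
      active[lane] = (i + lane) < a.size();
    // Masked load, add and store: inactive lanes touch no memory.
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (active[lane])
        out[i + lane] = a[i + lane] + b[i + lane];
  }
  return out;
}
```

Both functions compute the same result; the tail-folded form simply
has no separate remainder loop. The loop shown is "simple" in the
sense used above: it carries no value between iterations, accumulates
no reduction, and walks the arrays forwards.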

New tests have been added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

For SPEC2017, only one benchmark is significantly affected when
building with "-Ofast -mcpu=neoverse-v1 -flto"
(+ faster, - slower):

525.x264: +7.0%

Differential Revision: https://reviews.llvm.org/D130618
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-overflow-checks.ll