[LV] Use the known trip count when costing non-tail folded VFs
author David Green <david.green@arm.com>
Mon, 24 Apr 2023 21:02:30 +0000 (22:02 +0100)
committer David Green <david.green@arm.com>
Mon, 24 Apr 2023 21:02:30 +0000 (22:02 +0100)
commit 1869a9c225c7ed411a15592d21b277716b65a374
tree 0a9f941ccb2d812e80ca2fe9f854ce8e49e90072
parent 2bca3f2a92a506997914f335396e124c0a5f87dd

Now that we store the ScalarCost in the VectorizationFactor, it is possible to
use it to get a slightly more accurate cost in isMoreProfitable when comparing
two vectorization factors. This extends the logic added in D101726 to the
non-tail-folded case: where the TripCount is known and we are not folding the
tail, each VF is costed as
`VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`,
i.e. the vector iterations plus the scalar remainder iterations.
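
A minimal standalone sketch of that comparison follows. It is not the code in
LoopVectorize.cpp: the `CostedVF` struct and the `estimatedCost` and
`isCheaperForTripCount` helpers are hypothetical names introduced for
illustration, and costs are modelled as plain integers rather than LLVM's own
cost type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a costed vectorization factor (not LLVM's type).
struct CostedVF {
  uint64_t Width;      // VF: number of scalar iterations per vector iteration
  uint64_t VecCost;    // assumed cost of one vector iteration at this VF
  uint64_t ScalarCost; // assumed cost of one scalar iteration
};

// Total cost for a known trip count when the tail is not folded:
//   VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)
uint64_t estimatedCost(const CostedVF &F, uint64_t TripCount) {
  assert(F.Width > 0 && "VF must be non-zero");
  uint64_t VectorIters = TripCount / F.Width; // full vector iterations
  uint64_t ScalarIters = TripCount % F.Width; // remainder run in scalar code
  return F.VecCost * VectorIters + F.ScalarCost * ScalarIters;
}

// Returns true if A is strictly cheaper than B for the given trip count.
bool isCheaperForTripCount(const CostedVF &A, const CostedVF &B,
                           uint64_t TripCount) {
  return estimatedCost(A, TripCount) < estimatedCost(B, TripCount);
}
```

With made-up costs of 4 per VF4 iteration, 7 per VF8 iteration and 2 per scalar
iteration, a trip count of 20 gives 4 * 5 = 20 for VF4 and 7 * 2 + 2 * 4 = 22
for VF8, so VF4 wins even though VF8 has the lower cost per lane.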

This should not change much, since loops with small trip counts are usually not
vectorized, but it does seem to help in the test case where 4 * VF4 is chosen as
more profitable than 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll