The performance improvement on LBM previously achieved with improved software
prefetching (36d4421) was recently lost with e00f189. There is now one memory
access in the loop that LoopDataPrefetch cannot handle (before there were
none), and the heuristic rejects the loop because of it.
This patch adds a small margin by allowing 1 non-prefetched memory access for
every 32 prefetched ones, so that the heuristic doesn't bail in this type of
case.
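
As a rough illustration of the new margin, the condition can be written as a
standalone predicate. This is only a sketch: the helper name
allowsSmallStridePrefetch is hypothetical and not part of LLVM, and the
parameters simply mirror the counters used in the patch.

```cpp
// Hypothetical helper (not in LLVM) restating the patched condition.
// NumMemAccesses counts all memory accesses in the loop and
// NumStridedMemAccesses counts those LoopDataPrefetch can handle.
constexpr bool allowsSmallStridePrefetch(unsigned NumMemAccesses,
                                         unsigned NumStridedMemAccesses,
                                         bool HasCall) {
  return NumStridedMemAccesses > 32 && !HasCall &&
         (NumMemAccesses - NumStridedMemAccesses) * 32 <=
             NumStridedMemAccesses;
}

// 33 strided accesses plus 1 other: 1 * 32 <= 33, so still accepted.
static_assert(allowsSmallStridePrefetch(34, 33, false), "within margin");
// 33 strided accesses plus 2 others: 2 * 32 > 33, so rejected.
static_assert(!allowsSmallStridePrefetch(35, 33, false), "over margin");
```

With the old condition (NumStridedMemAccesses == NumMemAccesses) the first
case above would have been rejected as well.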
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D92985
// Emit prefetch instructions for smaller strides in cases where we think
// the hardware prefetcher might not be able to keep up.
- if (NumStridedMemAccesses > 32 &&
- NumStridedMemAccesses == NumMemAccesses && !HasCall)
+ if (NumStridedMemAccesses > 32 && !HasCall &&
+ (NumMemAccesses - NumStridedMemAccesses) * 32 <= NumStridedMemAccesses)
return 1;
return ST->hasMiscellaneousExtensions3() ? 8192 : 2048;