[llvm] [LV] Pure runtime check for minimum profitable trip count. (PR #115833)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 13 06:17:57 PST 2024
================
@@ -2502,6 +2502,12 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
#endif
// Don't execute the vector loop if (UMax - n) < (VF * UF).
CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+ } else if (MinProfitableTripCount.isNonZero()) {
----------------
david-arm wrote:
I'm not sure this is right because it doesn't look like MinProfitableTripCount was designed for tail-folded loops. If you look at how it's calculated it seems to assume the loop has an unpredicated vector body, followed by a vector epilogue and a scalar tail. Here is an example:
```
// Second, compute a minimum iteration count so that the cost of the
// runtime checks is only a fraction of the total scalar loop cost. This
// adds a loop-dependent bound on the overhead incurred if the runtime
// checks fail. In case the runtime checks fail, the cost is RtC + ScalarC
// * TC. To bound the runtime check to be a fraction 1/X of the scalar
// cost, compute
// RtC < ScalarC * TC * (1 / X) ==> RtC * X / ScalarC < TC
uint64_t MinTC2 = divideCeil(RtC * 10, ScalarC);
```
I do understand what you're trying to do, but part of me feels this is counter-intuitive because tail-folding is supposed to be especially beneficial for loops with low trip counts. If we require the trip count to be that high then I feel we may as well not bother tail-folding. The SPEC2017 x264 benchmark has hot loops with low trip counts (8,12,16,etc.) that benefit from tail-folding and also require runtime memory checks, so that's one thing to be careful of here as it could lead to performance regressions.
Another problem here is inconsistency - it looks like you're only adding the check here, but what about the above tail-folding case:
```
} else if (VF.isScalable() &&
!isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
```
It's not obvious to me why we don't need the same check there?
https://github.com/llvm/llvm-project/pull/115833
More information about the llvm-commits
mailing list