[llvm] [LV] Pure runtime check for minimum profitable trip count. (PR #115833)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 13 06:17:57 PST 2024


================
@@ -2502,6 +2502,12 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
 #endif
     // Don't execute the vector loop if (UMax - n) < (VF * UF).
     CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+  } else if (MinProfitableTripCount.isNonZero()) {
----------------
david-arm wrote:

I'm not sure this is right because it doesn't look like MinProfitableTripCount was designed for tail-folded loops. If you look at how it's calculated it seems to assume the loop has an unpredicated vector body, followed by a vector epilogue and a scalar tail. Here is an example:

```
  // Second, compute a minimum iteration count so that the cost of the
  // runtime checks is only a fraction of the total scalar loop cost. This
  // adds a loop-dependent bound on the overhead incurred if the runtime
  // checks fail. In case the runtime checks fail, the cost is RtC + ScalarC
  // * TC. To bound the runtime check to be a fraction 1/X of the scalar
  // cost, compute
  //   RtC < ScalarC * TC * (1 / X)  ==>  RtC * X / ScalarC < TC
  uint64_t MinTC2 = divideCeil(RtC * 10, ScalarC);
```

I do understand what you're trying to do, but part of me feels this is counter-intuitive because tail-folding is supposed to be especially beneficial for loops with low trip counts. If we require the trip count to be that high then I feel we may as well not bother tail-folding. The SPEC2017 x264 benchmark has hot loops with low trip counts (8,12,16,etc.) that benefit from tail-folding and also require runtime memory checks, so that's one thing to be careful of here as it could lead to performance regressions.

Another problem here is inconsistency - it looks like you're only adding the check here, but what about the above tail-folding case:

```
  } else if (VF.isScalable() &&
             !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
             Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
```

It's not obvious to me why we don't need the same check there?

https://github.com/llvm/llvm-project/pull/115833


More information about the llvm-commits mailing list