[llvm] [LV] Use SCEV to check if IV overflow check is known (PR #115705)

Mon Nov 11 06:50:04 PST 2024

================
@@ -2491,17 +2491,19 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
     Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);
 
     Value *Step = CreateStep();
-#ifndef NDEBUG
     ScalarEvolution &SE = *PSE.getSE();
     const SCEV *TC2OverflowSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
-    assert(
-        !isIndvarOverflowCheckKnownFalse(Cost, VF * UF) &&
-        !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
-                             TC2OverflowSCEV, SE.getSCEV(Step)) &&
-        "unexpectedly proved overflow check to be known");
-#endif
-    // Don't execute the vector loop if (UMax - n) < (VF * UF).
-    CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+    const SCEV *StepSCEV = SE.getSCEV(Step);
+
+    // Check if (UMax - n) < (VF * UF).
+    if (SE.isKnownPredicate(ICmpInst::ICMP_ULT, TC2OverflowSCEV, StepSCEV)) {
----------------
david-arm wrote:

Ah this is interesting and thanks for explaining and putting up a fix! So the test you've added shows `isIndvarOverflowCheckKnownFalse` is returning the wrong answer, or at least an answer that's inconsistent with the `SE.isKnownPredicate` call. Once we've applied the loop guards we know that the check is false. So it looks like there are a few choices:

1. In the assert above we could avoid applying the loop guards and then `isIndvarOverflowCheckKnownFalse` will be consistent with the call to `SE.isKnownPredicate`. That might be a much smaller change than this patch I think? Or,
2. Teach `isIndvarOverflowCheckKnownFalse` to apply loop guards, which I think would require changing ScalarEvolution::getSmallConstantMaxTripCount to apply loop guards. It looks like `ScalarEvolution::getSmallConstantTripMultiple` already does something similar. Or,
3. Use the current version of your patch.

The only problem with 3 is that the code is now more complex and it doesn't look like there are any tests for the `CheckMinIters = Builder.getTrue();` case you've added.

I personally feel like 2 is the most powerful change because you've highlighted a problem in how we estimate the small constant max trip count. If the vectoriser is always going to apply loop guards anyway, then we should do the same when estimating the trip count. However, I realise it would expand the scope of this patch quite a bit and I perfectly understand if you don't have time to do this right now! What do you think?

https://github.com/llvm/llvm-project/pull/115705