[llvm] [LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (PR #132170)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 20 07:43:47 PDT 2025


================
@@ -4105,13 +4102,40 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     const SCEV *Rem = SE->getURemExpr(
         SE->applyLoopGuards(ExitCount, TheLoop),
         SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));
-    if (Rem->isZero()) {
+    return Rem->isZero();
+  };
+
+  if (MaxPowerOf2RuntimeVF && *MaxPowerOf2RuntimeVF > 0) {
+    assert((UserVF.isNonZero() || isPowerOf2_32(*MaxPowerOf2RuntimeVF)) &&
+           "MaxFixedVF must be a power of 2");
+    if (IsKnownModTripCountZero(*MaxPowerOf2RuntimeVF)) {
       // Accept MaxFixedVF if we do not have a tail.
       LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
       return MaxFactors;
     }
   }
 
+  if (MaxTC && MaxTC <= TTI.getMinTripCountTailFoldingThreshold()) {
----------------
david-arm wrote:

This actually changes behaviour compared to before. Previously the trip count was calculated using `auto ExpectedTC = getSmallBestKnownTC(PSE, L);`, whereas `MaxTC` comes from `getSmallConstantMaxTripCount`. I think the key thing missing here is the use of profile information, so you'll need to do something like:

```
  auto ExpectedTC = getSmallBestKnownTC(PSE, L);
  if (ExpectedTC <= TTI.getMinTripCountTailFoldingThreshold())
```

I think this has also exposed a general gap in testing: there is no test for a loop whose profile information indicates a trip count below the threshold. Would you be happy to add one? I think `Transforms/LoopVectorize/tripcount.ll` has an example of using branch weights.

https://github.com/llvm/llvm-project/pull/132170

