[llvm-branch-commits] [llvm] Patch 3: [LV] Add extra CM instance for EpilogueTF (PR #202820)

Wed Jun 10 05:10:39 PDT 2026

================
@@ -5517,22 +5518,49 @@ void LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC,
   // for later use by the cost model.
   Config.computeMinimalBitwidths();
 
+  SmallVector<LoopVectorizationCostModel *, 2> EnabledCMs;
+  EnabledCMs.push_back(&CM);
+
+  // Make sure firstly that the epilogue of main vector loop is allowed, then
+  // check if the tail-folded epilogue feature is enabled.
+  if (CM.EpilogueLoweringStatus == CM_EpilogueAllowed &&
+      EpilogueTailFoldingCM) {
+    // To avoid redundant heavy computation, copy computed `ValuesToIgnore`
+    // and `VecValuesToIgnore` to the EpilogueTailFoldingCM as they will be
+    // same.
+    EpilogueTailFoldingCM->ValuesToIgnore.insert_range(CM.ValuesToIgnore);
+    EpilogueTailFoldingCM->VecValuesToIgnore.insert_range(CM.VecValuesToIgnore);
+
+    // After making sure that we can get valid results of computeMaxVF, make
+    // sure that tail-folding for the epilogue loop still valid.
+    if (EpilogueTailFoldingCM->computeMaxVF(UserVF, UserIC) &&
+        EpilogueTailFoldingCM->foldTailByMasking()) {
+      EnabledCMs.push_back(&*EpilogueTailFoldingCM);
+      LLVM_DEBUG(dbgs() << "LV: CM instances: " << EnabledCMs.size() << "\n");
+    }
+  }
+
   // Invalidate interleave groups if all blocks of loop will be predicated.
-  if (CM.blockNeedsPredicationForAnyReason(OrigLoop->getHeader()) &&
-      !useMaskedInterleavedAccesses(TTI)) {
-    LLVM_DEBUG(
-        dbgs()
-        << "LV: Invalidate all interleaved groups due to fold-tail by masking "
-           "which requires masked-interleaved support.\n");
-    if (CM.InterleaveInfo.invalidateGroups())
-      // Invalidating interleave groups also requires invalidating all decisions
-      // based on them, which includes widening decisions and uniform and scalar
-      // values.
-      CM.invalidateCostModelingDecisions();
+  if (!useMaskedInterleavedAccesses(TTI)) {
+    for_each(EnabledCMs, [&](auto *CurrentCM) {
----------------
igogo-x86 wrote:

Could we avoid the repeated for_each blocks here? It looks like we do the same cost model setup for the main CM and for EpilogueTailFoldingCM when it is present. Maybe this can be moved to a small helper, for example:

```
  prepareCostModelForVFs(CM, VFCandidates);
  if (EpilogueTailFoldingCM)
    prepareCostModelForVFs(*EpilogueTailFoldingCM, VFCandidates);
```

This would make the code easier to read and show that both cost models go through the same preparation.
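
For illustration only, here is a minimal self-contained sketch of that shape. `MockCostModel` is a hypothetical stand-in, not the real `LoopVectorizationCostModel`, and the signature of `prepareCostModelForVFs`, the `HasMaskedInterleave` flag, and the `prepareAllCostModels` wrapper are all assumptions made up for this example. The real helper would do whatever the repeated blocks do today.

```cpp
#include <vector>

// Hypothetical stand-in for LoopVectorizationCostModel (not the real class).
struct MockCostModel {
  bool NeedsPredication = false;     // header block must be predicated
  bool GroupsInvalidated = false;    // interleave groups were dropped
  std::vector<unsigned> PreparedVFs; // VFs this model has been set up for
};

// Hypothetical helper: performs the shared setup for one cost model.
void prepareCostModelForVFs(MockCostModel &CM,
                            const std::vector<unsigned> &VFCandidates,
                            bool HasMaskedInterleave) {
  // Same idea as the hunk above: without masked-interleaved support, a
  // fully predicated loop cannot keep its interleave groups.
  if (CM.NeedsPredication && !HasMaskedInterleave)
    CM.GroupsInvalidated = true;

  // Placeholder for the per-VF setup that is currently repeated.
  for (unsigned VF : VFCandidates)
    CM.PreparedVFs.push_back(VF);
}

// Hypothetical call site: the main CM is always prepared, the epilogue
// tail-folding CM only when it exists (null pointer means "not enabled").
void prepareAllCostModels(MockCostModel &MainCM, MockCostModel *EpilogueCM,
                          const std::vector<unsigned> &VFCandidates,
                          bool HasMaskedInterleave) {
  prepareCostModelForVFs(MainCM, VFCandidates, HasMaskedInterleave);
  if (EpilogueCM)
    prepareCostModelForVFs(*EpilogueCM, VFCandidates, HasMaskedInterleave);
}
```

With this layout the two call sites differ only in which cost model they pass, so any later change to the setup applies to both models.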

https://github.com/llvm/llvm-project/pull/202820