[llvm] [WIP][LV] Ignore some costs when loop gets fully unrolled (PR #106699)

Fri Aug 30 02:52:59 PDT 2024

https://github.com/igogo-x86 created https://github.com/llvm/llvm-project/pull/106699

When fixed-width VF equals the number of iterations, comparison instruction and induction operation will be DCEed later. Ignoring the costs of these instructions improves the cost model.

I have one benchmark that significantly improves when vectorised with fixed VF instead of scalable on AArch64 due to full unrolling. But I am not sure how to ignore the costs properly. This patch, as it is, makes new and legacy cost models out of sync and compiler fails with assertion error on this line:

```
assert((BestFactor.Width == LegacyVF.Width ||
 planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width),
 BestFactor.Width, CostCtx,
 OrigLoop, CM)) &&
 " VPlan cost model and legacy cost model disagreed");
```

Identifying variables to ignore in `LoopVectorizationCostModel::getInstructionCost` is much more challenging.

Is the approach shown in the patch ok, and can I wait for the legacy cost model to be removed? If not, how can it be done differently?

>From 9a5cbbc01456725a0568d0ab6f05a7f8aff3ae4a Mon Sep 17 00:00:00 2001
From: Igor Kirillov <igor.kirillov at arm.com>
Date: Fri, 30 Aug 2024 09:22:21 +0000
Subject: [PATCH] [WIP][LV] Ignore some costs when loop gets fully unrolled

When fixed-width VF equals the number of iterations, comparison instruction
and induction operation will be DCEed later. Ignoring the costs of these
instructions improves the cost model.
---
 .../Transforms/Vectorize/LoopVectorize.cpp    | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6babfd1eee9108..7dc9a60cb0f034 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7112,6 +7112,26 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
         continue;
       IVInsts.push_back(CI);
     }
+
+    // If the given VF loop gets fully unrolled, ignore the costs of comparison
+    // and increment instruction, as they'll get simplified away
+    auto TC = CM.PSE.getSE()->getSmallConstantTripCount(OrigLoop);
+    auto *Cmp = OrigLoop->getLatchCmpInst();
+    if (Cmp && VF.isFixed() && VF.getFixedValue() == TC) {
+      CostCtx.SkipCostComputation.insert(Cmp);
+      for (Instruction *IVInst : IVInsts) {
+        bool IsSimplifiedAway = true;
+        for (auto *UIV : IVInst->users()) {
+          if (!Legal->isInductionVariable(UIV) && UIV != Cmp) {
+            IsSimplifiedAway = false;
+            break;
+          }
+        }
+        if (IsSimplifiedAway)
+          CostCtx.SkipCostComputation.insert(IVInst);
+      }
+    }
+
     for (Instruction *IVInst : IVInsts) {
       if (CostCtx.skipCostComputation(IVInst, VF.isVector()))
         continue;