[llvm] [LV] Stop using the legacy cost model for udiv + friends (PR #152707)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 20 08:55:50 PDT 2025


https://github.com/david-arm updated https://github.com/llvm/llvm-project/pull/152707

From 7a8a9b0d8c292dc57c2ccfc745baa427aa02f240 Mon Sep 17 00:00:00 2001
From: David Sherwood <david.sherwood at arm.com>
Date: Fri, 8 Aug 2025 12:50:28 +0000
Subject: [PATCH 1/4] [LV] Stop using the legacy cost model for udiv + friends

In VPWidenRecipe::computeCost we unnecessarily fall back on the
legacy cost model for the udiv, sdiv, urem and srem instructions.
At this point we know that the VPlan must be functionally correct,
i.e. if the divide/remainder is not safe to speculatively execute
then one of the following must hold:

1. The operation has been scalarised, in which case we wouldn't be
using a VPWidenRecipe, or
2. A select has been inserted for the second operand to ensure we
don't fault through divide-by-zero.

For 2) it's necessary to add the select operation to
VPInstruction::computeCost so that we mirror the cost of the
legacy cost model. The only problem with this is that we also
generate selects in VPlan for predicated loops with reductions,
which *aren't* accounted for in the legacy cost model. To prevent
asserts firing I've also added the selects to precomputeCosts so
that the legacy costs match the VPlan costs for reductions.
---
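Note for readers: a minimal, self-contained sketch of case 2 in the
commit message, assuming a toy lane-by-lane model rather than the real
VPlan recipes. The function name predicatedUDiv and its parameters are
hypothetical and are not part of LLVM; the sketch only illustrates why
selecting a divisor of 1 for inactive lanes lets the widened divide run
on every lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a predicated, widened udiv. This is NOT LLVM API: every name
// here is hypothetical and only illustrates the "safe divisor" idea.
std::vector<uint32_t> predicatedUDiv(const std::vector<uint32_t> &Dividend,
                                     const std::vector<uint32_t> &Divisor,
                                     const std::vector<bool> &Mask) {
  // All three "vectors" must have the same number of lanes.
  assert(Dividend.size() == Divisor.size() && Divisor.size() == Mask.size());

  std::vector<uint32_t> Result(Dividend.size());
  for (std::size_t Lane = 0; Lane < Dividend.size(); ++Lane) {
    // The select: active lanes keep their divisor, inactive lanes get 1.
    uint32_t SafeDivisor = Mask[Lane] ? Divisor[Lane] : 1u;
    // The divide runs on every lane; a zero divisor in an inactive lane
    // has been replaced by 1, so that lane cannot fault.
    Result[Lane] = Dividend[Lane] / SafeDivisor;
  }
  return Result;
}
```

With dividends {8, 9, 10, 11}, divisors {2, 0, 5, 0} and mask
{true, false, true, false}, lanes 0 and 2 divide normally while lanes 1
and 3 divide by 1 instead of 0. The results in inactive lanes carry no
meaning; the select is the extra operation that has to be costed.
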
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |  3 +++
 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp  | 16 +++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5e7f6523cd86d..4aaaf5485ab77 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4278,6 +4278,9 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
           if (!VPI)
             continue;
           switch (VPI->getOpcode()) {
+          // Selects are not modelled in the legacy cost model if they are
+          // inserted for reductions.
+          case Instruction::Select:
           case VPInstruction::ActiveLaneMask:
           case VPInstruction::ExplicitVectorLength:
             C += VPI->cost(VF, CostCtx);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index f8fde0500b77a..30eabff06b786 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1029,6 +1029,15 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
 
   switch (getOpcode()) {
+  case Instruction::Select: {
+    // TODO: It may be possible to improve this by analyzing where the
+    // condition operand comes from.
+    CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+    auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
+    auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
+    return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
+                                      Ctx.CostKind);
+  }
   case Instruction::ExtractElement:
   case VPInstruction::ExtractLane: {
     // Add on the cost of extracting the element.
@@ -2099,9 +2108,10 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
   case Instruction::SDiv:
   case Instruction::SRem:
   case Instruction::URem:
-    // More complex computation, let the legacy cost-model handle this for now.
-    return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
-  case Instruction::FNeg:
+    // If the div/rem operation isn't safe to speculate and requires
+    // predication, then the only way we can even create a vplan is to insert
+    // a select on the second input operand to ensure we use the value of 1
+    // for the inactive lanes. The select will be costed separately.
   case Instruction::Add:
   case Instruction::FAdd:
   case Instruction::Sub:

From 2275840b16b6fe78dde6b3056ccf8ccf71b3b370 Mon Sep 17 00:00:00 2001
From: David Sherwood <david.sherwood at arm.com>
Date: Tue, 12 Aug 2025 10:52:14 +0000
Subject: [PATCH 2/4] Address review comment

---
 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 30eabff06b786..d6829af1c33e8 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1033,8 +1033,12 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
     // TODO: It may be possible to improve this by analyzing where the
     // condition operand comes from.
     CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
-    auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
-    auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
+    auto *CondTy = Ctx.Types.inferScalarType(getOperand(0));
+    auto *VecTy = Ctx.Types.inferScalarType(getOperand(1));
+    if (!vputils::onlyFirstLaneUsed(this)) {
+      CondTy = toVectorTy(CondTy, VF);
+      VecTy = toVectorTy(VecTy, VF);
+    }
     return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
                                       Ctx.CostKind);
   }

From 2ccec614a718133db31c9c40a55a9de99f8101b3 Mon Sep 17 00:00:00 2001
From: David Sherwood <david.sherwood at arm.com>
Date: Wed, 20 Aug 2025 15:46:11 +0000
Subject: [PATCH 3/4] Fix asserts after rebase

---
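Note for readers: a hedged sketch of the filter that the hunk below
appears to implement, written against a toy recipe type instead of the
real VPlan classes. ToyRecipe, Opcode and shouldPrecomputeCost are
hypothetical names, the check that the user is a VPWidenRecipe is
elided, and the default arm is an assumption of this sketch. The idea
shown is that a select whose single user is a div/rem is skipped, while
any other select falls through and has its cost added.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for VPlan opcodes and recipes; not LLVM API.
enum class Opcode { Select, UDiv, SDiv, URem, SRem, Add, ActiveLaneMask };

struct ToyRecipe {
  Opcode Op;
  std::vector<const ToyRecipe *> Users;
};

bool isDivRem(Opcode Op) {
  return Op == Opcode::UDiv || Op == Opcode::SDiv || Op == Opcode::URem ||
         Op == Opcode::SRem;
}

// Returns true if the recipe's cost should be added up front.
bool shouldPrecomputeCost(const ToyRecipe &R) {
  switch (R.Op) {
  case Opcode::Select:
    // A select feeding exactly one div/rem is the safe-divisor select,
    // so it is skipped here.
    if (R.Users.size() == 1 && isDivRem(R.Users.front()->Op))
      return false;
    // Any other select (e.g. one inserted for a reduction) is costed.
    return true;
  case Opcode::ActiveLaneMask:
    return true;
  default:
    // Assumption of this sketch: other opcodes are handled elsewhere.
    return false;
  }
}
```

In this model a select used only by a udiv is skipped, whereas a select
used by an add, or by more than one recipe, is costed.
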
 .../Transforms/Vectorize/LoopVectorize.cpp    | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4aaaf5485ab77..53c1adbb131ff 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4280,7 +4280,24 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
           switch (VPI->getOpcode()) {
           // Selects are not modelled in the legacy cost model if they are
           // inserted for reductions.
-          case Instruction::Select:
+          case Instruction::Select: {
+            VPValue *V =
+                R.getNumDefinedValues() == 1 ? R.getVPSingleValue() : nullptr;
+            if (V && V->getNumUsers() == 1) {
+              if (auto *UR = dyn_cast<VPWidenRecipe>(*V->user_begin())) {
+                switch (UR->getOpcode()) {
+                case Instruction::UDiv:
+                case Instruction::SDiv:
+                case Instruction::URem:
+                case Instruction::SRem:
+                  continue;
+                default:
+                  break;
+                }
+              }
+            }
+            [[fallthrough]];
+          }
           case VPInstruction::ActiveLaneMask:
           case VPInstruction::ExplicitVectorLength:
             C += VPI->cost(VF, CostCtx);

From 19e3beee8021980f713995eea7d3f94f7a614e3f Mon Sep 17 00:00:00 2001
From: David Sherwood <david.sherwood at arm.com>
Date: Wed, 20 Aug 2025 15:55:06 +0000
Subject: [PATCH 4/4] Fix another rebase issue

---
 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index d6829af1c33e8..63ed2d44ed8ff 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2116,6 +2116,7 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
     // predication, then the only way we can even create a vplan is to insert
     // a select on the second input operand to ensure we use the value of 1
     // for the inactive lanes. The select will be costed separately.
+  case Instruction::FNeg:
   case Instruction::Add:
   case Instruction::FAdd:
   case Instruction::Sub:


