[llvm] [VPlan] Get Addr computation cost with scalar type if it is uniform for gather/scatter. (PR #150371)

Elvis Wang via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 7 23:05:37 PDT 2025


https://github.com/ElvisWang123 updated https://github.com/llvm/llvm-project/pull/150371

>From 776235f26603e4bf3e12c2dcdf93ab6bf9abce02 Mon Sep 17 00:00:00 2001
From: Elvis Wang <elvis.wang at sifive.com>
Date: Wed, 23 Jul 2025 17:02:40 -0700
Subject: [PATCH 1/2] [VPlan] Get Addr computation cost with scalar type if it
 is uniform for gather/scatter.

This patch queries `getAddressComputationCost()` with the scalar type if the
address is uniform. This makes the cost for gather/scatter more
accurate.

In the current LV, a non-consecutive VPWidenMemoryRecipe (gather/scatter)
accounts for the cost of address computation. But in some cases the
address is uniform across lanes, so it can be calculated with a scalar
type and broadcast.

I have a follow-up optimization that converts gathers/scatters with
uniform memory accesses to a scalar load/store plus broadcast. Once that
lands, this temporary change can be removed.
---
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |  6 ++++++
 llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp  | 15 ++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index a52aa8420b301..5f60791bedc10 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7018,6 +7018,12 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
   auto Iter = vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry());
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
     for (VPRecipeBase &R : *VPBB) {
+      if (auto *MR = dyn_cast<VPWidenMemoryRecipe>(&R)) {
+        // The address computation cost can be queried with a scalar type if
+        // the address is uniform.
+        if (!MR->isConsecutive() && vputils::isSingleScalar(MR->getAddr()))
+          return true;
+      }
       if (auto *IR = dyn_cast<VPInterleaveRecipe>(&R)) {
         auto *IG = IR->getInterleaveGroup();
         unsigned NumMembers = IG->getNumMembers();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e971ba1aac15c..ab11bfe560f6e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3099,9 +3099,18 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
     const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
     assert(!Reverse &&
            "Inconsecutive memory access should not have the order.");
-    return Ctx.TTI.getAddressComputationCost(Ty) +
-           Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked, Alignment,
-                                          Ctx.CostKind, &Ingredient);
+    InstructionCost Cost = 0;
+
+    // If the address value is uniform across all lanes, then the address can
+    // be calculated with a scalar type and broadcast.
+    if (vputils::isSingleScalar(getAddr()))
+      Cost += Ctx.TTI.getAddressComputationCost(Ty->getScalarType());
+    else
+      Cost += Ctx.TTI.getAddressComputationCost(Ty);
+
+    return Cost + Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked,
+                                                 Alignment, Ctx.CostKind,
+                                                 &Ingredient);
   }
 
   InstructionCost Cost = 0;

>From 8989f33e15fcd0bcfc50da39542082a782ca5b28 Mon Sep 17 00:00:00 2001
From: Elvis Wang <elvis.wang at sifive.com>
Date: Thu, 7 Aug 2025 23:04:03 -0700
Subject: [PATCH 2/2] !fixup, remove unnecessary
 planContainsAdditionalSimplifications.

---
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5f60791bedc10..a52aa8420b301 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7018,12 +7018,6 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
   auto Iter = vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry());
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
     for (VPRecipeBase &R : *VPBB) {
-      if (auto *MR = dyn_cast<VPWidenMemoryRecipe>(&R)) {
-        // The address computation cost can be queried with a scalar type if
-        // the address is uniform.
-        if (!MR->isConsecutive() && vputils::isSingleScalar(MR->getAddr()))
-          return true;
-      }
       if (auto *IR = dyn_cast<VPInterleaveRecipe>(&R)) {
         auto *IG = IR->getInterleaveGroup();
         unsigned NumMembers = IG->getNumMembers();
