[llvm] [VPlan] Get Addr computation cost with scalar type if it is uniform for gather/scatter. (PR #150371)
Elvis Wang via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 23 22:27:10 PDT 2025
https://github.com/ElvisWang123 created https://github.com/llvm/llvm-project/pull/150371
This patch queries `getAddressComputationCost()` with the scalar type if the address is uniform. This makes the cost of gather/scatter more accurate.
In the current LV, a non-consecutive VPWidenMemoryRecipe (gather/scatter) accounts for the cost of the address computation. But in some cases the address is uniform across all lanes, so it can be computed with a scalar type and broadcast.
I have a follow-up optimization that converts gather/scatter with uniform memory accesses to a scalar load/store + broadcast (and a select if needed). Once that lands, this temporary change can be removed.
This patch is preparation for #149955.
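For illustration, a minimal C sketch (hypothetical function name, not from the patch) of the kind of loop this change targets: every vector lane reads from the same loop-invariant address, so a vectorizer that emits a gather here only needs the address computed once as a scalar and then broadcast, which is what costing the address with the scalar type models.

```c
#include <assert.h>

/* Hypothetical example: each iteration loads from the same invariant
 * address *src. Vectorized naively this is a gather whose address is
 * uniform across all lanes; it could instead be one scalar load plus a
 * broadcast of the loaded value. */
void splat_load(const int *src, int *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = *src; /* address is uniform across all lanes */
}
```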
From d56bab9682a422ed05d2880dd9e9d35b63e81e7d Mon Sep 17 00:00:00 2001
From: Elvis Wang <elvis.wang at sifive.com>
Date: Wed, 23 Jul 2025 17:02:40 -0700
Subject: [PATCH] [VPlan] Get Addr computation cost with scalar type if it is
uniform for gather/scatter.
This patch queries `getAddressComputationCost()` with the scalar type if
the address is uniform. This makes the cost of gather/scatter more
accurate.
In the current LV, a non-consecutive VPWidenMemoryRecipe
(gather/scatter) accounts for the cost of the address computation. But
in some cases the address is uniform across all lanes, so it can be
computed with a scalar type and broadcast.
I have a follow-up optimization that converts gather/scatter with
uniform memory accesses to a scalar load/store + broadcast. Once that
lands, this temporary change can be removed.
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 6 ++++++
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | 15 ++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 3ce9d29d34553..7adb87f4557f8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6932,6 +6932,12 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
auto Iter = vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry());
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
for (VPRecipeBase &R : *VPBB) {
+ if (auto *MR = dyn_cast<VPWidenMemoryRecipe>(&R)) {
+      // The address computation cost can be queried with the scalar type if
+      // the address is uniform.
+ if (!MR->isConsecutive() && vputils::isSingleScalar(MR->getAddr()))
+ return true;
+ }
if (auto *IR = dyn_cast<VPInterleaveRecipe>(&R)) {
auto *IG = IR->getInterleaveGroup();
unsigned NumMembers = IG->getNumMembers();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 57b713d3dfcb9..e8a3951bbeb20 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3083,9 +3083,18 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
assert(!Reverse &&
"Inconsecutive memory access should not have the order.");
- return Ctx.TTI.getAddressComputationCost(Ty) +
- Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked, Alignment,
- Ctx.CostKind, &Ingredient);
+ InstructionCost Cost = 0;
+
+  // If the address value is uniform across all lanes, then the address can be
+  // calculated with a scalar type and broadcast.
+ if (vputils::isSingleScalar(getAddr()))
+ Cost += Ctx.TTI.getAddressComputationCost(Ty->getScalarType());
+ else
+ Cost += Ctx.TTI.getAddressComputationCost(Ty);
+
+ return Cost + Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked,
+ Alignment, Ctx.CostKind,
+ &Ingredient);
}
InstructionCost Cost = 0;