[llvm] [LV] Handle scalable VFs in optimizeForVFAndUF (PR #82669)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 22 10:15:43 PST 2024
https://github.com/preames created https://github.com/llvm/llvm-project/pull/82669
Given a scalable VF of the form <NumElts * VScale>, this patch adds the ability to discharge a backedge test for a loop whose trip count lies in the range (NumElts, MinVScale*NumElts], where MinVScale is the smallest value vscale is known to take.
A couple of notes on this:
* Annoyingly, I could not figure out how to write a test for this case. My attempt is checked in as test32_i8 in f67ef1a, but LV uses a fixed vector in that case and ignores the force flags.
* This depends on 9eb5f94f to avoid appearing like a regression. Since SCEV doesn't know any upper bound on vscale without the vscale_range attribute (it doesn't query TTI), the ranges overflow on the multiply. Arguably, this is fixing a bug in the current LV code since in theory vscale can be large enough to overflow for real, but no actual target is going to see that case.
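To make the overflow remark in the second note concrete, here is a minimal standalone sketch in plain C++. It does not use LLVM: `URange`, `mulByConstant`, and `provablyULE` are hypothetical names invented for illustration, and "collapse to the full range on wrap" is an assumed simplification of how an unsigned range analysis might behave, not a description of SCEV's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Toy stand-in for an unsigned value range [Lo, Hi]. This is NOT LLVM's
// ConstantRange; it exists only to illustrate the reasoning in the note.
struct URange {
  uint64_t Lo;
  uint64_t Hi;
};

// Multiply a range by a constant. If the upper bound would wrap, give up and
// return the full range (assumed model of "the ranges overflow on the multiply").
URange mulByConstant(URange R, uint64_t K) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  if (K != 0 && R.Hi > Max / K)
    return {0, Max};
  return {R.Lo * K, R.Hi * K};
}

// "TripCount <= X" holds for every X in R only if TripCount <= R.Lo.
bool provablyULE(uint64_t TripCount, URange R) { return TripCount <= R.Lo; }
```

In this model, an unbounded vscale range such as `{1, UINT64_MAX}` multiplied by 4 collapses to the full range, so not even a trip count of 3 can be proven to fit. A bounded range such as `{2, 16}` (standing in for a vscale_range attribute) gives `{8, 64}`, and a trip count of 6 is provably covered.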
From b84feac43f3603738f4de69e0b9f5a7296c55a07 Mon Sep 17 00:00:00 2001
From: Philip Reames <preames at rivosinc.com>
Date: Wed, 21 Feb 2024 12:43:50 -0800
Subject: [PATCH] [LV] Handle scalable VFs in optimizeForVFAndUF
Given a scalable VF of the form <NumElts * VScale>, this patch
adds the ability to discharge a backedge test for a loop whose
trip count is between (NumElts, MinVScale*NumElts).
A couple of notes on this:
* Annoyingly, I could not figure out how to write a test for this case.
My attempt is checked in as test32_i8 in f67ef1a, but LV uses a
fixed vector in that case and ignores the force flags.
* This depends on 9eb5f94f to avoid appearing like a regression.
Since SCEV doesn't know any upper bound on vscale without the
vscale_range attribute (it doesn't query TTI), the ranges overflow
on the multiply. Arguably, this is fixing a bug in the current
LV code since in theory vscale can be large enough to overflow
for real, but no actual target is going to see that case.
---
llvm/include/llvm/Analysis/ScalarEvolution.h | 1 +
llvm/lib/Analysis/ScalarEvolution.cpp | 7 +++++++
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | 4 ++--
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 0880f9c65aa45d..5828cc156cc785 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -570,6 +570,7 @@ class ScalarEvolution {
const SCEV *getPtrToIntExpr(const SCEV *Op, Type *Ty);
const SCEV *getTruncateExpr(const SCEV *Op, Type *Ty, unsigned Depth = 0);
const SCEV *getVScale(Type *Ty);
+ const SCEV *getElementCount(Type *Ty, ElementCount EC);
const SCEV *getZeroExtendExpr(const SCEV *Op, Type *Ty, unsigned Depth = 0);
const SCEV *getZeroExtendExprImpl(const SCEV *Op, Type *Ty,
unsigned Depth = 0);
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 4b2db80bc1ec30..e1e6742e50efec 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -509,6 +509,13 @@ const SCEV *ScalarEvolution::getVScale(Type *Ty) {
return S;
}
+const SCEV *ScalarEvolution::getElementCount(Type *Ty, ElementCount EC) {
+ const SCEV *Res = getConstant(Ty, EC.getKnownMinValue());
+ if (EC.isScalable())
+ Res = getMulExpr(Res, getVScale(Ty));
+ return Res;
+}
+
SCEVCastExpr::SCEVCastExpr(const FoldingSetNodeIDRef ID, SCEVTypes SCEVTy,
const SCEV *op, Type *ty)
: SCEV(ID, SCEVTy, computeExpressionSize(op)), Op(op), Ty(ty) {}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9c3f35112b592f..a01eaa3c6c8b3a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -626,8 +626,8 @@ void VPlanTransforms::optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
Plan.getCanonicalIV()->getStartValue()->getLiveInIRValue()->getType();
const SCEV *TripCount = createTripCountSCEV(IdxTy, PSE);
ScalarEvolution &SE = *PSE.getSE();
- const SCEV *C =
- SE.getConstant(TripCount->getType(), BestVF.getKnownMinValue() * BestUF);
+ ElementCount NumElements = BestVF.multiplyCoefficientBy(BestUF);
+ const SCEV *C = SE.getElementCount(TripCount->getType(), NumElements);
if (TripCount->isZero() ||
!SE.isKnownPredicate(CmpInst::ICMP_ULE, TripCount, C))
return;
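The effect of the VPlanTransforms.cpp hunk can be sketched outside LLVM as follows. This is a hedged illustration, not the patch's code: `ToyElementCount`, `guaranteedLanes`, `oldCheck`, and `newCheck` are hypothetical names, and `MinVScale` is passed in explicitly here, whereas in the real patch the lower bound on vscale would come from whatever ScalarEvolution can prove about the vscale expression.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of an element count: MinVal lanes, multiplied by
// vscale at run time when Scalable is true.
struct ToyElementCount {
  uint64_t MinVal;
  bool Scalable;
  ToyElementCount multiplyCoefficientBy(uint64_t UF) const {
    return {MinVal * UF, Scalable};
  }
};

// Lanes guaranteed per vector iteration, given a known lower bound on vscale.
uint64_t guaranteedLanes(ToyElementCount EC, uint64_t MinVScale) {
  return EC.Scalable ? EC.MinVal * MinVScale : EC.MinVal;
}

// Before the patch: compare against the known-minimum lane count only.
bool oldCheck(uint64_t TripCount, ToyElementCount VF, uint64_t UF) {
  return TripCount != 0 && TripCount <= VF.MinVal * UF;
}

// After the patch (as modelled here): compare against VF * UF including vscale.
bool newCheck(uint64_t TripCount, ToyElementCount VF, uint64_t UF,
              uint64_t MinVScale) {
  return TripCount != 0 &&
         TripCount <= guaranteedLanes(VF.multiplyCoefficientBy(UF), MinVScale);
}
```

With a scalable VF of 4 lanes, UF of 1, and an assumed minimum vscale of 2, a trip count of 6 fails `oldCheck` but passes `newCheck`, which is the (NumElts, MinVScale*NumElts) window the description refers to. For a fixed VF the two checks agree.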