[llvm] df9ba13 - [LV] Handle scalable VFs in optimizeForVFAndUF (#82669)

Mon Mar 4 13:49:50 PST 2024

Author: Philip Reames
Date: 2024-03-04T13:49:35-08:00
New Revision: df9ba13579e298fcb57aa59c5c187ce6729881d6

URL: https://github.com/llvm/llvm-project/commit/df9ba13579e298fcb57aa59c5c187ce6729881d6
DIFF: https://github.com/llvm/llvm-project/commit/df9ba13579e298fcb57aa59c5c187ce6729881d6.diff

LOG: [LV] Handle scalable VFs in optimizeForVFAndUF (#82669)

Given a scalable VF of the form <NumElts * VScale>, this patch adds the
ability to discharge a backedge test for a loop whose trip count is
between (NumElts, MinVScale*NumElts).

A couple of notes on this:
* Annoyingly, I could not figure out to write a test for this case. My
attempt is checked in as test32_i8 in f67ef1a, but LV uses a fixed
vector in that case, and ignored the force flags.
* This depends on 9eb5f94f to avoid appearing like a regression. Since
SCEV doesn't know any upper bound on vscale without the vscale_range
attribute (it doesn't query TTI), the ranges overflow on the multiply.
Arguably, this is fixing a bug in the current LV code since in theory
vscale can be large enough to overflow for real, but no actual target is
going to see that case.

Added: 
    

Modified: 
    llvm/include/llvm/Analysis/ScalarEvolution.h
    llvm/lib/Analysis/ScalarEvolution.cpp
    llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 0880f9c65aa45d..5828cc156cc785 100644

--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -570,6 +570,7 @@ class ScalarEvolution {
   const SCEV *getPtrToIntExpr(const SCEV *Op, Type *Ty);
   const SCEV *getTruncateExpr(const SCEV *Op, Type *Ty, unsigned Depth = 0);
   const SCEV *getVScale(Type *Ty);
+  const SCEV *getElementCount(Type *Ty, ElementCount EC);
   const SCEV *getZeroExtendExpr(const SCEV *Op, Type *Ty, unsigned Depth = 0);
   const SCEV *getZeroExtendExprImpl(const SCEV *Op, Type *Ty,
                                     unsigned Depth = 0);

diff  --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 4b2db80bc1ec30..e1e6742e50efec 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -509,6 +509,13 @@ const SCEV *ScalarEvolution::getVScale(Type *Ty) {
   return S;
 }
 
+const SCEV *ScalarEvolution::getElementCount(Type *Ty, ElementCount EC) {
+  const SCEV *Res = getConstant(Ty, EC.getKnownMinValue());
+  if (EC.isScalable())
+    Res = getMulExpr(Res, getVScale(Ty));
+  return Res;
+}
+
 SCEVCastExpr::SCEVCastExpr(const FoldingSetNodeIDRef ID, SCEVTypes SCEVTy,
                            const SCEV *op, Type *ty)
     : SCEV(ID, SCEVTy, computeExpressionSize(op)), Op(op), Ty(ty) {}

diff  --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9d6deb802e2090..f6b564ad931ca9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -618,8 +618,8 @@ void VPlanTransforms::optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
       Plan.getCanonicalIV()->getStartValue()->getLiveInIRValue()->getType();
   const SCEV *TripCount = createTripCountSCEV(IdxTy, PSE);
   ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *C =
-      SE.getConstant(TripCount->getType(), BestVF.getKnownMinValue() * BestUF);
+  ElementCount NumElements = BestVF.multiplyCoefficientBy(BestUF);
+  const SCEV *C = SE.getElementCount(TripCount->getType(), NumElements);
   if (TripCount->isZero() ||
       !SE.isKnownPredicate(CmpInst::ICMP_ULE, TripCount, C))
     return;