[PATCH] D147117: [SCEV] When computing trip count, only zext if necessary

Joshua Cao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 29 00:25:51 PDT 2023


caojoshua created this revision.
Herald added subscribers: javed.absar, hiraditya.
Herald added a project: All.
caojoshua added reviewers: congzhe, mkazantsev, reames, nikic.
Herald added a subscriber: StephenFan.
caojoshua updated this revision to Diff 509232.
caojoshua edited the summary of this revision.
caojoshua added a comment.
caojoshua published this revision for review.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

This patch improves on https://reviews.llvm.org/D110587. To summarize
that patch: given a backedge-taken count BC, the trip count TC is `BC + 1`.
However, `BC + 1` may overflow in BC's type, so that patch changed the TC
computation to `1 + zext(BC)`.
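
As a rough illustration of the overflow that D110587 guards against, here
is a standalone sketch that models an 8-bit BC with plain C++ integers.
The function names are hypothetical and are not part of SCEV.

```cpp
#include <cstdint>

// Hypothetical model of an i8 backedge-taken count (BC); not LLVM code.

// TC computed in the same width as BC: wraps to 0 when BC == 255.
constexpr uint8_t tripCountSameWidth(uint8_t BC) {
  return static_cast<uint8_t>(BC + 1);
}

// TC computed as 1 + zext(BC) in a wider type: cannot wrap.
constexpr uint16_t tripCountWidened(uint8_t BC) {
  return static_cast<uint16_t>(1 + static_cast<uint16_t>(BC));
}
```

With BC = 255 the same-width version wraps around, while the widened
version keeps the true trip count of 256.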

This patch only adds the zext if necessary by looking at the constant
range. If we can determine that BC cannot be the max value for its
bitwidth, then we know adding 1 will not overflow, and the zext is not
needed. We apply loop guards before computing TC so that the range
computation has more information to work with.
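
The check can be sketched with a simplified range type. `URange8` below is
a hypothetical, non-wrapping closed interval for an 8-bit value; LLVM's
real `ConstantRange` is more general (it can wrap), so treat this only as
an illustration of the idea.

```cpp
#include <cstdint>

// Hypothetical stand-in for the unsigned range [Lo, Hi] of an i8 value.
struct URange8 {
  uint8_t Lo;
  uint8_t Hi;
  constexpr bool contains(uint8_t V) const { return Lo <= V && V <= Hi; }
};

// Widening is only needed when BC may be the all-ones value,
// because only then can BC + 1 wrap.
constexpr bool needsZext(URange8 BCRange) {
  return BCRange.contains(UINT8_MAX);
}
```

A range of [0, 254] needs no zext, while [0, 255] still does.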

The primary motivation is to support my work on more precise trip
multiples in https://reviews.llvm.org/D141823. For example:

  void foo(void);

  void test(unsigned n) {
    __builtin_assume(n % 6 == 0);
    for (unsigned i = 0; i < n; ++i)
      foo();
  }

Prior to this patch, we had `TC -> 1 + zext(-1 + 6 * ((6 umax %n) /u
6))<nuw>`. SCEV range computation can determine that BC cannot be the
max value, so the zext is not needed. With this patch, the result is
`TC -> (6 * ((6 umax %n) /u 6))<nuw>`. From here, we would be able to
determine that the trip count is a multiple of 6.
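
The arithmetic in this example can be checked with a small standalone
model. The sketch below evaluates the BC and TC expressions in 32-bit
unsigned arithmetic; the helper names are hypothetical and only mirror the
SCEV expressions quoted above.

```cpp
#include <cstdint>

// Unsigned maximum, mirroring SCEV's umax.
constexpr uint32_t umax(uint32_t A, uint32_t B) { return A > B ? A : B; }

// BC = -1 + 6 * ((6 umax n) /u 6)
constexpr uint32_t backedgeTakenCount(uint32_t N) {
  return 6u * (umax(6u, N) / 6u) - 1u;
}

// TC = BC + 1, computed without widening.
constexpr uint32_t tripCount(uint32_t N) {
  return backedgeTakenCount(N) + 1u;
}
```

Even for the largest 32-bit n, BC is 4294967291, which is below the
maximum value of 4294967295, so `BC + 1` cannot wrap in this model.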

There was one change in LoopCacheAnalysis/LoopInterchange required.
Before this patch, if a loop had BC = false, it would compute `TC -> 1 +
zext(false) -> 1`, which was fine. After this patch, it computes `TC -> 1
+ false = true`. LoopCacheAnalysis would then sign extend the `true`, which
was not the intended behavior. I modified LoopCacheAnalysis so that
it only zero extends trip counts.
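
To show why the extension kind matters for a 1-bit trip count, here is a
small sketch that models an i1 value in the low bit of a byte. The helper
names are hypothetical; they only mimic what sign and zero extension do to
a single bit.

```cpp
#include <cstdint>

// Hypothetical model of i1 addition: keep only the low bit of the sum.
constexpr uint8_t addI1(uint8_t A, uint8_t B) { return (A + B) & 1u; }

// Sign extension replicates the single bit: 1 becomes -1.
constexpr int64_t sextI1(uint8_t V) { return (V & 1u) ? -1 : 0; }

// Zero extension keeps the numeric value: 1 stays 1.
constexpr uint64_t zextI1(uint8_t V) { return V & 1u; }
```

Here `1 + false` is `true` in i1; sign extending it yields -1, whereas zero
extending it yields the expected trip count of 1.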

This patch is not NFC, but also does not change any SCEV outputs. I
would like to get this patch out first to make work with trip multiples
easier.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D147117

Files:
  llvm/lib/Analysis/LoopCacheAnalysis.cpp
  llvm/lib/Analysis/ScalarEvolution.cpp


Index: llvm/lib/Analysis/ScalarEvolution.cpp
===================================================================
--- llvm/lib/Analysis/ScalarEvolution.cpp
+++ llvm/lib/Analysis/ScalarEvolution.cpp
@@ -8046,6 +8046,12 @@
   if (!Extend)
     return getAddExpr(ExitCount, getOne(ExitCountType));
 
+  ConstantRange ExitCountRange =
+      getRangeRef(ExitCount, RangeSignHint::HINT_RANGE_UNSIGNED);
+  if (!ExitCountRange.contains(
+          APInt::getMaxValue(ExitCountRange.getBitWidth())))
+    return getAddExpr(ExitCount, getOne(ExitCount->getType()));
+
   auto *WiderType = Type::getIntNTy(ExitCountType->getContext(),
                                     1 + ExitCountType->getScalarSizeInBits());
   return getAddExpr(getNoopOrZeroExtend(ExitCount, WiderType),
@@ -8228,15 +8234,14 @@
     return 1;
 
   // Get the trip count
-  const SCEV *TCExpr = getTripCountFromExitCount(ExitCount);
+  const SCEV *TCExpr = getTripCountFromExitCount(applyLoopGuards(ExitCount, L));
 
   const SCEVConstant *TC = dyn_cast<SCEVConstant>(TCExpr);
   if (!TC)
     // Attempt to factor more general cases. Returns the greatest power of
     // two divisor. If overflow happens, the trip count expression is still
     // divisible by the greatest power of 2 divisor returned.
-    return 1U << std::min((uint32_t)31,
-                          GetMinTrailingZeros(applyLoopGuards(TCExpr, L)));
+    return 1U << std::min((uint32_t)31, GetMinTrailingZeros(TCExpr));
 
   ConstantInt *Result = TC->getValue();
 
Index: llvm/lib/Analysis/LoopCacheAnalysis.cpp
===================================================================
--- llvm/lib/Analysis/LoopCacheAnalysis.cpp
+++ llvm/lib/Analysis/LoopCacheAnalysis.cpp
@@ -297,7 +297,7 @@
     Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());
     const SCEV *CacheLineSize = SE.getConstant(WiderType, CLS);
     Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
-    TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
+    TripCount = SE.getNoopOrZeroExtend(TripCount, WiderType);
     const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
     RefCost = SE.getUDivExpr(Numerator, CacheLineSize);
 
@@ -334,7 +334,7 @@
 
   // Attempt to fold RefCost into a constant.
   if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
-    return ConstantCost->getValue()->getSExtValue();
+    return ConstantCost->getValue()->getZExtValue();
 
   LLVM_DEBUG(dbgs().indent(4)
              << "RefCost is not a constant! Setting to RefCost=InvalidCost "

