[llvm] [IndVarSimplify] Eliminated Pure LoopCounter (PR #146845)

Thu Jul 3 03:10:36 PDT 2025

https://github.com/buggfg created https://github.com/llvm/llvm-project/pull/146845

This patch accomplishes two main tasks:

- It relaxes the stride restriction on the LoopCounter, which opens up additional optimization opportunities;
- It eliminates the Pure LoopCounter, which unlocks further optimizations such as LoopUnroll.

**Key changes**:

- Relax the stride restriction on the LoopCounter

  A loop's **iterations are usually counted** by an integer-valued variable that moves **upward (or downward) by a constant** amount on each iteration [1][2]. However, the current design requires the loop counter to have a stride of exactly +1 and does not accept -1, which limits optimization potential.

  Without disrupting the original design, we relax the stride restriction in both `isLoopCounter()` and `genLoopLimit()` so that a loop counter may have a stride of either +1 or -1. Combined with LLVM's existing infrastructure, this supports common countdown loop patterns. For example, a loop such as `for (i = 31; i * i > 48; i--)` can be transformed into `for (i = 31; i > 6; i--)`, which improves the pass's versatility and optimization coverage.

  

  [1] Steven S. Muchnick. 1998. Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

  [2] [Loops - Using and Porting GNU Fortran](https://gcc.gnu.org/onlinedocs/gcc-3.3.6/g77/Loops.html)
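
  The countdown rewrite described in this item can be checked with a small standalone C++ sketch. It is not code from the patch: the function names `tripsOriginal` and `tripsRewritten` and the bound on `start` are illustrative assumptions, and the two loops agree only because the counter stays non-negative.

  ```cpp
  #include <cassert>
  #include <cstdint>

  // Trip count of the countdown loop with the original nonlinear exit test.
  // Assumption: 0 <= start <= 1000, so i stays non-negative and i * i cannot overflow.
  int tripsOriginal(int64_t start) {
    assert(start >= 0 && start <= 1000);
    int trips = 0;
    for (int64_t i = start; i * i > 48; i--)
      ++trips;
    return trips;
  }

  // Trip count once the exit test compares the stride -1 counter directly.
  // For non-negative i, i * i > 48 holds exactly when i > 6.
  int tripsRewritten(int64_t start) {
    assert(start >= 0 && start <= 1000);
    int trips = 0;
    for (int64_t i = start; i > 6; i--)
      ++trips;
    return trips;
  }
  ```

  Calling both functions with `start = 31` should give the same trip count, since the body runs for `i = 31` down to `i = 7` in either form.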

- Eliminate the Pure LoopCounter

  We define a PHI node as the Pure LoopCounter if it meets three conditions: (1) it is used for loop termination testing; (2) it has at most two users, PostIncOrDec and (optionally) Cmp; and (3) it is of integer type.

  This optimization is based on a key insight: the Pure LoopCounter can be replaced by another induction variable that satisfies the following conditions: (1) it meets the criteria defined by `isLoopCounter()`; (2) it is of integer type with a width equal to that of `ExitCount`; and (3) it is not the Pure LoopCounter itself.

  Eliminating the Pure LoopCounter can expose additional opportunities for downstream optimizations, such as loop unrolling. For example:

  ```fortran
    DO j = jms, jme
      DO k = kms, kme
        DO i = ims, ime
          u(i,k,j) = u(i,k,j) + dts * ru_tend(i,k,j)
        ENDDO 
      ENDDO
    ENDDO
  ```

  After the Pure LoopCounter is eliminated, loop unrolling succeeds four times on this common example, resulting in a **36%** performance improvement.
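
The effect of the elimination can be pictured at source level with the following C++ sketch. It is only an analogy for the IR transformation, not code from the patch: the names `addOneBefore`, `addOneAfter`, `remaining`, and `end` are hypothetical, and the early return for `n <= 0` stands in for the guard in front of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Before: `remaining` is used only by its own decrement and the exit test,
// which matches the description of a Pure LoopCounter; `i` does the indexing.
void addOneBefore(std::vector<float> &data, int64_t start, int64_t n) {
  if (n <= 0)
    return;
  assert(start >= 0 && static_cast<size_t>(start + n) <= data.size());
  int64_t remaining = n;
  int64_t i = start;
  do {
    data[static_cast<size_t>(i)] += 1.0f;
    i += 1;
    remaining -= 1;
  } while (remaining > 0);
}

// After: the exit test is rewritten against the other induction variable `i`
// and a loop-invariant limit, so the countdown counter is no longer needed.
void addOneAfter(std::vector<float> &data, int64_t start, int64_t n) {
  if (n <= 0)
    return;
  assert(start >= 0 && static_cast<size_t>(start + n) <= data.size());
  const int64_t end = start + n; // computed once, outside the loop
  int64_t i = start;
  do {
    data[static_cast<size_t>(i)] += 1.0f;
    i += 1;
  } while (i != end);
}
```

Both functions update the same `n` elements starting at `start`; the second form carries one induction variable instead of two through the loop body.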

**Key details**

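- Pure LoopCounter elimination is only active when `-enable-pure-loop-counter-elimination` is passed (it is disabled by default). The relaxed stride check in `isLoopCounter()` is not guarded by this flag.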

**Tests**

- Two LIT tests are added to confirm the effectiveness of the patch.

- Four existing LIT tests are modified because relaxing the stride restriction on LoopCounters gives LFTR more optimization opportunities, such as simplifying exit comparisons from `sgt/slt` to `eq/ne`.

- We further verified the correctness of the transformation and evaluated its performance using the SPEC CPU2006 and SPEC CPU2017 benchmark suites. 

- The results of the SPEC CPU2017 benchmark are as follows:

  | bench  | ratio<br>[llvm base] | ratio<br>[with the relaxed LoopCounter] |  SpeedUp  | ratio<br>[with this PR] |  SpeedUp  |
  | ------------------------------------------------------------ | :------------------: | :-------------------------------------: | :-------: | :---------------------: | :-------: |
  | 623.xalancbmk_s                                              |        12.86         |                  13.01                  | **1.01x** |          13.01          | **1.01x** |
  | 648.exchange2_s                                              |        17.38         |                  18.87                  | **1.09x** |          18.88          | **1.09x** |
  | 619.lbm_s                                                    |         7.95         |                  8.08                   | **1.02x** |          8.13           | **1.02x** |
  | 621.wrf_s                                                    |        11.04         |                  11.02                  |   1.00x   |          11.19          | **1.01x** |
  | 628.pop2_s                                                   |         3.75         |                  3.75                   |   1.00x   |          3.78           | **1.01x** |
  | 654.roms_s                                                   |        10.74         |                  10.70                  |   1.00x   |          10.95          | **1.02x** |

  * \[with the relaxed LoopCounter]: Relax the stride restriction on the LoopCounter.

  * \[with this PR]: Relax stride constraints and eliminate the Pure LoopCounter.

  * Platform: x86 Intel (i9-11900K, L1 cache 384KB, L2 cache 4MB, L3 cache 16MB, cache line 64B).

**Authors**

  The [XSCC compiler team](https://github.com/orgs/OpenXiangShan/teams/xscc) developed this implementation.

>From 4aedb2213373bc99919f51b321b0201b99e9f0bd Mon Sep 17 00:00:00 2001
From: bernadate <3171290993 at qq.com>
Date: Thu, 3 Jul 2025 17:59:44 +0800
Subject: [PATCH]  [IndVarSimplify] Eliminated Pure LoopCounter

Co-Authored-By: ict-ql <168183727+ict-ql at users.noreply.github.com>
Co-Authored-By: Chyaka <52224511+liliumshade at users.noreply.github.com>
Co-Authored-By: Lin Wang <wanglulin at ict.ac.cn>
---
 llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | 171 +++++++++++++++++-
 .../Transforms/IndVarSimplify/X86/pr24356.ll  |   8 +-
 .../check-loop-counter-stride.ll              |  39 ++++
 .../Transforms/IndVarSimplify/drop-exact.ll   |   8 +-
 .../IndVarSimplify/eliminate-comparison.ll    |   7 +-
 .../eliminate-pure-loop-counter.ll            |  71 ++++++++
 llvm/test/Transforms/IndVarSimplify/lftr.ll   |   2 +-
 7 files changed, 284 insertions(+), 22 deletions(-)
 create mode 100644 llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
 create mode 100644 llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll

diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index 334c911191cb8..2bf4e7be70358 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -122,6 +122,10 @@ static cl::opt<bool>
 AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
                 cl::desc("Allow widening of indvars to eliminate s/zext"));
 
+static cl::opt<bool> EnablePureLoopCounterElimination(
+    "enable-pure-loop-counter-elimination", cl::Hidden, cl::init(false),
+    cl::desc("Enable Pure LoopCounter elimination."));
+
 namespace {
 
 class IndVarSimplify {
@@ -160,6 +164,13 @@ class IndVarSimplify {
 
   bool sinkUnusedInvariants(Loop *L);
 
+  PHINode *findCandidateLoopCounter(Loop *L, PHINode *LoopCounter,
+                                    const SCEV *ExitCount, ICmpInst *Cond,
+                                    ScalarEvolution *SE);
+
+  bool tryToEliminatePureLoopCounter(Loop *L, ScalarEvolution *SE,
+                                     SCEVExpander &Rewriter, LoopInfo *LI);
+
 public:
   IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
                  const DataLayout &DL, TargetLibraryInfo *TLI,
@@ -798,7 +809,7 @@ static bool hasConcreteDef(Value *V) {
 
 /// Return true if the given phi is a "counter" in L.  A counter is an
 /// add recurance (of integer or pointer type) with an arbitrary start, and a
-/// step of 1.  Note that L must have exactly one latch.
+/// step of 1/-1.  Note that L must have exactly one latch.
 static bool isLoopCounter(PHINode* Phi, Loop *L,
                           ScalarEvolution *SE) {
   assert(Phi->getParent() == L->getHeader());
@@ -808,7 +819,13 @@ static bool isLoopCounter(PHINode* Phi, Loop *L,
     return false;
 
   const SCEV *S = SE->getSCEV(Phi);
-  if (!match(S, m_scev_AffineAddRec(m_SCEV(), m_scev_One(), m_SpecificLoop(L))))
+  const SCEVConstant *Step;
+  if (!match(S, m_scev_AffineAddRec(m_SCEV(), m_SCEVConstant(Step),
+                                    m_SpecificLoop(L))))
+    return false;
+  int64_t StepVal = Step->getValue()->getSExtValue();
+  // Require that the loop counter stride can only be 1 or -1
+  if (StepVal != 1 && StepVal != -1)
     return false;
 
   int LatchIdx = Phi->getBasicBlockIndex(L->getLoopLatch());
@@ -910,7 +927,8 @@ static Value *genLoopLimit(PHINode *IndVar, BasicBlock *ExitingBB,
   assert(isLoopCounter(IndVar, L, SE));
   assert(ExitCount->getType()->isIntegerTy() && "exit count must be integer");
   const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));
-  assert(AR->getStepRecurrence(*SE)->isOne() && "only handles unit stride");
+  const SCEV *StepAbs = SE->getAbsExpr(AR->getStepRecurrence(*SE), true);
+  assert(StepAbs->isOne() && "only handles unit stride");
 
   // For integer IVs, truncate the IV before computing the limit unless we
   // know apriori that the limit must be a constant when evaluated in the
@@ -1870,6 +1888,134 @@ bool IndVarSimplify::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
   return Changed;
 }
 
+/// Look for a PHI node in the loop header to serve as the new LoopCounter.
+/// The requirements are:
+///   1. It should be an induction variable;
+///   2. It must meet the criteria of isLoopCounter();
+///   3. Its type should be integer with a width equal to ExitCount;
+///   4. It is not the Pure LoopCounter itself.
+///
+PHINode *IndVarSimplify::findCandidateLoopCounter(Loop *L,
+                                                  PHINode *PureLoopCounter,
+                                                  const SCEV *ExitCount,
+                                                  ICmpInst *Cond,
+                                                  ScalarEvolution *SE) {
+
+  PHINode *CandidateCounter = nullptr;
+  unsigned ExitCountWidth = SE->getTypeSizeInBits(ExitCount->getType());
+
+  // Look for another IV that can serve as a LoopCounter.
+  for (PHINode &AuxPHI : L->getHeader()->phis()) {
+
+    unsigned PhiWidth;
+
+    // Require that the candidate IV is of integer type
+    if (AuxPHI.getType()->isIntegerTy())
+      PhiWidth = SE->getTypeSizeInBits(AuxPHI.getType());
+    else
+      continue;
+
+    if (L->isAuxiliaryInductionVariable(AuxPHI, *SE) &&
+        &AuxPHI != PureLoopCounter && isLoopCounter(&AuxPHI, L, SE) &&
+        // For type safety and avoid trunc/ext overhead.
+        PhiWidth == ExitCountWidth && DL.isLegalInteger(PhiWidth) &&
+        !isAlmostDeadIV(&AuxPHI, L->getLoopLatch(), Cond)) {
+
+      CandidateCounter = &AuxPHI;
+      break;
+    }
+  }
+
+  return CandidateCounter;
+}
+
+/// Define a PHI node as the Pure LoopCounter if it meets these three
+/// conditions:
+/// 1. It is used for loop termination testing;
+/// 2. It has at most two users: PostIncOrDec and CMP (optional).
+/// 3. Its type is of integer type.
+///
+/// When the counter selected for LFTR is the Pure LoopCounter,
+/// we try to find another suitable IV to take over its counting
+/// function, thereby eliminating the Pure LoopCounter. For example,
+/// loop:
+///   %9  = phi i64 [ %15, %loop ], [ %8, %entry ]  ; the Pure LoopCounter
+///   %10 = phi i64 [ %14, %loop ], [ %4, %entry ]  ; another induction variable
+///   ...                 ; Here, %10 is used to calculate array indices or something. 
+///   %14 = add i64 %10, 1 
+///   %15 = add nsw i64 %9, -1 
+///   %exitcond = icmp ugt i64 %9, 1
+/// is converted into
+/// loop:
+///   %10 = phi i64 [ %14, %loop ], [ %4, %entry ]
+///   ...
+///   %14 = add i64 %10, 1
+///   %exitcond = icmp ne i64 %14, %IV_END
+///
+bool IndVarSimplify::tryToEliminatePureLoopCounter(Loop *L, ScalarEvolution *SE,
+                                                   SCEVExpander &Rewriter,
+                                                   LoopInfo *LI) {
+  bool Changed = false;
+
+  SmallVector<BasicBlock *, 16> ExitingBlocks;
+  L->getExitingBlocks(ExitingBlocks);
+  if (ExitingBlocks.empty())
+    return false;
+
+  for (BasicBlock *ExitingBB : ExitingBlocks) {
+    // Can't handle non-branch yet.
+    if (!isa<BranchInst>(ExitingBB->getTerminator()))
+      continue;
+    BranchInst *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
+    if (!BI || BI->isUnconditional())
+      continue;
+
+    // Right now, we only handle integer test conditions.
+    ICmpInst *Cond = dyn_cast<ICmpInst>(BI->getCondition());
+    if (!Cond)
+      continue;
+
+    // If our exiting block exits multiple loops, we can only rewrite the
+    // innermost one.  Otherwise, we're changing how many times the innermost
+    // loop runs before it exits.
+    if (LI->getLoopFor(ExitingBB) != L)
+      continue;
+
+    // Get the number of times the back edge is executed when exiting the basic
+    // block. if it can't be calculated, skip this loop.
+    const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
+    if (isa<SCEVCouldNotCompute>(ExitCount) || ExitCount->isZero() ||
+        !Rewriter.isSafeToExpand(ExitCount))
+      continue;
+
+    // Let FindLoopCounter() select the LoopCounter first.
+    PHINode *FoundCounter = FindLoopCounter(L, ExitingBB, ExitCount, SE, DT);
+    if (!FoundCounter)
+      continue;
+
+    Value *PostIncOrDec =
+        FoundCounter->getIncomingValueForBlock(L->getLoopLatch());
+    // If the Pure LoopCounter is selected
+    if ((isLoopExitTestBasedOn(FoundCounter, ExitingBB) ||
+         isLoopExitTestBasedOn(PostIncOrDec, ExitingBB)) &&
+        // Checks if the FoundCounter is only used for loop counting.
+        isAlmostDeadIV(FoundCounter, L->getLoopLatch(), Cond)) {
+
+      // Try to find another suitable IV as the new LoopCounter
+      PHINode *CandidateCounter =
+          findCandidateLoopCounter(L, FoundCounter, ExitCount, Cond, SE);
+      if (!CandidateCounter)
+        continue;
+
+      // Let this newly found candidate LoopCounter perform LFTR
+      Changed |= linearFunctionTestReplace(L, ExitingBB, ExitCount,
+                                           CandidateCounter, Rewriter);
+    }
+  }
+
+  return Changed;
+}
+
 //===----------------------------------------------------------------------===//
 //  IndVarSimplify driver. Manage several subpasses of IV simplification.
 //===----------------------------------------------------------------------===//
@@ -1983,11 +2129,19 @@ bool IndVarSimplify::run(Loop *L) {
       if (!IndVar)
         continue;
 
-      // Avoid high cost expansions.  Note: This heuristic is questionable in
-      // that our definition of "high cost" is not exactly principled.
-      if (Rewriter.isHighCostExpansion(ExitCount, L, SCEVCheapExpansionBudget,
-                                       TTI, PreHeader->getTerminator()))
-        continue;
+      bool LocalChanged = false;
+      // Enable a more aggressive 'pure loop counter elimination' without
+      // considering the extension cost of ExitCount
+      if (EnablePureLoopCounterElimination)
+        LocalChanged = tryToEliminatePureLoopCounter(L, SE, Rewriter, LI);
+      Changed |= LocalChanged;
+
+      if (!LocalChanged) {
+        // Avoid high cost expansions.  Note: This heuristic is questionable in
+        // that our definition of "high cost" is not exactly principled.
+        if (Rewriter.isHighCostExpansion(ExitCount, L, SCEVCheapExpansionBudget,
+                                         TTI, PreHeader->getTerminator()))
+          continue;
 
       if (!Rewriter.isSafeToExpand(ExitCount))
         continue;
@@ -1995,6 +2149,7 @@ bool IndVarSimplify::run(Loop *L) {
       Changed |= linearFunctionTestReplace(L, ExitingBB,
                                            ExitCount, IndVar,
                                            Rewriter);
+      }
     }
   }
   // Clear the rewriter cache, because values that are in the rewriter's cache
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
index f2d938f6452d3..05575c58bff6d 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
@@ -14,11 +14,11 @@ bb:
 bb4.preheader:                                    ; preds = %bb, %bb16
 ; CHECK-LABEL:  bb4.preheader:
   %b.03 = phi i8 [ 0, %bb ], [ %tmp17, %bb16 ]
-; CHECK: %tmp9 = icmp ugt i8 %b.03, 1
-; CHECK-NOT: %tmp9 = icmp ugt i8 0, 1
+; CHECK: %exitcond = icmp eq i8 %b.03, -1
+; CHECK-NOT: %exitcond = icmp ugt i8 0, 1
 
-  %tmp9 = icmp ugt i8 %b.03, 1
-  br i1 %tmp9, label %bb4.preheader.bb18.loopexit.split_crit_edge, label %bb4.preheader.bb4.preheader.split_crit_edge
+  %exitcond = icmp eq i8 %b.03, -1
+  br i1 %exitcond, label %bb4.preheader.bb18.loopexit.split_crit_edge, label %bb4.preheader.bb4.preheader.split_crit_edge
 
 bb4.preheader.bb4.preheader.split_crit_edge:      ; preds = %bb4.preheader
   br label %bb4.preheader.split
diff --git a/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll b/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
new file mode 100644
index 0000000000000..1c81ae3fc08e8
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
@@ -0,0 +1,39 @@
+; Test the case where the LoopCounter's stride equals -1.
+; RUN: opt -S -passes=indvars  < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @check_step_minus_one(ptr nocapture readonly %0)  {
+; CHECK-LABEL: define void @check_step_minus_one(ptr readonly captures(none) %0) {
+; CHECK:       entry:
+; CHECK-NEXT:  br label [[loop:.*]]
+; CHECK:       loop:
+; CHECK-NEXT:  [[IV:%.*]] = phi i64 [ 31, [[entry:%.*]] ], [ [[PostDec:%.*]], [[loop:%.*]] ]
+; CHECK-NEXT:  [[GEP:%.*]] = getelementptr inbounds i32, ptr %0, i64 [[IV]]
+; CHECK-NEXT:  [[LOAD:%.*]] = load i32, ptr [[GEP]], align 4
+; CHECK-NEXT:  [[ADD:%.*]] = add nsw i32 [[LOAD]], 1
+; CHECK-NEXT:  store i32 [[ADD]], ptr [[GEP]], align 4
+; CHECK-NEXT:  [[PostDec:%.*]] = add nsw i64 [[IV]], -1
+; CHECK-NEXT:  [[CMP:%.*]] = icmp ne i64 [[PostDec]], 6
+; CHECK-NEXT:  br i1 [[CMP]], label [[loop:.*]], label [[end:.*]]
+; CHECK:       end:
+; CHECK-NEXT:    ret void
+;
+entry:                  
+  br label %loop
+
+loop:                                           
+  %1 = phi i64 [ 31, %entry ], [ %6, %loop ]
+  %3 = getelementptr inbounds i32, ptr %0, i64 %1
+  %4 = load i32, ptr %3, align 4
+  %5 = add nsw i32 %4, 1
+  store i32 %5, ptr %3, align 4
+  %6 = add nsw i64 %1, -1
+  %7 = mul nsw i64 %6, %6
+  %8 = icmp samesign ugt i64 %7, 48
+  br i1 %8, label %loop, label %end
+
+end:                                      
+  ret void
+}
+
diff --git a/llvm/test/Transforms/IndVarSimplify/drop-exact.ll b/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
index fb8027df74ee7..2a60c9d73a021 100644
--- a/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
+++ b/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
@@ -13,7 +13,6 @@ define void @drop_exact(ptr %p, ptr %p1) {
 ; CHECK-NEXT:    ret void
 ; CHECK:       bb12:
 ; CHECK-NEXT:    [[TMP13:%.*]] = phi i32 [ -47436, [[BB:%.*]] ], [ [[TMP15:%.*]], [[BB12]] ]
-; CHECK-NEXT:    [[TMP14:%.*]] = phi i32 [ 0, [[BB]] ], [ [[TMP42:%.*]], [[BB12]] ]
 ; CHECK-NEXT:    [[TMP15]] = add nsw i32 [[TMP13]], -1
 ; CHECK-NEXT:    [[TMP16:%.*]] = shl i32 [[TMP15]], 1
 ; CHECK-NEXT:    [[TMP17:%.*]] = sub nsw i32 42831, [[TMP16]]
@@ -23,8 +22,7 @@ define void @drop_exact(ptr %p, ptr %p1) {
 ; CHECK-NEXT:    store i32 [[TMP22]], ptr [[P:%.*]], align 4
 ; CHECK-NEXT:    [[TMP26:%.*]] = zext i32 [[TMP20]] to i64
 ; CHECK-NEXT:    store i64 [[TMP26]], ptr [[P1:%.*]], align 4
-; CHECK-NEXT:    [[TMP42]] = add nuw nsw i32 [[TMP14]], 1
-; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[TMP42]], 719
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[TMP15]], -48155
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[BB7:%.*]], label [[BB12]]
 ;
 bb:
@@ -60,7 +58,6 @@ define void @dont_drop_exact(ptr %p, ptr %p1) {
 ; CHECK-NEXT:    ret void
 ; CHECK:       bb12:
 ; CHECK-NEXT:    [[TMP13:%.*]] = phi i32 [ -47436, [[BB:%.*]] ], [ [[TMP15:%.*]], [[BB12]] ]
-; CHECK-NEXT:    [[TMP14:%.*]] = phi i32 [ 0, [[BB]] ], [ [[TMP42:%.*]], [[BB12]] ]
 ; CHECK-NEXT:    [[TMP15]] = add nsw i32 [[TMP13]], -1
 ; CHECK-NEXT:    [[TMP16:%.*]] = shl i32 [[TMP15]], 1
 ; CHECK-NEXT:    [[TMP17:%.*]] = sub nsw i32 42831, [[TMP16]]
@@ -70,8 +67,7 @@ define void @dont_drop_exact(ptr %p, ptr %p1) {
 ; CHECK-NEXT:    store i32 [[TMP22]], ptr [[P:%.*]], align 4
 ; CHECK-NEXT:    [[TMP26:%.*]] = zext i32 [[TMP20]] to i64
 ; CHECK-NEXT:    store i64 [[TMP26]], ptr [[P1:%.*]], align 4
-; CHECK-NEXT:    [[TMP42]] = add nuw nsw i32 [[TMP14]], 1
-; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[TMP42]], 719
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[TMP15]], -48155
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[BB7:%.*]], label [[BB12]]
 ;
 bb:
diff --git a/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll b/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
index 08f9856ac603d..7f889cc6f6d0b 100644
--- a/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
+++ b/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
@@ -68,6 +68,7 @@ return:
 define i32 @_ZNK4llvm5APInt3ultERKS0_(i32 %tmp2.i1, ptr %tmp65, ptr %tmp73, ptr %tmp82, ptr %tmp90) {
 ; CHECK-LABEL: @_ZNK4llvm5APInt3ultERKS0_(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SMIN:%.*]]= call i32 @llvm.smin.i32(i32 %tmp2.i1, i32 -1)
 ; CHECK-NEXT:    br label [[BB18:%.*]]
 ; CHECK:       bb13:
 ; CHECK-NEXT:    [[TMP66:%.*]] = load ptr, ptr [[TMP65:%.*]], align 4
@@ -88,11 +89,11 @@ define i32 @_ZNK4llvm5APInt3ultERKS0_(i32 %tmp2.i1, ptr %tmp65, ptr %tmp73, ptr
 ; CHECK-NEXT:    [[TMP95:%.*]] = icmp ult i64 [[TMP86]], [[TMP94]]
 ; CHECK-NEXT:    br i1 [[TMP95]], label [[BB20_LOOPEXIT]], label [[BB17:%.*]]
 ; CHECK:       bb17:
-; CHECK-NEXT:    [[TMP97:%.*]] = add nsw i32 [[I]], -1
+; CHECK-NEXT:    [[TMP97:%.*]] = add i32 [[I]], -1
 ; CHECK-NEXT:    br label [[BB18]]
 ; CHECK:       bb18:
 ; CHECK-NEXT:    [[I]] = phi i32 [ [[TMP2_I1:%.*]], [[ENTRY:%.*]] ], [ [[TMP97]], [[BB17]] ]
-; CHECK-NEXT:    [[TMP99:%.*]] = icmp sgt i32 [[I]], -1
+; CHECK-NEXT:    [[TMP99:%.*]] = icmp ne i32 [[I]], [[SMIN:%.*]]
 ; CHECK-NEXT:    br i1 [[TMP99]], label [[BB13:%.*]], label [[BB20_LOOPEXIT]]
 ; CHECK:       bb20.loopexit:
 ; CHECK-NEXT:    [[TMP_0_PH:%.*]] = phi i32 [ 0, [[BB18]] ], [ 1, [[BB15]] ], [ 0, [[BB13]] ]
@@ -917,7 +918,7 @@ define void @func_24(ptr %init.ptr) {
 ; CHECK-NEXT:    br i1 true, label [[BE]], label [[LEAVE_LOOPEXIT:%.*]]
 ; CHECK:       be:
 ; CHECK-NEXT:    call void @side_effect()
-; CHECK-NEXT:    [[BE_COND:%.*]] = icmp sgt i32 [[IV_DEC]], 4
+; CHECK-NEXT:    [[BE_COND:%.*]] = icmp ne i32 [[IV_DEC]], 4
 ; CHECK-NEXT:    br i1 [[BE_COND]], label [[LOOP]], label [[LEAVE_LOOPEXIT]]
 ; CHECK:       leave.loopexit:
 ; CHECK-NEXT:    br label [[LEAVE]]
diff --git a/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll b/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll
new file mode 100644
index 0000000000000..53cf5303d5dc6
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll
@@ -0,0 +1,71 @@
+; Test indvars for eliminating the Pure LoopCounter.
+; RUN: opt -S -passes=indvars -enable-pure-loop-counter-elimination < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @pure_loop_counter_elimination(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture readonly %2)  {
+; CHECK-LABEL: define void @pure_loop_counter_elimination(ptr readonly captures(none) %0, ptr readonly captures(none) %1, ptr readonly captures(none) %2) {
+; CHECK-NEXT:  [[entry:.*]]:
+; CHECK-NEXT:  [[IS:%.*]] = load i32, ptr %1, align 4
+; CHECK-NEXT:  [[IS_SEXT:%.*]] = sext i32 [[IS]] to i64
+; CHECK-NEXT:  [[IE:%.*]] = load i32, ptr %2, align 4
+; CHECK-NEXT:  [[IE_SEXT:%.*]] = sext i32 [[IE]] to i64
+; CHECK-NEXT:  [[IE_IS:%.*]] = sub i64 [[IE_SEXT]], [[IS_SEXT]]
+; CHECK-NEXT:  [[IE_IS_1:%.*]] = add nsw i64 [[IE_IS]], 1
+; CHECK-NEXT:  br label %preheader
+
+; CHECK:       [[preheader:.*]]:                                        
+; CHECK-NEXT:  [[Check:%.*]] = icmp sgt i64 [[IE_SEXT]], [[IS_SEXT]]
+; CHECK-NEXT:  br i1 [[Check]], label %loop.preheader, label %end
+
+; CHECK:       [[loop_preheader:.*]]:                            
+; CHECK-NEXT:  [[IE_Plus_2:%.*]] = add i64 [[IE_SEXT]], 2
+; CHECK-NEXT:  [[MIN:%.*]] = call i64 @llvm.umin.i64(i64 [[IE_IS_1]], i64 1)
+; CHECK-NEXT:  [[TEMP:%.*]] = sub i64 [[IE_Plus_2]], [[MIN]]
+; CHECK-NEXT:  br label %loop
+
+; CHECK:       [[loop:.*]]:                                             
+; CHECK-NEXT:  [[IV:%.*]] = phi i64 [ %15, %loop ], [ [[IS_SEXT]], %loop.preheader ]
+; CHECK-NEXT:  [[ADDR:%.*]] = getelementptr float, ptr %0, i64 [[IV]]
+; CHECK-NEXT:  [[DATA:%.*]] = load float, ptr [[ADDR]], align 4
+; CHECK-NEXT:  [[DATA_PUS_1:%.*]] = fadd fast float [[DATA]], 1.000000e+00
+; CHECK-NEXT:  store float [[DATA_PUS_1]], ptr [[ADDR]], align 4
+; CHECK-NEXT:  [[PostAdd:%.*]] = add i64 [[IV]], 1
+; CHECK-NEXT:  [[ExitCond:%.*]] = icmp ne i64 [[PostAdd]], [[TEMP]]
+; CHECK-NEXT:  br i1 [[ExitCond]], label %loop, label %end.loopexit
+
+; CHECK:       [[end_loopexit:.*]]:                                   
+; CHECK-NEXT:  br label %end
+
+; CHECK:       [[end:.*]]:                                              
+; CHECK-NEXT:   ret void
+; CHECK-LABEL: }
+
+entry: 
+  %3 = load i32, ptr %1, align 4
+  %4 = sext i32 %3 to i64
+  %5 = load i32, ptr %2, align 4
+  %6 = sext i32 %5 to i64
+  %7 = sub nsw i64 %6, %4
+  %8 = add nsw i64 %7, 1
+  br label %preheader
+
+preheader:
+  %cmp = icmp sgt i64 %6, %4
+  br i1 %cmp, label %loop, label %end
+
+loop:                                           
+  %9  = phi i64 [ %15, %loop ], [ %8, %preheader ]
+  %10 = phi i64 [ %14, %loop ], [ %4, %preheader ]
+  %11 = getelementptr float, ptr %0, i64 %10
+  %12 = load float, ptr %11, align 4
+  %13 = fadd fast float %12, 1.000000e+00
+  store float %13, ptr %11, align 4
+  %14 = add i64 %10, 1
+  %15 = add nsw i64 %9, -1
+  %16 = icmp ugt i64 %9, 1
+  br i1 %16, label %loop, label %end
+
+end:                                      
+  ret void
+}
\ No newline at end of file
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr.ll b/llvm/test/Transforms/IndVarSimplify/lftr.ll
index 5ee62ba357ab6..f321daa00a953 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr.ll
@@ -43,7 +43,7 @@ define i32 @pre_to_post_sub() {
 ; CHECK-NEXT:    [[I:%.*]] = phi i32 [ 1000, [[ENTRY:%.*]] ], [ [[I_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[I_NEXT]] = sub nsw i32 [[I]], 1
 ; CHECK-NEXT:    store i32 [[I]], ptr @A, align 4
-; CHECK-NEXT:    [[C:%.*]] = icmp samesign ugt i32 [[I]], 0
+; CHECK-NEXT:    [[C:%.*]] = icmp ne i32 [[I_NEXT]], -1
 ; CHECK-NEXT:    br i1 [[C]], label [[LOOP]], label [[LOOPEXIT:%.*]]
 ; CHECK:       loopexit:
 ; CHECK-NEXT:    ret i32 0