[llvm] [IndVarSimplify] Eliminated Pure LoopCounter (PR #146845)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 3 03:10:36 PDT 2025
https://github.com/buggfg created https://github.com/llvm/llvm-project/pull/146845
This patch accomplishes two main tasks:
- It relaxes the stride restriction on the LoopCounter to provide additional optimization opportunities;
- It eliminates the Pure LoopCounter, thereby unlocking potential for further optimizations, such as LoopUnroll.
**Key changes**:
- Relax the stride restriction on the LoopCounter
Usually a **loop's iterations are counted** by an integer-valued variable that moves **upward (or downward) by a constant** amount on each iteration [1][2]. However, the current design requires the loop counter to have a stride of exactly +1; a stride of -1 is not supported, which limits optimization potential.
Without disrupting the original design, we relaxed the stride restriction in both `isLoopCounter()` and `genLoopLimit()` to allow loop counters with a stride of either +1 or -1. Combined with LLVM's existing infrastructure, this enables support for common countdown loop patterns. For example, a loop such as `for (i = 31; i * i > 48; i--)` can be transformed into `for (i = 31; i > 6; i--)`, which improves the pass's versatility and optimization coverage.
[1] Steven S. Muchnick. 1998. Advanced compiler design and implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[2] [Loops - Using and Porting GNU Fortran](https://gcc.gnu.org/onlinedocs/gcc-3.3.6/g77/Loops.html)
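To make the countdown example concrete, here is a minimal source-level sketch (not part of the patch) that checks that the two loop forms run the same number of iterations. It also shows how an exit limit for a counter with stride +1 or -1 could be derived as `start + step * tripCount`. The helper names `tripCountOriginal`, `tripCountSimplified`, and `exitLimit` are hypothetical and exist only for illustration; the pass itself works on LLVM IR and SCEV expressions.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of the original countdown loop from the description.
int64_t tripCountOriginal() {
  int64_t n = 0;
  for (int64_t i = 31; i * i > 48; i--)
    ++n;
  return n;
}

// Trip count of the simplified loop with a linear exit test.
int64_t tripCountSimplified() {
  int64_t n = 0;
  for (int64_t i = 31; i > 6; i--)
    ++n;
  return n;
}

// Hypothetical helper: the value a counter with stride +1 or -1 holds
// after `tripCount` steps, i.e. the limit an eq/ne exit test compares against.
int64_t exitLimit(int64_t start, int64_t step, int64_t tripCount) {
  assert((step == 1 || step == -1) && "sketch only covers unit strides");
  return start + step * tripCount;
}
```

With a start of 31 and a stride of -1, the computed limit is 6, which agrees with the `icmp ne i64 [[PostDec]], 6` check in the new `check-loop-counter-stride.ll` test.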
- Eliminate the Pure LoopCounter
We define a PHI node as the Pure LoopCounter if it meets these three conditions: (1) it is used for loop termination testing; (2) it has at most two users, PostIncOrDec and (optionally) Cmp; and (3) it is of integer type.
This optimization is based on a key insight: the Pure LoopCounter can be replaced by another induction variable that satisfies the following conditions: (1) it meets the criteria defined by `isLoopCounter()`; (2) it is of integer type with a width equal to that of `ExitCount`; and (3) it is not the Pure LoopCounter itself.
Eliminating the Pure LoopCounter can expose additional opportunities for downstream optimizations, such as loop unrolling. For example:
```fortran
DO j = jms, jme
  DO k = kms, kme
    DO i = ims, ime
      u(i,k,j) = u(i,k,j) + dts * ru_tend(i,k,j)
    ENDDO
  ENDDO
ENDDO
```
After the Pure LoopCounter is eliminated, this common loop nest is successfully unrolled four times, resulting in a **36%** performance improvement.
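The effect of the elimination can be sketched at source level. The following is an illustrative C++ analogue, not the pass's implementation. It assumes `is <= ie` and that the index range lies within the vector. In the "before" form a dedicated countdown variable only counts iterations while a second induction variable does the indexing. In the "after" form the exit test is expressed through the indexing variable, so the countdown variable disappears. The function names `addOneBefore` and `addOneAfter` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Before: `count` is only used to count iterations (a "pure" counter in the
// PR's terminology), while `i` does the useful indexing work.
void addOneBefore(std::vector<float> &u, int64_t is, int64_t ie) {
  assert(is <= ie && "sketch assumes a non-empty index range");
  int64_t i = is;
  for (int64_t count = ie - is + 1; count > 0; --count) {
    u[static_cast<std::size_t>(i)] += 1.0f;
    ++i;
  }
}

// After: the exit test is rewritten in terms of `i`, so `count` is gone and
// the loop carries a single induction variable.
void addOneAfter(std::vector<float> &u, int64_t is, int64_t ie) {
  assert(is <= ie && "sketch assumes a non-empty index range");
  const int64_t end = ie + 1; // loop-invariant limit, computed once
  for (int64_t i = is; i != end; ++i)
    u[static_cast<std::size_t>(i)] += 1.0f;
}
```

Both functions update the same elements. The second form has one fewer loop-carried value, which is the kind of simplification the description credits with helping later passes such as loop unrolling.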
**Key details**
- Pure LoopCounter elimination is only active when `-enable-pure-loop-counter-elimination` is set (it is disabled by default).
**Tests**
- Two LIT tests added to confirm the effectiveness of the patch.
- Four existing LIT tests modified because relaxing the stride restrictions on LoopCounters provides more optimization opportunities for LFTR, such as simplifying exit comparisons from `sgt/slt` to `eq/ne`.
- We further verified the correctness of the transformation and evaluated its performance using the SPEC CPU2006 and SPEC CPU2017 benchmark suites.
- The results of the SPEC CPU2017 benchmark are as follows:
| bench | ratio<br>[llvm base] | ratio<br>[with the relaxed LoopCounter] | SpeedUp | ratio<br>[with this PR] | SpeedUp |
| ------------------------------------------------------------ | :------------------: | :-------------------------------------: | :-------: | :---------------------: | :-------: |
| 623.xalancbmk_s | 12.86 | 13.01 | **1.01x** | 13.01 | **1.01x** |
| 648.exchange2_s | 17.38 | 18.87 | **1.09x** | 18.88 | **1.09x** |
| 619.lbm_s | 7.95 | 8.08 | **1.02x** | 8.13 | **1.02x** |
| 621.wrf_s | 11.04 | 11.02 | 1.00x | 11.19 | **1.01x** |
| 628.pop2_s | 3.75 | 3.75 | 1.00x | 3.78 | **1.01x** |
| 654.roms_s | 10.74 | 10.70 | 1.00x | 10.95 | **1.02x** |
* \[with the relaxed LoopCounter]: Relax the stride restriction on the LoopCounter.
* \[with this PR]: Relax stride constraints and eliminate the Pure LoopCounter.
* Platform: X86-intel (I9-11900K, L1-cache 384KB, L2-cache 4MB, L3-cache 16MB, cache line 64B).
**Authors**
The [XSCC compiler team](https://github.com/orgs/OpenXiangShan/teams/xscc) developed this implementation.
From 4aedb2213373bc99919f51b321b0201b99e9f0bd Mon Sep 17 00:00:00 2001
From: bernadate <3171290993 at qq.com>
Date: Thu, 3 Jul 2025 17:59:44 +0800
Subject: [PATCH] [IndVarSimplify] Eliminated Pure LoopCounter
Co-Authored-By: ict-ql <168183727+ict-ql at users.noreply.github.com>
Co-Authored-By: Chyaka <52224511+liliumshade at users.noreply.github.com>
Co-Authored-By: Lin Wang <wanglulin at ict.ac.cn>
---
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | 171 +++++++++++++++++-
.../Transforms/IndVarSimplify/X86/pr24356.ll | 8 +-
.../check-loop-counter-stride.ll | 39 ++++
.../Transforms/IndVarSimplify/drop-exact.ll | 8 +-
.../IndVarSimplify/eliminate-comparison.ll | 7 +-
.../eliminate-pure-loop-counter.ll | 71 ++++++++
llvm/test/Transforms/IndVarSimplify/lftr.ll | 2 +-
7 files changed, 284 insertions(+), 22 deletions(-)
create mode 100644 llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
create mode 100644 llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index 334c911191cb8..2bf4e7be70358 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -122,6 +122,10 @@ static cl::opt<bool>
AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
cl::desc("Allow widening of indvars to eliminate s/zext"));
+static cl::opt<bool> EnablePureLoopCounterElimination(
+ "enable-pure-loop-counter-elimination", cl::Hidden, cl::init(false),
+ cl::desc("Enable Pure LoopCounter elimination."));
+
namespace {
class IndVarSimplify {
@@ -160,6 +164,13 @@ class IndVarSimplify {
bool sinkUnusedInvariants(Loop *L);
+ PHINode *findCandidateLoopCounter(Loop *L, PHINode *LoopCounter,
+ const SCEV *ExitCount, ICmpInst *Cond,
+ ScalarEvolution *SE);
+
+ bool tryToEliminatePureLoopCounter(Loop *L, ScalarEvolution *SE,
+ SCEVExpander &Rewriter, LoopInfo *LI);
+
public:
IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
const DataLayout &DL, TargetLibraryInfo *TLI,
@@ -798,7 +809,7 @@ static bool hasConcreteDef(Value *V) {
/// Return true if the given phi is a "counter" in L. A counter is an
/// add recurance (of integer or pointer type) with an arbitrary start, and a
-/// step of 1. Note that L must have exactly one latch.
+/// step of 1/-1. Note that L must have exactly one latch.
static bool isLoopCounter(PHINode* Phi, Loop *L,
ScalarEvolution *SE) {
assert(Phi->getParent() == L->getHeader());
@@ -808,7 +819,13 @@ static bool isLoopCounter(PHINode* Phi, Loop *L,
return false;
const SCEV *S = SE->getSCEV(Phi);
- if (!match(S, m_scev_AffineAddRec(m_SCEV(), m_scev_One(), m_SpecificLoop(L))))
+ const SCEVConstant *Step;
+ if (!match(S, m_scev_AffineAddRec(m_SCEV(), m_SCEVConstant(Step),
+ m_SpecificLoop(L))))
+ return false;
+ int64_t StepVal = Step->getValue()->getSExtValue();
+ // Require that the loop counter stride can only be 1 or -1
+ if (StepVal != 1 && StepVal != -1)
return false;
int LatchIdx = Phi->getBasicBlockIndex(L->getLoopLatch());
@@ -910,7 +927,8 @@ static Value *genLoopLimit(PHINode *IndVar, BasicBlock *ExitingBB,
assert(isLoopCounter(IndVar, L, SE));
assert(ExitCount->getType()->isIntegerTy() && "exit count must be integer");
const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));
- assert(AR->getStepRecurrence(*SE)->isOne() && "only handles unit stride");
+ const SCEV *StepAbs = SE->getAbsExpr(AR->getStepRecurrence(*SE), true);
+ assert(StepAbs->isOne() && "only handles unit stride");
// For integer IVs, truncate the IV before computing the limit unless we
// know apriori that the limit must be a constant when evaluated in the
@@ -1870,6 +1888,134 @@ bool IndVarSimplify::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
return Changed;
}
+/// Look for a PHI node in the loop header to serve as the new LoopCounter.
+/// The requirements are:
+/// 1. It should be a induction variable;
+/// 2. It must meet the criteria of isLoopCounter();
+/// 3. Its type should be integer with a width equal to ExitCount;
+/// 4. It is not the Pure LoopCounter itself.
+///
+PHINode *IndVarSimplify::findCandidateLoopCounter(Loop *L,
+ PHINode *PureLoopCounter,
+ const SCEV *ExitCount,
+ ICmpInst *Cond,
+ ScalarEvolution *SE) {
+
+ PHINode *CandidateCounter = nullptr;
+ unsigned ExitCountWidth = SE->getTypeSizeInBits(ExitCount->getType());
+
+ // Look for another IV that can serve as a LoopCounter.
+ for (PHINode &AuxPHI : L->getHeader()->phis()) {
+
+ unsigned PhiWidth;
+
+ // Require that the candidate IV is of integer type
+ if (AuxPHI.getType()->isIntegerTy())
+ PhiWidth = SE->getTypeSizeInBits(AuxPHI.getType());
+ else
+ continue;
+
+ if (L->isAuxiliaryInductionVariable(AuxPHI, *SE) &&
+ &AuxPHI != PureLoopCounter && isLoopCounter(&AuxPHI, L, SE) &&
+ // For type safety and avoid trunc/ext overhead.
+ PhiWidth == ExitCountWidth && DL.isLegalInteger(PhiWidth) &&
+ !isAlmostDeadIV(&AuxPHI, L->getLoopLatch(), Cond)) {
+
+ CandidateCounter = &AuxPHI;
+ break;
+ }
+ }
+
+ return CandidateCounter;
+}
+
+/// Define a PHI node as the Pure LoopCounter if it meets these three
+/// conditions:
+/// 1. It is used for loop termination testing;
+/// 2. It has at most two users: PostIncOrDec and CMP (optional).
+/// 3. Its type is of integer type.
+///
+/// When the counter selected for LFTR is the Pure LoopCounter,
+/// we try to find aother suitable IV to take over its counting
+/// function, thereby eliminating the Pure LoopCounter. For example,
+/// loop:
+/// %9 = phi i64 [ %15, %loop ], [ %8, %entry ] ; the Pure LoopCounter
+/// %10 = phi i64 [ %14, %loop ], [ %4, %entry ] ; another inductioin variable
+/// ... ; Here, %10 is used to calculate array indices or something.
+/// %14 = add i64 %10, 1
+/// %15 = add nsw i64 %9, -1
+/// %exitcond = icmp ugt i64 %9, 1
+/// is converted into
+/// loop:
+/// %10 = phi i64 [ %14, %loop ], [ %4, %entry ]
+/// ...
+/// %14 = add i64 %10, 1
+/// %exitcond = icmp ne i64 %14, %IV_END
+///
+bool IndVarSimplify::tryToEliminatePureLoopCounter(Loop *L, ScalarEvolution *SE,
+ SCEVExpander &Rewriter,
+ LoopInfo *LI) {
+ bool Changed = false;
+
+ SmallVector<BasicBlock *, 16> ExitingBlocks;
+ L->getExitingBlocks(ExitingBlocks);
+ if (ExitingBlocks.empty())
+ return false;
+
+ for (BasicBlock *ExitingBB : ExitingBlocks) {
+ // Can't handle non-branch yet.
+ if (!isa<BranchInst>(ExitingBB->getTerminator()))
+ continue;
+ BranchInst *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
+ if (!BI || BI->isUnconditional())
+ continue;
+
+ // Right now, we only handle integer test conditions.
+ ICmpInst *Cond = dyn_cast<ICmpInst>(BI->getCondition());
+ if (!Cond)
+ continue;
+
+ // If our exitting block exits multiple loops, we can only rewrite the
+ // innermost one. Otherwise, we're changing how many times the innermost
+ // loop runs before it exits.
+ if (LI->getLoopFor(ExitingBB) != L)
+ continue;
+
+ // Get the number of times the back edge is executed when exiting the basic
+ // block. if it can't be calculated, skip this loop.
+ const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
+ if (isa<SCEVCouldNotCompute>(ExitCount) || ExitCount->isZero() ||
+ !Rewriter.isSafeToExpand(ExitCount))
+ continue;
+
+ // Let FindLoopCounter() select the LoopCounter first.
+ PHINode *FoundCounter = FindLoopCounter(L, ExitingBB, ExitCount, SE, DT);
+ if (!FoundCounter)
+ continue;
+
+ Value *PostIncOrDec =
+ FoundCounter->getIncomingValueForBlock(L->getLoopLatch());
+ // If the Pure LoopCounter is selected
+ if ((isLoopExitTestBasedOn(FoundCounter, ExitingBB) ||
+ isLoopExitTestBasedOn(PostIncOrDec, ExitingBB)) &&
+ // Checks if the FoundCounter is only used for loop counting.
+ isAlmostDeadIV(FoundCounter, L->getLoopLatch(), Cond)) {
+
+ // Try to find aother suitable IV as the new LoopCounter
+ PHINode *CandidateCounter =
+ findCandidateLoopCounter(L, FoundCounter, ExitCount, Cond, SE);
+ if (!CandidateCounter)
+ continue;
+
+ // Let this newly found candidate LoopCounter perform LFTR
+ Changed |= linearFunctionTestReplace(L, ExitingBB, ExitCount,
+ CandidateCounter, Rewriter);
+ }
+ }
+
+ return Changed;
+}
+
//===----------------------------------------------------------------------===//
// IndVarSimplify driver. Manage several subpasses of IV simplification.
//===----------------------------------------------------------------------===//
@@ -1983,11 +2129,19 @@ bool IndVarSimplify::run(Loop *L) {
if (!IndVar)
continue;
- // Avoid high cost expansions. Note: This heuristic is questionable in
- // that our definition of "high cost" is not exactly principled.
- if (Rewriter.isHighCostExpansion(ExitCount, L, SCEVCheapExpansionBudget,
- TTI, PreHeader->getTerminator()))
- continue;
+ bool LocalChanged = false;
+ // Enable a more aggressive 'pure loop counter elimination' without
+ // considering the extension cost of ExitCount
+ if (EnablePureLoopCounterElimination)
+ LocalChanged = tryToEliminatePureLoopCounter(L, SE, Rewriter, LI);
+ Changed |= LocalChanged;
+
+ if (!LocalChanged) {
+ // Avoid high cost expansions. Note: This heuristic is questionable in
+ // that our definition of "high cost" is not exactly principled.
+ if (Rewriter.isHighCostExpansion(ExitCount, L, SCEVCheapExpansionBudget,
+ TTI, PreHeader->getTerminator()))
+ continue;
if (!Rewriter.isSafeToExpand(ExitCount))
continue;
@@ -1995,6 +2149,7 @@ bool IndVarSimplify::run(Loop *L) {
Changed |= linearFunctionTestReplace(L, ExitingBB,
ExitCount, IndVar,
Rewriter);
+ }
}
}
// Clear the rewriter cache, because values that are in the rewriter's cache
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll b/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
index f2d938f6452d3..05575c58bff6d 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/pr24356.ll
@@ -14,11 +14,11 @@ bb:
bb4.preheader: ; preds = %bb, %bb16
; CHECK-LABEL: bb4.preheader:
%b.03 = phi i8 [ 0, %bb ], [ %tmp17, %bb16 ]
-; CHECK: %tmp9 = icmp ugt i8 %b.03, 1
-; CHECK-NOT: %tmp9 = icmp ugt i8 0, 1
+; CHECK: %exitcond = icmp eq i8 %b.03, -1
+; CHECK-NOT: %exitcond = icmp ugt i8 0, 1
- %tmp9 = icmp ugt i8 %b.03, 1
- br i1 %tmp9, label %bb4.preheader.bb18.loopexit.split_crit_edge, label %bb4.preheader.bb4.preheader.split_crit_edge
+ %exitcond = icmp eq i8 %b.03, -1
+ br i1 %exitcond, label %bb4.preheader.bb18.loopexit.split_crit_edge, label %bb4.preheader.bb4.preheader.split_crit_edge
bb4.preheader.bb4.preheader.split_crit_edge: ; preds = %bb4.preheader
br label %bb4.preheader.split
diff --git a/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll b/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
new file mode 100644
index 0000000000000..1c81ae3fc08e8
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/check-loop-counter-stride.ll
@@ -0,0 +1,39 @@
+; Test the case where the LoopCounter's stride equals -1.
+; RUN: opt -S -passes=indvars < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @check_step_minus_one(ptr nocapture readonly %0) {
+; CHECK-LABEL: define void @check_step_minus_one(ptr readonly captures(none) %0) {
+; CHECK: entry:
+; CHECK-NEXT: br label [[loop:.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 31, [[entry:%.*]] ], [ [[PostDec:%.*]], [[loop:%.*]] ]
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds i32, ptr %0, i64 [[IV]]
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, ptr [[GEP]], align 4
+; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[LOAD]], 1
+; CHECK-NEXT: store i32 [[ADD]], ptr [[GEP]], align 4
+; CHECK-NEXT: [[PostDec:%.*]] = add nsw i64 [[IV]], -1
+; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[PostDec]], 6
+; CHECK-NEXT: br i1 [[CMP]], label [[loop:.*]], label [[end:.*]]
+; CHECK: end:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop
+
+loop:
+ %1 = phi i64 [ 31, %entry ], [ %6, %loop ]
+ %3 = getelementptr inbounds i32, ptr %0, i64 %1
+ %4 = load i32, ptr %3, align 4
+ %5 = add nsw i32 %4, 1
+ store i32 %5, ptr %3, align 4
+ %6 = add nsw i64 %1, -1
+ %7 = mul nsw i64 %6, %6
+ %8 = icmp samesign ugt i64 %7, 48
+ br i1 %8, label %loop, label %end
+
+end:
+ ret void
+}
+
diff --git a/llvm/test/Transforms/IndVarSimplify/drop-exact.ll b/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
index fb8027df74ee7..2a60c9d73a021 100644
--- a/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
+++ b/llvm/test/Transforms/IndVarSimplify/drop-exact.ll
@@ -13,7 +13,6 @@ define void @drop_exact(ptr %p, ptr %p1) {
; CHECK-NEXT: ret void
; CHECK: bb12:
; CHECK-NEXT: [[TMP13:%.*]] = phi i32 [ -47436, [[BB:%.*]] ], [ [[TMP15:%.*]], [[BB12]] ]
-; CHECK-NEXT: [[TMP14:%.*]] = phi i32 [ 0, [[BB]] ], [ [[TMP42:%.*]], [[BB12]] ]
; CHECK-NEXT: [[TMP15]] = add nsw i32 [[TMP13]], -1
; CHECK-NEXT: [[TMP16:%.*]] = shl i32 [[TMP15]], 1
; CHECK-NEXT: [[TMP17:%.*]] = sub nsw i32 42831, [[TMP16]]
@@ -23,8 +22,7 @@ define void @drop_exact(ptr %p, ptr %p1) {
; CHECK-NEXT: store i32 [[TMP22]], ptr [[P:%.*]], align 4
; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP20]] to i64
; CHECK-NEXT: store i64 [[TMP26]], ptr [[P1:%.*]], align 4
-; CHECK-NEXT: [[TMP42]] = add nuw nsw i32 [[TMP14]], 1
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP42]], 719
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP15]], -48155
; CHECK-NEXT: br i1 [[EXITCOND]], label [[BB7:%.*]], label [[BB12]]
;
bb:
@@ -60,7 +58,6 @@ define void @dont_drop_exact(ptr %p, ptr %p1) {
; CHECK-NEXT: ret void
; CHECK: bb12:
; CHECK-NEXT: [[TMP13:%.*]] = phi i32 [ -47436, [[BB:%.*]] ], [ [[TMP15:%.*]], [[BB12]] ]
-; CHECK-NEXT: [[TMP14:%.*]] = phi i32 [ 0, [[BB]] ], [ [[TMP42:%.*]], [[BB12]] ]
; CHECK-NEXT: [[TMP15]] = add nsw i32 [[TMP13]], -1
; CHECK-NEXT: [[TMP16:%.*]] = shl i32 [[TMP15]], 1
; CHECK-NEXT: [[TMP17:%.*]] = sub nsw i32 42831, [[TMP16]]
@@ -70,8 +67,7 @@ define void @dont_drop_exact(ptr %p, ptr %p1) {
; CHECK-NEXT: store i32 [[TMP22]], ptr [[P:%.*]], align 4
; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP20]] to i64
; CHECK-NEXT: store i64 [[TMP26]], ptr [[P1:%.*]], align 4
-; CHECK-NEXT: [[TMP42]] = add nuw nsw i32 [[TMP14]], 1
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP42]], 719
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP15]], -48155
; CHECK-NEXT: br i1 [[EXITCOND]], label [[BB7:%.*]], label [[BB12]]
;
bb:
diff --git a/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll b/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
index 08f9856ac603d..7f889cc6f6d0b 100644
--- a/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
+++ b/llvm/test/Transforms/IndVarSimplify/eliminate-comparison.ll
@@ -68,6 +68,7 @@ return:
define i32 @_ZNK4llvm5APInt3ultERKS0_(i32 %tmp2.i1, ptr %tmp65, ptr %tmp73, ptr %tmp82, ptr %tmp90) {
; CHECK-LABEL: @_ZNK4llvm5APInt3ultERKS0_(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[SMIN:%.*]]= call i32 @llvm.smin.i32(i32 %tmp2.i1, i32 -1)
; CHECK-NEXT: br label [[BB18:%.*]]
; CHECK: bb13:
; CHECK-NEXT: [[TMP66:%.*]] = load ptr, ptr [[TMP65:%.*]], align 4
@@ -88,11 +89,11 @@ define i32 @_ZNK4llvm5APInt3ultERKS0_(i32 %tmp2.i1, ptr %tmp65, ptr %tmp73, ptr
; CHECK-NEXT: [[TMP95:%.*]] = icmp ult i64 [[TMP86]], [[TMP94]]
; CHECK-NEXT: br i1 [[TMP95]], label [[BB20_LOOPEXIT]], label [[BB17:%.*]]
; CHECK: bb17:
-; CHECK-NEXT: [[TMP97:%.*]] = add nsw i32 [[I]], -1
+; CHECK-NEXT: [[TMP97:%.*]] = add i32 [[I]], -1
; CHECK-NEXT: br label [[BB18]]
; CHECK: bb18:
; CHECK-NEXT: [[I]] = phi i32 [ [[TMP2_I1:%.*]], [[ENTRY:%.*]] ], [ [[TMP97]], [[BB17]] ]
-; CHECK-NEXT: [[TMP99:%.*]] = icmp sgt i32 [[I]], -1
+; CHECK-NEXT: [[TMP99:%.*]] = icmp ne i32 [[I]], [[SMIN:%.*]]
; CHECK-NEXT: br i1 [[TMP99]], label [[BB13:%.*]], label [[BB20_LOOPEXIT]]
; CHECK: bb20.loopexit:
; CHECK-NEXT: [[TMP_0_PH:%.*]] = phi i32 [ 0, [[BB18]] ], [ 1, [[BB15]] ], [ 0, [[BB13]] ]
@@ -917,7 +918,7 @@ define void @func_24(ptr %init.ptr) {
; CHECK-NEXT: br i1 true, label [[BE]], label [[LEAVE_LOOPEXIT:%.*]]
; CHECK: be:
; CHECK-NEXT: call void @side_effect()
-; CHECK-NEXT: [[BE_COND:%.*]] = icmp sgt i32 [[IV_DEC]], 4
+; CHECK-NEXT: [[BE_COND:%.*]] = icmp ne i32 [[IV_DEC]], 4
; CHECK-NEXT: br i1 [[BE_COND]], label [[LOOP]], label [[LEAVE_LOOPEXIT]]
; CHECK: leave.loopexit:
; CHECK-NEXT: br label [[LEAVE]]
diff --git a/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll b/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll
new file mode 100644
index 0000000000000..53cf5303d5dc6
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/eliminate-pure-loop-counter.ll
@@ -0,0 +1,71 @@
+; Test indvars for eliminating the Pure LoopCounter.
+; RUN: opt -S -passes=indvars -enable-pure-loop-counter-elimination < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+
+define void @pure_loop_counter_elimination(ptr nocapture readonly %0, ptr nocapture readonly %1, ptr nocapture readonly %2) {
+; CHECK-LABEL: define void @pure_loop_counter_elimination(ptr readonly captures(none) %0, ptr readonly captures(none) %1, ptr readonly captures(none) %2) {
+; CHECK-NEXT: [[entry:.*]]:
+; CHECK-NEXT: [[IS:%.*]] = load i32, ptr %1, align 4
+; CHECK-NEXT: [[IS_SEXT:%.*]] = sext i32 [[IS]] to i64
+; CHECK-NEXT: [[IE:%.*]] = load i32, ptr %2, align 4
+; CHECK-NEXT: [[IE_SEXT:%.*]] = sext i32 [[IE]] to i64
+; CHECK-NEXT: [[IE_IS:%.*]] = sub i64 [[IE_SEXT]], [[IS_SEXT]]
+; CHECK-NEXT: [[IE_IS_1:%.*]] = add nsw i64 [[IE_IS]], 1
+; CHECK-NEXT: br label %preheader
+
+; CHECK: [[preheader:.*]]:
+; CHECK-NEXT: [[Check:%.*]] = icmp sgt i64 [[IE_SEXT]], [[IS_SEXT]]
+; CHECK-NEXT: br i1 [[Check]], label %loop.preheader, label %end
+
+; CHECK: [[loop_preheader:.*]]:
+; CHECK-NEXT: [[IE_Plus_2:%.*]] = add i64 [[IE_SEXT]], 2
+; CHECK-NEXT: [[MIN:%.*]] = call i64 @llvm.umin.i64(i64 [[IE_IS_1]], i64 1)
+; CHECK-NEXT: [[TEMP:%.*]] = sub i64 [[IE_Plus_2]], [[MIN]]
+; CHECK-NEXT: br label %loop
+
+; CHECK: [[loop:.*]]:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ %15, %loop ], [ [[IS_SEXT]], %loop.preheader ]
+; CHECK-NEXT: [[ADDR:%.*]] = getelementptr float, ptr %0, i64 [[IV]]
+; CHECK-NEXT: [[DATA:%.*]] = load float, ptr [[ADDR]], align 4
+; CHECK-NEXT: [[DATA_PUS_1:%.*]] = fadd fast float [[DATA]], 1.000000e+00
+; CHECK-NEXT: store float [[DATA_PUS_1]], ptr [[ADDR]], align 4
+; CHECK-NEXT: [[PostAdd:%.*]] = add i64 [[IV]], 1
+; CHECK-NEXT: [[ExitCond:%.*]] = icmp ne i64 [[PostAdd]], [[TEMP]]
+; CHECK-NEXT: br i1 [[ExitCond]], label %loop, label %end.loopexit
+
+; CHECK: [[end_loopexit:.*]]:
+; CHECK-NEXT: br label %end
+
+; CHECK: [[end:.*]]:
+; CHECK-NEXT: ret void
+; CHECK-LABEL: }
+
+entry:
+ %3 = load i32, ptr %1, align 4
+ %4 = sext i32 %3 to i64
+ %5 = load i32, ptr %2, align 4
+ %6 = sext i32 %5 to i64
+ %7 = sub nsw i64 %6, %4
+ %8 = add nsw i64 %7, 1
+ br label %preheader
+
+preheader:
+ %cmp = icmp sgt i64 %6, %4
+ br i1 %cmp, label %loop, label %end
+
+loop:
+ %9 = phi i64 [ %15, %loop ], [ %8, %preheader ]
+ %10 = phi i64 [ %14, %loop ], [ %4, %preheader ]
+ %11 = getelementptr float, ptr %0, i64 %10
+ %12 = load float, ptr %11, align 4
+ %13 = fadd fast float %12, 1.000000e+00
+ store float %13, ptr %11, align 4
+ %14 = add i64 %10, 1
+ %15 = add nsw i64 %9, -1
+ %16 = icmp ugt i64 %9, 1
+ br i1 %16, label %loop, label %end
+
+end:
+ ret void
+}
\ No newline at end of file
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr.ll b/llvm/test/Transforms/IndVarSimplify/lftr.ll
index 5ee62ba357ab6..f321daa00a953 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr.ll
@@ -43,7 +43,7 @@ define i32 @pre_to_post_sub() {
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 1000, [[ENTRY:%.*]] ], [ [[I_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[I_NEXT]] = sub nsw i32 [[I]], 1
; CHECK-NEXT: store i32 [[I]], ptr @A, align 4
-; CHECK-NEXT: [[C:%.*]] = icmp samesign ugt i32 [[I]], 0
+; CHECK-NEXT: [[C:%.*]] = icmp ne i32 [[I_NEXT]], -1
; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[LOOPEXIT:%.*]]
; CHECK: loopexit:
; CHECK-NEXT: ret i32 0