[llvm] 0633d5c - [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.

Roman Lebedev via llvm-commits llvm-commits at lists.llvm.org
Mon May 17 10:33:59 PDT 2021


Author: Roman Lebedev
Date: 2021-05-17T20:33:33+03:00
New Revision: 0633d5ce7bd9339228d301d4b1ee49fbd0a78b56

URL: https://github.com/llvm/llvm-project/commit/0633d5ce7bd9339228d301d4b1ee49fbd0a78b56
DIFF: https://github.com/llvm/llvm-project/commit/0633d5ce7bd9339228d301d4b1ee49fbd0a78b56.diff

LOG: [LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.

I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests,
so in principle i'm fine with landing this without review, but just in case..

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what i need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

I've followed in footstep of 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116

Added: 
    

Modified: 
    llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
    llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero-debuginfo.ll
    llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 34185e06719e..97c72f3bb812 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -112,6 +112,8 @@ STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
 STATISTIC(
     NumShiftUntilBitTest,
     "Number of uncountable loops recognized as 'shift until bitttest' idiom");
+STATISTIC(NumShiftUntilZero,
+          "Number of uncountable loops recognized as 'shift until zero' idiom");
 
 bool DisableLIRP::All;
 static cl::opt<bool, true>
@@ -248,6 +250,7 @@ class LoopIdiomRecognize {
                                 bool IsCntPhiUsedOutsideLoop);
 
   bool recognizeShiftUntilBitTest();
+  bool recognizeShiftUntilZero();
 
   /// @}
 };
@@ -1373,7 +1376,7 @@ bool LoopIdiomRecognize::runOnNoncountableLoop() {
                     << CurLoop->getHeader()->getName() << "\n");
 
   return recognizePopcount() || recognizeAndInsertFFS() ||
-         recognizeShiftUntilBitTest();
+         recognizeShiftUntilBitTest() || recognizeShiftUntilZero();
 }
 
 /// Check if the given conditional branch is based on the comparison between
@@ -2262,7 +2265,7 @@ bool LoopIdiomRecognize::recognizeShiftUntilBitTest() {
   assert(LoopPreheaderBB && "There is always a loop preheader.");
 
   BasicBlock *SuccessorBB = CurLoop->getExitBlock();
-  assert(LoopPreheaderBB && "There is only a single successor.");
+  assert(SuccessorBB && "There is only a single successor.");
 
   IRBuilder<> Builder(LoopPreheaderBB->getTerminator());
   Builder.SetCurrentDebugLocation(cast<Instruction>(XCurr)->getDebugLoc());
@@ -2395,3 +2398,329 @@ bool LoopIdiomRecognize::recognizeShiftUntilBitTest() {
   ++NumShiftUntilBitTest;
   return MadeChange;
 }
+
+/// Return true if the idiom is detected in the loop.
+///
+/// The core idiom we are trying to detect is:
+/// \code
+///   entry:
+///     <...>
+///     %start = <...>
+///     %extraoffset = <...>
+///     <...>
+///     br label %for.cond
+///
+///   loop:
+///     %iv = phi i8 [ %start, %entry ], [ %iv.next, %for.cond ]
+///     %nbits = add nsw i8 %iv, %extraoffset
+///     %val.shifted = lshr i8 %val, %nbits
+///     %val.shifted.iszero = icmp eq i8 %val.shifted, 0
+///     %iv.next = add i8 %iv, 1
+///     <...>
+///     br i1 %val.shifted.iszero, label %end, label %loop
+///
+///   end:
+///     %iv.res = phi i8 [ %iv, %loop ] <...>
+///     %nbits.res = phi i8 [ %nbits, %loop ] <...>
+///     %val.shifted.res = phi i8 [ %val.shifted, %loop ] <...>
+///     %val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ] <...>
+///     %iv.next.res = phi i8 [ %iv.next, %loop ] <...>
+///     <...>
+/// \endcode
+static bool detectShiftUntilZeroIdiom(Loop *CurLoop, ScalarEvolution *SE,
+                                      Instruction *&ValShiftedIsZero,
+                                      Instruction *&IV, Value *&Start,
+                                      Value *&Val, const SCEV *&ExtraOffsetExpr,
+                                      bool &InvertedCond) {
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE
+             " Performing shift-until-zero idiom detection.\n");
+
+  // Give up if the loop has multiple blocks or multiple backedges.
+  if (CurLoop->getNumBlocks() != 1 || CurLoop->getNumBackEdges() != 1) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Bad block/backedge count.\n");
+    return false;
+  }
+
+  Instruction *ValShifted, *NBits, *IVNext;
+  Value *ExtraOffset;
+
+  BasicBlock *LoopHeaderBB = CurLoop->getHeader();
+  BasicBlock *LoopPreheaderBB = CurLoop->getLoopPreheader();
+  assert(LoopPreheaderBB && "There is always a loop preheader.");
+
+  using namespace PatternMatch;
+
+  // Step 1: Check if the loop backedge, condition is in desirable form.
+
+  ICmpInst::Predicate Pred;
+  BasicBlock *TrueBB, *FalseBB;
+  if (!match(LoopHeaderBB->getTerminator(),
+             m_Br(m_Instruction(ValShiftedIsZero), m_BasicBlock(TrueBB),
+                  m_BasicBlock(FalseBB))) ||
+      !match(ValShiftedIsZero,
+             m_ICmp(Pred, m_Instruction(ValShifted), m_Zero())) ||
+      !ICmpInst::isEquality(Pred)) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Bad backedge structure.\n");
+    return false;
+  }
+
+  // Step 2: Check if the comparison's operand is in desirable form.
+
+  if (!match(ValShifted, m_LShr(m_Value(Val), m_Instruction(NBits)))) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Bad comparisons value computation.\n");
+    return false;
+  }
+
+  // Step 3: Check if the shift amount is in desirable form.
+
+  if (match(NBits, m_c_Add(m_Instruction(IV),
+                           m_LoopInvariant(m_Value(ExtraOffset), CurLoop))) &&
+      (NBits->hasNoSignedWrap() || NBits->hasNoUnsignedWrap()))
+    ExtraOffsetExpr = SE->getNegativeSCEV(SE->getSCEV(ExtraOffset));
+  else if (match(NBits,
+                 m_Sub(m_Instruction(IV),
+                       m_LoopInvariant(m_Value(ExtraOffset), CurLoop))) &&
+           NBits->hasNoSignedWrap())
+    ExtraOffsetExpr = SE->getSCEV(ExtraOffset);
+  else {
+    IV = NBits;
+    ExtraOffsetExpr = SE->getZero(NBits->getType());
+  }
+
+  // Step 4: Check if the recurrence is in desirable form.
+  auto *IVPN = dyn_cast<PHINode>(IV);
+  if (!IVPN || IVPN->getParent() != LoopHeaderBB) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Not an expected PHI node.\n");
+    return false;
+  }
+
+  Start = IVPN->getIncomingValueForBlock(LoopPreheaderBB);
+  IVNext = dyn_cast<Instruction>(IVPN->getIncomingValueForBlock(LoopHeaderBB));
+
+  if (!IVNext || !match(IVNext, m_Add(m_Specific(IVPN), m_One()))) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Bad recurrence.\n");
+    return false;
+  }
+
+  // Step 4: Check if the backedge's destinations are in desirable form.
+
+  assert(ICmpInst::isEquality(Pred) &&
+         "Should only get equality predicates here.");
+
+  // cmp-br is commutative, so canonicalize to a single variant.
+  InvertedCond = Pred != ICmpInst::Predicate::ICMP_EQ;
+  if (InvertedCond) {
+    Pred = ICmpInst::getInversePredicate(Pred);
+    std::swap(TrueBB, FalseBB);
+  }
+
+  // We expect to exit loop when comparison yields true,
+  // so when it yields false we should branch back to loop header.
+  if (FalseBB != LoopHeaderBB) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE " Bad backedge flow.\n");
+    return false;
+  }
+
+  // Okay, idiom checks out.
+  return true;
+}
+
+/// Look for the following loop:
+/// \code
+///   entry:
+///     <...>
+///     %start = <...>
+///     %extraoffset = <...>
+///     <...>
+///     br label %for.cond
+///
+///   loop:
+///     %iv = phi i8 [ %start, %entry ], [ %iv.next, %for.cond ]
+///     %nbits = add nsw i8 %iv, %extraoffset
+///     %val.shifted = lshr i8 %val, %nbits
+///     %val.shifted.iszero = icmp eq i8 %val.shifted, 0
+///     %iv.next = add i8 %iv, 1
+///     <...>
+///     br i1 %val.shifted.iszero, label %end, label %loop
+///
+///   end:
+///     %iv.res = phi i8 [ %iv, %loop ] <...>
+///     %nbits.res = phi i8 [ %nbits, %loop ] <...>
+///     %val.shifted.res = phi i8 [ %val.shifted, %loop ] <...>
+///     %val.shifted.iszero.res = phi i1 [ %val.shifted.iszero, %loop ] <...>
+///     %iv.next.res = phi i8 [ %iv.next, %loop ] <...>
+///     <...>
+/// \endcode
+///
+/// And transform it into:
+/// \code
+///   entry:
+///     <...>
+///     %start = <...>
+///     %extraoffset = <...>
+///     <...>
+///     %val.numleadingzeros = call i8 @llvm.ctlz.i8(i8 %val, i1 0)
+///     %val.numactivebits = sub i8 8, %val.numleadingzeros
+///     %extraoffset.neg = sub i8 0, %extraoffset
+///     %tmp = add i8 %val.numactivebits, %extraoffset.neg
+///     %iv.final = call i8 @llvm.smax.i8(i8 %tmp, i8 %start)
+///     %loop.tripcount = sub i8 %iv.final, %start
+///     br label %loop
+///
+///   loop:
+///     %loop.iv = phi i8 [ 0, %entry ], [ %loop.iv.next, %loop ]
+///     %loop.iv.next = add i8 %loop.iv, 1
+///     %loop.ivcheck = icmp eq i8 %loop.iv.next, %loop.tripcount
+///     %iv = add i8 %loop.iv, %start
+///     <...>
+///     br i1 %loop.ivcheck, label %end, label %loop
+///
+///   end:
+///     %iv.res = phi i8 [ %iv.final, %loop ] <...>
+///     <...>
+/// \endcode
+bool LoopIdiomRecognize::recognizeShiftUntilZero() {
+  bool MadeChange = false;
+
+  Instruction *ValShiftedIsZero, *IV;
+  Value *Start, *Val;
+  const SCEV *ExtraOffsetExpr;
+  bool InvertedCond;
+  if (!detectShiftUntilZeroIdiom(CurLoop, SE, ValShiftedIsZero, IV, Start, Val,
+                                 ExtraOffsetExpr, InvertedCond)) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE
+               " shift-until-zero idiom detection failed.\n");
+    return MadeChange;
+  }
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE " shift-until-zero idiom detected!\n");
+
+  // Ok, it is the idiom we were looking for, we *could* transform this loop,
+  // but is it profitable to transform?
+
+  BasicBlock *LoopHeaderBB = CurLoop->getHeader();
+  BasicBlock *LoopPreheaderBB = CurLoop->getLoopPreheader();
+  assert(LoopPreheaderBB && "There is always a loop preheader.");
+
+  BasicBlock *SuccessorBB = CurLoop->getExitBlock();
+  assert(SuccessorBB && "There is only a single successor.");
+
+  IRBuilder<> Builder(LoopPreheaderBB->getTerminator());
+  Builder.SetCurrentDebugLocation(IV->getDebugLoc());
+
+  Intrinsic::ID IntrID = Intrinsic::ctlz;
+  Type *Ty = Val->getType();
+  unsigned Bitwidth = Ty->getScalarSizeInBits();
+
+  TargetTransformInfo::TargetCostKind CostKind =
+      TargetTransformInfo::TCK_SizeAndLatency;
+
+  // The rewrite is considered to be unprofitable iff and only iff the
+  // intrinsic we'll use are not cheap. Note that we are okay with *just*
+  // making the loop countable, even if nothing else changes.
+  IntrinsicCostAttributes Attrs(
+      IntrID, Ty, {UndefValue::get(Ty), /*is_zero_undef=*/Builder.getFalse()});
+  InstructionCost Cost = TTI->getIntrinsicInstrCost(Attrs, CostKind);
+  if (Cost > TargetTransformInfo::TCC_Basic) {
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE
+               " Intrinsic is too costly, not beneficial\n");
+    return MadeChange;
+  }
+
+  // Ok, transform appears worthwhile.
+  MadeChange = true;
+
+  bool OffsetIsZero = false;
+  if (auto *ExtraOffsetExprC = dyn_cast<SCEVConstant>(ExtraOffsetExpr))
+    OffsetIsZero = ExtraOffsetExprC->isZero();
+
+  // Step 1: Compute the loop's final IV value / trip count.
+
+  CallInst *ValNumLeadingZeros = Builder.CreateIntrinsic(
+      IntrID, Ty, {Val, /*is_zero_undef=*/Builder.getFalse()},
+      /*FMFSource=*/nullptr, Val->getName() + ".numleadingzeros");
+  Value *ValNumActiveBits = Builder.CreateSub(
+      ConstantInt::get(Ty, Ty->getScalarSizeInBits()), ValNumLeadingZeros,
+      Val->getName() + ".numactivebits", /*HasNUW=*/true,
+      /*HasNSW=*/Bitwidth != 2);
+
+  SCEVExpander Expander(*SE, *DL, "loop-idiom");
+  Expander.setInsertPoint(&*Builder.GetInsertPoint());
+  Value *ExtraOffset = Expander.expandCodeFor(ExtraOffsetExpr);
+
+  Value *ValNumActiveBitsOffset = Builder.CreateAdd(
+      ValNumActiveBits, ExtraOffset, ValNumActiveBits->getName() + ".offset",
+      /*HasNUW=*/OffsetIsZero, /*HasNSW=*/true);
+  Value *IVFinal = Builder.CreateIntrinsic(Intrinsic::smax, {Ty},
+                                           {ValNumActiveBitsOffset, Start},
+                                           /*FMFSource=*/nullptr, "iv.final");
+
+  auto *LoopBackedgeTakenCount = cast<Instruction>(Builder.CreateSub(
+      IVFinal, Start, CurLoop->getName() + ".backedgetakencount",
+      /*HasNUW=*/OffsetIsZero, /*HasNSW=*/true));
+  // FIXME: or when the offset was `add nuw`
+
+  // We know loop's backedge-taken count, but what's loop's trip count?
+  Value *LoopTripCount =
+      Builder.CreateAdd(LoopBackedgeTakenCount, ConstantInt::get(Ty, 1),
+                        CurLoop->getName() + ".tripcount", /*HasNUW=*/true,
+                        /*HasNSW=*/Bitwidth != 2);
+
+  // Step 2: Adjust the successor basic block to recieve the original
+  //         induction variable's final value instead of the orig. IV itself.
+
+  IV->replaceUsesOutsideBlock(IVFinal, LoopHeaderBB);
+
+  // Step 3: Rewrite the loop into a countable form, with canonical IV.
+
+  // The new canonical induction variable.
+  Builder.SetInsertPoint(&LoopHeaderBB->front());
+  auto *CIV = Builder.CreatePHI(Ty, 2, CurLoop->getName() + ".iv");
+
+  // The induction itself.
+  Builder.SetInsertPoint(LoopHeaderBB->getFirstNonPHI());
+  auto *CIVNext =
+      Builder.CreateAdd(CIV, ConstantInt::get(Ty, 1), CIV->getName() + ".next",
+                        /*HasNUW=*/true, /*HasNSW=*/Bitwidth != 2);
+
+  // The loop trip count check.
+  auto *CIVCheck = Builder.CreateICmpEQ(CIVNext, LoopTripCount,
+                                        CurLoop->getName() + ".ivcheck");
+  auto *NewIVCheck = CIVCheck;
+  if (InvertedCond) {
+    NewIVCheck = Builder.CreateNot(CIVCheck);
+    NewIVCheck->takeName(ValShiftedIsZero);
+  }
+
+  // The original IV, but rebased to be an offset to the CIV.
+  auto *IVDePHId = Builder.CreateAdd(CIV, Start, "", /*HasNUW=*/false,
+                                     /*HasNSW=*/true); // FIXME: what about NUW?
+  IVDePHId->takeName(IV);
+
+  // The loop terminator.
+  Builder.SetInsertPoint(LoopHeaderBB->getTerminator());
+  Builder.CreateCondBr(CIVCheck, SuccessorBB, LoopHeaderBB);
+  LoopHeaderBB->getTerminator()->eraseFromParent();
+
+  // Populate the IV PHI.
+  CIV->addIncoming(ConstantInt::get(Ty, 0), LoopPreheaderBB);
+  CIV->addIncoming(CIVNext, LoopHeaderBB);
+
+  // Step 4: Forget the "non-computable" trip-count SCEV associated with the
+  //   loop. The loop would otherwise not be deleted even if it becomes empty.
+
+  SE->forgetLoop(CurLoop);
+
+  // Step 5: Try to cleanup the loop's body somewhat.
+  IV->replaceAllUsesWith(IVDePHId);
+  IV->eraseFromParent();
+
+  ValShiftedIsZero->replaceAllUsesWith(NewIVCheck);
+  ValShiftedIsZero->eraseFromParent();
+
+  // Other passes will take care of actually deleting the loop if possible.
+
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE " shift-until-zero idiom optimized!\n");
+
+  ++NumShiftUntilZero;
+  return MadeChange;
+}

diff  --git a/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero-debuginfo.ll b/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero-debuginfo.ll
index 9184cf194d2e..60c8116d70d7 100644
--- a/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero-debuginfo.ll
+++ b/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero-debuginfo.ll
@@ -7,25 +7,34 @@ declare void @escape_outer(i8, i8, i8, i1, i8)
 define i8 @p(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @p(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[LOOP:%.*]], !dbg [[DBG20:![0-9]+]]
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false), !dbg [[DBG20:![0-9]+]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]], !dbg [[DBG20]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]], !dbg [[DBG21:![0-9]+]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]], !dbg [[DBG20]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]]), !dbg [[DBG20]]
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]], !dbg [[DBG20]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1, !dbg [[DBG20]]
+; CHECK-NEXT:    br label [[LOOP:%.*]], !dbg [[DBG21]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ], !dbg [[DBG21:![0-9]+]]
-; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[IV]], metadata [[META9:![0-9]+]], metadata !DIExpression()), !dbg [[DBG21]]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET:%.*]], !dbg [[DBG22:![0-9]+]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ], !dbg [[DBG20]]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1, !dbg [[DBG20]]
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]], !dbg [[DBG20]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]], !dbg [[DBG20]]
+; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[IV]], metadata [[META9:![0-9]+]], metadata !DIExpression()), !dbg [[DBG20]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET]], !dbg [[DBG22:![0-9]+]]
 ; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[NBITS]], metadata [[META11:![0-9]+]], metadata !DIExpression()), !dbg [[DBG22]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]], !dbg [[DBG23:![0-9]+]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]], !dbg [[DBG23:![0-9]+]]
 ; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[VAL_SHIFTED]], metadata [[META12:![0-9]+]], metadata !DIExpression()), !dbg [[DBG23]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0, !dbg [[DBG24:![0-9]+]]
-; CHECK-NEXT:    call void @llvm.dbg.value(metadata i1 [[VAL_SHIFTED_ISZERO]], metadata [[META13:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24]]
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1, !dbg [[DBG25:![0-9]+]]
+; CHECK-NEXT:    call void @llvm.dbg.value(metadata i1 [[LOOP_IVCHECK]], metadata [[META13:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24:![0-9]+]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1, !dbg [[DBG25:![0-9]+]]
 ; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[IV_NEXT]], metadata [[META14:![0-9]+]], metadata !DIExpression()), !dbg [[DBG25]]
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]]), !dbg [[DBG26:![0-9]+]]
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]], !dbg [[DBG27:![0-9]+]]
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]]), !dbg [[DBG26:![0-9]+]]
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]], !dbg [[DBG27:![0-9]+]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ], !dbg [[DBG28:![0-9]+]]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ], !dbg [[DBG28:![0-9]+]]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ], !dbg [[DBG29:![0-9]+]]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ], !dbg [[DBG30:![0-9]+]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ], !dbg [[DBG31:![0-9]+]]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ], !dbg [[DBG31:![0-9]+]]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ], !dbg [[DBG32:![0-9]+]]
 ; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[IV_RES]], metadata [[META15:![0-9]+]], metadata !DIExpression()), !dbg [[DBG28]]
 ; CHECK-NEXT:    call void @llvm.dbg.value(metadata i8 [[NBITS_RES]], metadata [[META16:![0-9]+]], metadata !DIExpression()), !dbg [[DBG29]]

diff  --git a/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll b/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll
index 9f5c5c8aca81..ed11afecf1cb 100644
--- a/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll
+++ b/llvm/test/Transforms/LoopIdiom/X86/logical-right-shift-until-zero.ll
@@ -11,20 +11,29 @@ declare i8 @gen.i8()
 define i8 @p0(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @p0(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[NBITS_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -59,20 +68,29 @@ end:
 define i8 @p1(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @p1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i8 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i8 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[NBITS_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -107,20 +125,28 @@ end:
 define i8 @p2(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @p2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i8 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i8 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[NBITS_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -299,18 +325,26 @@ end:
 define i8 @p6(i8 %val, i8 %start) {
 ; CHECK-LABEL: @p6(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nuw nsw i8 [[VAL_NUMACTIVEBITS]], 0
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nuw nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[IV]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[IV]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[IV]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[IV]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[IV_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -346,20 +380,29 @@ declare void @escape_outer.i7(i7, i7, i7, i1, i7)
 define i7 @p7(i7 %val, i7 %start, i7 %extraoffset) {
 ; CHECK-LABEL: @p7(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i7 @llvm.ctlz.i7(i7 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i7 7, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i7 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i7 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i7 @llvm.smax.i7(i7 [[VAL_NUMACTIVEBITS_OFFSET]], i7 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i7 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i7 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i7 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i7 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i7 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i7 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i7 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i7(i7 [[IV]], i7 [[NBITS]], i7 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i7 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i7 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i7 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i7 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i7 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i7 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i7 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i7 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i7(i7 [[IV]], i7 [[NBITS]], i7 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i7 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i7 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i7 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i7 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i7 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i7 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i7(i7 [[IV_RES]], i7 [[NBITS_RES]], i7 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i7 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i7 [[IV_RES]]
@@ -442,17 +485,27 @@ end:
 define i8 @t9(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @t9(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISNOTZERO:%.*]] = icmp ne i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISNOTZERO:%.*]] = xor i1 [[LOOP_IVCHECK]], true
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
 ; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISNOTZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISNOTZERO]], label [[LOOP]], label [[END:%.*]]
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_ISNOTZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISNOTZERO]], [[LOOP]] ]
@@ -831,20 +884,29 @@ end:
 define i8 @t17(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @t17(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[EXTRAOFFSET:%.*]], [[IV]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[EXTRAOFFSET]], [[IV]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[NBITS_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -1178,22 +1240,31 @@ end:
 define i8 @n24(i8 %val, i8 %start, i8 %extraoffset) {
 ; CHECK-LABEL: @n24(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i8 @llvm.ctlz.i8(i8 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i8 8, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i8 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i8 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i8 @llvm.smax.i8(i8 [[VAL_NUMACTIVEBITS_OFFSET]], i8 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i8 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i8 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i8 [[VAL_SHIFTED]], 0
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i8 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i8 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i8 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i8 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i8 [[VAL]], [[NBITS]]
 ; CHECK-NEXT:    [[NOT_IV_NEXT:%.*]] = add i8 [[IV]], 1
-; CHECK-NEXT:    [[IV_NEXT]] = add i8 [[IV]], 1
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i8 [[IV]], 1
 ; CHECK-NEXT:    [[ALSO_IV_NEXT:%.*]] = add i8 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i8 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    call void @escape_inner(i8 [[IV]], i8 [[NBITS]], i8 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i8 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i8 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i8 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i8 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i8 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer(i8 [[IV_RES]], i8 [[NBITS_RES]], i8 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i8 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i8 [[IV_RES]]
@@ -1237,18 +1308,26 @@ declare void @escape_outer.i3(i3, i3, i3, i1, i3)
 define i1 @t25_nooffset_i1(i1 %val, i1 %start) {
 ; CHECK-LABEL: @t25_nooffset_i1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i1 @llvm.ctlz.i1(i1 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i1 true, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nuw nsw i1 [[VAL_NUMACTIVEBITS]], false
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i1 @llvm.smax.i1(i1 [[VAL_NUMACTIVEBITS_OFFSET]], i1 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nuw nsw i1 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i1 [[LOOP_BACKEDGETAKENCOUNT]], true
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i1 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL:%.*]], [[IV]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i1 [[VAL_SHIFTED]], false
-; CHECK-NEXT:    [[IV_NEXT]] = add i1 [[IV]], true
-; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[IV]], i1 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i1 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i1 [[LOOP_IV]], true
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i1 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i1 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL]], [[IV]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i1 [[IV]], true
+; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[IV]], i1 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i1 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i1 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i1 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i1(i1 [[IV_RES]], i1 [[IV_RES]], i1 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i1 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i1 [[IV_RES]]
@@ -1279,18 +1358,26 @@ end:
 define i2 @t26_nooffset_i2(i2 %val, i2 %start) {
 ; CHECK-LABEL: @t26_nooffset_i2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i2 @llvm.ctlz.i2(i2 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw i2 -2, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nuw nsw i2 [[VAL_NUMACTIVEBITS]], 0
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i2 @llvm.smax.i2(i2 [[VAL_NUMACTIVEBITS_OFFSET]], i2 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nuw nsw i2 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw i2 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i2 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL:%.*]], [[IV]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i2 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i2 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[IV]], i2 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i2 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i2 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw i2 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i2 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i2 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL]], [[IV]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i2 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[IV]], i2 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i2 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i2 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i2 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i2(i2 [[IV_RES]], i2 [[IV_RES]], i2 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i2 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i2 [[IV_RES]]
@@ -1321,18 +1408,26 @@ end:
 define i3 @t27_nooffset_i3(i3 %val, i3 %start) {
 ; CHECK-LABEL: @t27_nooffset_i3(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i3 @llvm.ctlz.i3(i3 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i3 3, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nuw nsw i3 [[VAL_NUMACTIVEBITS]], 0
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i3 @llvm.smax.i3(i3 [[VAL_NUMACTIVEBITS_OFFSET]], i3 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nuw nsw i3 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i3 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i3 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL:%.*]], [[IV]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i3 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i3 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[IV]], i3 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i3 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i3 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i3 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i3 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i3 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL]], [[IV]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i3 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[IV]], i3 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i3 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i3 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i3 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i3(i3 [[IV_RES]], i3 [[IV_RES]], i3 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i3 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i3 [[IV_RES]]
@@ -1364,20 +1459,28 @@ end:
 define i1 @t27_addnsw_i1(i1 %val, i1 %start, i1 %extraoffset) {
 ; CHECK-LABEL: @t27_addnsw_i1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i1 @llvm.ctlz.i1(i1 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i1 true, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i1 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i1 @llvm.smax.i1(i1 [[VAL_NUMACTIVEBITS_OFFSET]], i1 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i1 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i1 [[LOOP_BACKEDGETAKENCOUNT]], true
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i1 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i1 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i1 [[VAL_SHIFTED]], false
-; CHECK-NEXT:    [[IV_NEXT]] = add i1 [[IV]], true
-; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i1 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i1 [[LOOP_IV]], true
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i1 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i1 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i1 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i1 [[IV]], true
+; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i1 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i1 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i1 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i1 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i1(i1 [[IV_RES]], i1 [[NBITS_RES]], i1 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i1 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i1 [[IV_RES]]
@@ -1410,20 +1513,29 @@ end:
 define i2 @t28_addnsw_i2(i2 %val, i2 %start, i2 %extraoffset) {
 ; CHECK-LABEL: @t28_addnsw_i2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i2 @llvm.ctlz.i2(i2 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw i2 -2, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i2 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i2 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i2 @llvm.smax.i2(i2 [[VAL_NUMACTIVEBITS_OFFSET]], i2 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i2 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw i2 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i2 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i2 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i2 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i2 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i2 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i2 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw i2 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i2 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i2 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i2 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i2 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i2 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i2 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i2 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i2 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i2(i2 [[IV_RES]], i2 [[NBITS_RES]], i2 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i2 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i2 [[IV_RES]]
@@ -1456,20 +1568,29 @@ end:
 define i3 @t29_addnsw_i3(i3 %val, i3 %start, i3 %extraoffset) {
 ; CHECK-LABEL: @t29_addnsw_i3(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i3 @llvm.ctlz.i3(i3 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i3 3, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i3 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i3 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i3 @llvm.smax.i3(i3 [[VAL_NUMACTIVEBITS_OFFSET]], i3 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i3 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i3 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i3 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i3 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i3 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i3 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i3 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i3 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i3 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i3 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i3 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nsw i3 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i3 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i3 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i3 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i3 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i3 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i3(i3 [[IV_RES]], i3 [[NBITS_RES]], i3 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i3 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i3 [[IV_RES]]
@@ -1503,20 +1624,28 @@ end:
 define i1 @t30_addnuw_i1(i1 %val, i1 %start, i1 %extraoffset) {
 ; CHECK-LABEL: @t30_addnuw_i1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i1 @llvm.ctlz.i1(i1 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i1 true, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i1 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i1 @llvm.smax.i1(i1 [[VAL_NUMACTIVEBITS_OFFSET]], i1 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i1 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i1 [[LOOP_BACKEDGETAKENCOUNT]], true
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i1 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i1 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i1 [[VAL_SHIFTED]], false
-; CHECK-NEXT:    [[IV_NEXT]] = add i1 [[IV]], true
-; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i1 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i1 [[LOOP_IV]], true
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i1 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i1 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i1 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i1 [[IV]], true
+; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i1 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i1 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i1 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i1 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i1(i1 [[IV_RES]], i1 [[NBITS_RES]], i1 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i1 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i1 [[IV_RES]]
@@ -1549,20 +1678,29 @@ end:
 define i2 @t31_addnuw_i2(i2 %val, i2 %start, i2 %extraoffset) {
 ; CHECK-LABEL: @t31_addnuw_i2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i2 @llvm.ctlz.i2(i2 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw i2 -2, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i2 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i2 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i2 @llvm.smax.i2(i2 [[VAL_NUMACTIVEBITS_OFFSET]], i2 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i2 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw i2 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i2 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i2 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i2 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i2 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i2 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i2 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw i2 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i2 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i2 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i2 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i2 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i2 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i2 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i2 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i2 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i2(i2 [[IV_RES]], i2 [[NBITS_RES]], i2 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i2 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i2 [[IV_RES]]
@@ -1595,20 +1733,29 @@ end:
 define i3 @t32_addnuw_i3(i3 %val, i3 %start, i3 %extraoffset) {
 ; CHECK-LABEL: @t32_addnuw_i3(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i3 @llvm.ctlz.i3(i3 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i3 3, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i3 0, [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i3 [[VAL_NUMACTIVEBITS]], [[TMP0]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i3 @llvm.smax.i3(i3 [[VAL_NUMACTIVEBITS_OFFSET]], i3 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i3 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i3 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i3 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i3 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i3 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i3 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i3 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i3 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i3 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i3 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i3 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = add nuw i3 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i3 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i3 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i3 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i3 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i3 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i3(i3 [[IV_RES]], i3 [[NBITS_RES]], i3 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i3 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i3 [[IV_RES]]
@@ -1643,20 +1790,28 @@ end:
 define i1 @t33_subnsw_i1(i1 %val, i1 %start, i1 %extraoffset) {
 ; CHECK-LABEL: @t33_subnsw_i1(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i1 @llvm.ctlz.i1(i1 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i1 true, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i1 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i1 @llvm.smax.i1(i1 [[VAL_NUMACTIVEBITS_OFFSET]], i1 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i1 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i1 [[LOOP_BACKEDGETAKENCOUNT]], true
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i1 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i1 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i1 [[VAL_SHIFTED]], false
-; CHECK-NEXT:    [[IV_NEXT]] = add i1 [[IV]], true
-; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i1 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i1 [[LOOP_IV]], true
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i1 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i1 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i1 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i1 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i1 [[IV]], true
+; CHECK-NEXT:    call void @escape_inner.i1(i1 [[IV]], i1 [[NBITS]], i1 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i1 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i1 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i1 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i1 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i1 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i1(i1 [[IV_RES]], i1 [[NBITS_RES]], i1 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i1 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i1 [[IV_RES]]
@@ -1689,20 +1844,28 @@ end:
 define i2 @t34_addnuw_i2(i2 %val, i2 %start, i2 %extraoffset) {
 ; CHECK-LABEL: @t34_addnuw_i2(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i2 @llvm.ctlz.i2(i2 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw i2 -2, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i2 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i2 @llvm.smax.i2(i2 [[VAL_NUMACTIVEBITS_OFFSET]], i2 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i2 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw i2 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i2 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i2 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i2 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i2 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i2 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i2 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw i2 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i2 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i2 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i2 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i2 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i2 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i2(i2 [[IV]], i2 [[NBITS]], i2 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i2 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i2 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i2 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i2 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i2 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i2(i2 [[IV_RES]], i2 [[NBITS_RES]], i2 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i2 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i2 [[IV_RES]]
@@ -1735,20 +1898,28 @@ end:
 define i3 @t35_addnuw_i3(i3 %val, i3 %start, i3 %extraoffset) {
 ; CHECK-LABEL: @t35_addnuw_i3(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[VAL_NUMLEADINGZEROS:%.*]] = call i3 @llvm.ctlz.i3(i3 [[VAL:%.*]], i1 false)
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS:%.*]] = sub nuw nsw i3 3, [[VAL_NUMLEADINGZEROS]]
+; CHECK-NEXT:    [[VAL_NUMACTIVEBITS_OFFSET:%.*]] = add nsw i3 [[VAL_NUMACTIVEBITS]], [[EXTRAOFFSET:%.*]]
+; CHECK-NEXT:    [[IV_FINAL:%.*]] = call i3 @llvm.smax.i3(i3 [[VAL_NUMACTIVEBITS_OFFSET]], i3 [[START:%.*]])
+; CHECK-NEXT:    [[LOOP_BACKEDGETAKENCOUNT:%.*]] = sub nsw i3 [[IV_FINAL]], [[START]]
+; CHECK-NEXT:    [[LOOP_TRIPCOUNT:%.*]] = add nuw nsw i3 [[LOOP_BACKEDGETAKENCOUNT]], 1
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i3 [ [[START:%.*]], [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i3 [[IV]], [[EXTRAOFFSET:%.*]]
-; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL:%.*]], [[NBITS]]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO:%.*]] = icmp eq i3 [[VAL_SHIFTED]], 0
-; CHECK-NEXT:    [[IV_NEXT]] = add i3 [[IV]], 1
-; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[VAL_SHIFTED_ISZERO]], i3 [[IV_NEXT]])
-; CHECK-NEXT:    br i1 [[VAL_SHIFTED_ISZERO]], label [[END:%.*]], label [[LOOP]]
+; CHECK-NEXT:    [[LOOP_IV:%.*]] = phi i3 [ 0, [[ENTRY:%.*]] ], [ [[LOOP_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[LOOP_IV_NEXT]] = add nuw nsw i3 [[LOOP_IV]], 1
+; CHECK-NEXT:    [[LOOP_IVCHECK:%.*]] = icmp eq i3 [[LOOP_IV_NEXT]], [[LOOP_TRIPCOUNT]]
+; CHECK-NEXT:    [[IV:%.*]] = add nsw i3 [[LOOP_IV]], [[START]]
+; CHECK-NEXT:    [[NBITS:%.*]] = sub nsw i3 [[IV]], [[EXTRAOFFSET]]
+; CHECK-NEXT:    [[VAL_SHIFTED:%.*]] = lshr i3 [[VAL]], [[NBITS]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add i3 [[IV]], 1
+; CHECK-NEXT:    call void @escape_inner.i3(i3 [[IV]], i3 [[NBITS]], i3 [[VAL_SHIFTED]], i1 [[LOOP_IVCHECK]], i3 [[IV_NEXT]])
+; CHECK-NEXT:    br i1 [[LOOP_IVCHECK]], label [[END:%.*]], label [[LOOP]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV]], [[LOOP]] ]
+; CHECK-NEXT:    [[IV_RES:%.*]] = phi i3 [ [[IV_FINAL]], [[LOOP]] ]
 ; CHECK-NEXT:    [[NBITS_RES:%.*]] = phi i3 [ [[NBITS]], [[LOOP]] ]
 ; CHECK-NEXT:    [[VAL_SHIFTED_RES:%.*]] = phi i3 [ [[VAL_SHIFTED]], [[LOOP]] ]
-; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[VAL_SHIFTED_ISZERO]], [[LOOP]] ]
+; CHECK-NEXT:    [[VAL_SHIFTED_ISZERO_RES:%.*]] = phi i1 [ [[LOOP_IVCHECK]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_NEXT_RES:%.*]] = phi i3 [ [[IV_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT:    call void @escape_outer.i3(i3 [[IV_RES]], i3 [[NBITS_RES]], i3 [[VAL_SHIFTED_RES]], i1 [[VAL_SHIFTED_ISZERO_RES]], i3 [[IV_NEXT_RES]])
 ; CHECK-NEXT:    ret i3 [[IV_RES]]


        


More information about the llvm-commits mailing list