[llvm] 5c0d1b2 - [LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 3 14:10:07 PDT 2021


Author: Philip Reames
Date: 2021-06-03T14:09:16-07:00
New Revision: 5c0d1b2f902aa6a9cf47cc7e42c5b83bb2217cf9

URL: https://github.com/llvm/llvm-project/commit/5c0d1b2f902aa6a9cf47cc7e42c5b83bb2217cf9
DIFF: https://github.com/llvm/llvm-project/commit/5c0d1b2f902aa6a9cf47cc7e42c5b83bb2217cf9.diff

LOG: [LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process

This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.

The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.

Differential Revision: https://reviews.llvm.org/D103620

Added: 
    

Modified: 
    llvm/include/llvm/Transforms/Utils/UnrollLoop.h
    llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
    llvm/lib/Transforms/Utils/LoopUnroll.cpp
    llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
    llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
index 11e5b3b41eb65..1f09f648c0df8 100644
--- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
+++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h
@@ -70,7 +70,6 @@ struct UnrollLoopOptions {
   bool Force;
   bool AllowRuntime;
   bool AllowExpensiveTripCount;
-  bool PreserveCondBr;
   unsigned TripMultiple;
   unsigned PeelCount;
   bool UnrollRemainder;

diff  --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index 6ee0ce78cb480..e228b06917187 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1166,8 +1166,7 @@ static LoopUnrollResult tryToUnrollLoop(
   LoopUnrollResult UnrollResult = UnrollLoop(
       L,
       {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount,
-       UseUpperBound, TripMultiple, PP.PeelCount, UP.UnrollRemainder,
-       ForgetAllSCEV},
+       TripMultiple, PP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV},
       LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop);
   if (UnrollResult == LoopUnrollResult::Unmodified)
     return LoopUnrollResult::Unmodified;

diff  --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 5f7395d4cbec7..fa0a51424a8a1 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -245,18 +245,9 @@ void llvm::simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI,
 /// branch instruction. However, if the trip count (and multiple) are not known,
 /// loop unrolling will mostly produce more code that is no faster.
 ///
-/// TripCount is the upper bound of the iteration on which control exits
-/// LatchBlock. Control may exit the loop prior to TripCount iterations either
-/// via an early branch in other loop block or via LatchBlock terminator. This
-/// is relaxed from the general definition of trip count which is the number of
-/// times the loop header executes. Note that UnrollLoop assumes that the loop
-/// counter test is in LatchBlock in order to remove unnecesssary instances of
-/// the test.  If control can exit the loop from the LatchBlock's terminator
-/// prior to TripCount iterations, flag PreserveCondBr needs to be set.
-///
-/// PreserveCondBr indicates whether the conditional branch of the LatchBlock
-/// needs to be preserved.  It is needed when we use trip count upper bound to
-/// fully unroll the loop.
+/// TripCount is an upper bound on the number of times the loop header runs.
+/// Note that the trip count does not need to be exact, it can be any upper
+/// bound on the true trip count.
 ///
 /// Similarly, TripMultiple divides the number of times that the LatchBlock may
 /// execute without exiting the loop.
@@ -329,18 +320,6 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   assert(ULO.TripMultiple > 0);
   assert(ULO.TripCount == 0 || ULO.TripCount % ULO.TripMultiple == 0);
 
-  // Are we eliminating the loop control altogether?
-  bool CompletelyUnroll = ULO.Count == ULO.TripCount;
-
-  // We assume a run-time trip count if the compiler cannot
-  // figure out the loop trip count and the unroll-runtime
-  // flag is specified.
-  bool RuntimeTripCount =
-      (ULO.TripCount == 0 && ULO.Count > 0 && ULO.AllowRuntime);
-
-  assert((!RuntimeTripCount || !ULO.PeelCount) &&
-         "Did not expect runtime trip-count unrolling "
-         "and peeling for the same loop");
 
   bool Peeled = false;
   if (ULO.PeelCount) {
@@ -360,6 +339,21 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     }
   }
 
+  // Are we eliminating the loop control altogether?  Note that we can know
+  // we're eliminating the backedge without knowing exactly which iteration
+  // of the unrolled body exits.
+  const bool CompletelyUnroll = ULO.Count == ULO.TripCount;
+
+  // We assume a run-time trip count if the compiler cannot
+  // figure out the loop trip count and the unroll-runtime
+  // flag is specified.
+  bool RuntimeTripCount =
+      (ULO.TripCount == 0 && ULO.Count > 0 && ULO.AllowRuntime);
+
+  assert((!RuntimeTripCount || !ULO.PeelCount) &&
+         "Did not expect runtime trip-count unrolling "
+         "and peeling for the same loop");
+
   // All these values should be taken only after peeling because they might have
   // changed.
   BasicBlock *Preheader = L->getLoopPreheader();
@@ -417,6 +411,10 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
       dbgs() << "  No single exiting block\n";
   });
 
+  const unsigned ExactTripCount = ExitingBI ?
+    SE->getSmallConstantTripCount(L,ExitingBI->getParent()) : 0;
+  const bool ExactUnroll = (ExactTripCount && ExactTripCount == ULO.Count);
+
   // Loops containing convergent instructions must have a count that divides
   // their TripMultiple.
   LLVM_DEBUG(
@@ -759,9 +757,21 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
 
     auto WillExit = [&](unsigned i, unsigned j) -> Optional<bool> {
       if (CompletelyUnroll) {
-        if (ULO.PreserveCondBr && j && !(PreserveOnlyFirst && i != 0))
-          return None;
-        return j == 0;
+        if (PreserveOnlyFirst) {
+          if (i == 0)
+            return None;
+          return j == 0;
+        }
+        if (ExactUnroll)
+          return j == 0;
+        // Full, but non-exact unrolling
+        if (j == 0)
+          return true;
+        if (MaxTripCount && j >= MaxTripCount)
+          return false;
+        if (ExactTripCount && j != ExactTripCount)
+          return false;
+        return None;
       }
 
       if (RuntimeTripCount && j != 0)

diff  --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index cc85fb68df1f2..767b0728cdf35 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -986,8 +986,7 @@ bool llvm::UnrollRuntimeLoopRemainder(
         UnrollLoop(remainderLoop,
                    {/*Count*/ Count - 1, /*TripCount*/ Count - 1,
                     /*Force*/ false, /*AllowRuntime*/ false,
-                    /*AllowExpensiveTripCount*/ false, /*PreserveCondBr*/ true,
-                    /*TripMultiple*/ 1,
+                    /*AllowExpensiveTripCount*/ false, /*TripMultiple*/ 1,
                     /*PeelCount*/ 0, /*UnrollRemainder*/ false, ForgetAllSCEV},
                    LI, SE, DT, AC, TTI, /*ORE*/ nullptr, PreserveLCSSA);
   }

diff  --git a/llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll b/llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll
index d7325692fab32..0b9ea76b3c80d 100644
--- a/llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll
+++ b/llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll
@@ -36,19 +36,47 @@ define void @test1() {
 ; PEEL2:       entry.peel.newph:
 ; PEEL2-NEXT:    br label [[FOR_BODY:%.*]]
 ; PEEL2:       for.body:
-; PEEL2-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_PEEL4]]
-; PEEL2-NEXT:    [[TMP2:%.*]] = trunc i64 [[INDVARS_IV_NEXT_PEEL4]] to i32
+; PEEL2-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL4]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY_6:%.*]] ]
+; PEEL2-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]
+; PEEL2-NEXT:    [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
 ; PEEL2-NEXT:    store i32 [[TMP2]], i32* [[ARRAYIDX]], align 4
-; PEEL2-NEXT:    store i32 3, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 3), align 4
-; PEEL2-NEXT:    store i32 4, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 4), align 4
-; PEEL2-NEXT:    store i32 5, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 5), align 4
-; PEEL2-NEXT:    store i32 6, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 6), align 4
-; PEEL2-NEXT:    store i32 7, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 0, i64 7), align 4
-; PEEL2-NEXT:    store i32 8, i32* getelementptr inbounds ([8 x i32], [8 x i32]* @a, i64 1, i64 0), align 4
-; PEEL2-NEXT:    store i32 9, i32* getelementptr ([8 x i32], [8 x i32]* @a, i64 1, i64 1), align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT]]
+; PEEL2-NEXT:    [[TMP3:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; PEEL2-NEXT:    store i32 [[TMP3]], i32* [[ARRAYIDX_1]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_1]]
+; PEEL2-NEXT:    [[TMP4:%.*]] = trunc i64 [[INDVARS_IV_NEXT_1]] to i32
+; PEEL2-NEXT:    store i32 [[TMP4]], i32* [[ARRAYIDX_2]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_2]]
+; PEEL2-NEXT:    [[TMP5:%.*]] = trunc i64 [[INDVARS_IV_NEXT_2]] to i32
+; PEEL2-NEXT:    store i32 [[TMP5]], i32* [[ARRAYIDX_3]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_4:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_3]]
+; PEEL2-NEXT:    [[TMP6:%.*]] = trunc i64 [[INDVARS_IV_NEXT_3]] to i32
+; PEEL2-NEXT:    store i32 [[TMP6]], i32* [[ARRAYIDX_4]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_5:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_4]]
+; PEEL2-NEXT:    [[TMP7:%.*]] = trunc i64 [[INDVARS_IV_NEXT_4]] to i32
+; PEEL2-NEXT:    store i32 [[TMP7]], i32* [[ARRAYIDX_5]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
+; PEEL2-NEXT:    [[EXITCOND_5:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT_5]], 8
+; PEEL2-NEXT:    br i1 [[EXITCOND_5]], label [[FOR_BODY_6]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; PEEL2:       for.exit.loopexit:
 ; PEEL2-NEXT:    br label [[FOR_EXIT]]
 ; PEEL2:       for.exit:
 ; PEEL2-NEXT:    ret void
+; PEEL2:       for.body.6:
+; PEEL2-NEXT:    [[ARRAYIDX_6:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_5]]
+; PEEL2-NEXT:    [[TMP8:%.*]] = trunc i64 [[INDVARS_IV_NEXT_5]] to i32
+; PEEL2-NEXT:    store i32 [[TMP8]], i32* [[ARRAYIDX_6]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
+; PEEL2-NEXT:    [[ARRAYIDX_7:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_6]]
+; PEEL2-NEXT:    [[TMP9:%.*]] = trunc i64 [[INDVARS_IV_NEXT_6]] to i32
+; PEEL2-NEXT:    store i32 [[TMP9]], i32* [[ARRAYIDX_7]], align 4
+; PEEL2-NEXT:    [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV_NEXT_6]], 1
+; PEEL2-NEXT:    br label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
 ;
 ; PEEL8-LABEL: @test1(
 ; PEEL8-NEXT:  entry:
@@ -132,40 +160,58 @@ define void @test1() {
 ; PEEL8:       entry.peel.newph:
 ; PEEL8-NEXT:    br label [[FOR_BODY:%.*]]
 ; PEEL8:       for.body:
-; PEEL8-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_PEEL34]]
-; PEEL8-NEXT:    [[TMP8:%.*]] = trunc i64 [[INDVARS_IV_NEXT_PEEL34]] to i32
+; PEEL8-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL34]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY_7:%.*]] ]
+; PEEL8-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]
+; PEEL8-NEXT:    [[TMP8:%.*]] = trunc i64 [[INDVARS_IV]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP8]], i32* [[ARRAYIDX]], align 4
-; PEEL8-NEXT:    [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_PEEL34]], 1
+; PEEL8-NEXT:    [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_1:%.*]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; PEEL8:       for.exit.loopexit:
+; PEEL8-NEXT:    br label [[FOR_EXIT]]
+; PEEL8:       for.exit:
+; PEEL8-NEXT:    ret void
+; PEEL8:       for.body.1:
 ; PEEL8-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT]]
 ; PEEL8-NEXT:    [[TMP9:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP9]], i32* [[ARRAYIDX_1]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_2:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.2:
 ; PEEL8-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_1]]
 ; PEEL8-NEXT:    [[TMP10:%.*]] = trunc i64 [[INDVARS_IV_NEXT_1]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP10]], i32* [[ARRAYIDX_2]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_3:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.3:
 ; PEEL8-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_2]]
 ; PEEL8-NEXT:    [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT_2]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP11]], i32* [[ARRAYIDX_3]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_4:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.4:
 ; PEEL8-NEXT:    [[ARRAYIDX_4:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_3]]
 ; PEEL8-NEXT:    [[TMP12:%.*]] = trunc i64 [[INDVARS_IV_NEXT_3]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP12]], i32* [[ARRAYIDX_4]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_5:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.5:
 ; PEEL8-NEXT:    [[ARRAYIDX_5:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_4]]
 ; PEEL8-NEXT:    [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT_4]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP13]], i32* [[ARRAYIDX_5]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_6:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.6:
 ; PEEL8-NEXT:    [[ARRAYIDX_6:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_5]]
 ; PEEL8-NEXT:    [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT_5]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP14]], i32* [[ARRAYIDX_6]], align 4
 ; PEEL8-NEXT:    [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY_7]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
+; PEEL8:       for.body.7:
 ; PEEL8-NEXT:    [[ARRAYIDX_7:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_6]]
 ; PEEL8-NEXT:    [[TMP15:%.*]] = trunc i64 [[INDVARS_IV_NEXT_6]] to i32
 ; PEEL8-NEXT:    store i32 [[TMP15]], i32* [[ARRAYIDX_7]], align 4
-; PEEL8-NEXT:    br label [[FOR_EXIT]]
-; PEEL8:       for.exit:
-; PEEL8-NEXT:    ret void
+; PEEL8-NEXT:    [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV_NEXT_6]], 1
+; PEEL8-NEXT:    br i1 true, label [[FOR_BODY]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP2:![0-9]+]]
 ;
 ; PEEL2UNROLL2-LABEL: @test1(
 ; PEEL2UNROLL2-NEXT:  entry:


        


More information about the llvm-commits mailing list