[llvm] 7fe41ac - Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"

Tue May 25 14:38:19 PDT 2021

If you have the before and after IR, that's really all I probably need.  
If you share those, I can take the investigation from there.

Philip
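
For readers following the thread, here is a minimal sketch of the middle-block decision that the comments in the quoted diff below describe for the original (pre-revert) patch. It is a plain C++ model, not LLVM code: `MiddleSuccessor`, `vectorTripCount`, and `middleBlockSuccessor` are hypothetical names introduced for illustration, and the trip-count formula is the simplified `N - N % VF` from the diff comment, assuming `VF` is nonzero.

```cpp
#include <cstdint>

// Hypothetical model of the three cases listed in the diff comments.
// None of these names exist in LLVM.
enum class MiddleSuccessor { ExitBlock, ScalarPreheader };

// Iterations covered by the vector loop, per the diff comment: N - N % VF.
// Assumes VF != 0.
constexpr std::uint64_t vectorTripCount(std::uint64_t N, std::uint64_t VF) {
  return N - N % VF;
}

constexpr MiddleSuccessor middleBlockSuccessor(std::uint64_t N,
                                               std::uint64_t VF,
                                               bool RequiresScalarEpilogue,
                                               bool FoldTailByMasking) {
  // Case 1: the scalar loop must execute, so branch to it unconditionally.
  if (RequiresScalarEpilogue)
    return MiddleSuccessor::ScalarPreheader;
  // Case 2: the tail is folded into the vector loop, so the branch
  // condition stays 'true' and control goes straight to the exit.
  if (FoldTailByMasking)
    return MiddleSuccessor::ExitBlock;
  // Case 3: runtime check (the "cmp.n" compare in the diff): skip the
  // remainder only if the vector loop covered all N iterations.
  return vectorTripCount(N, VF) == N ? MiddleSuccessor::ExitBlock
                                     : MiddleSuccessor::ScalarPreheader;
}
```

With the revert applied, case 1 goes away: the middle block always ends in the conditional branch, and only cases 2 and 3 remain.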

On 5/25/21 2:12 PM, Nemanja Ivanovic wrote:
> Hi Philip,
> I am sorry about the late reply - this was a 4 day weekend here in 
> Toronto. We kind of started with looking at the difference in the 
> post-LV IR with and without your patch for the SingleSource test case. 
> Stefan is currently looking at it and we're hoping to figure out 
> what's going on soon. Would it help you if we shared the two IR files 
> with you to get your opinion on what might be happening as well?
> Nemanja Ivanovic
> LLVM PPC Backend Development
> IBM Toronto Lab
> Email: nemanjai at ca.ibm.com
> Phone: 905-413-3388
>
>     ----- Original message -----
>     From: Philip Reames <listmail at philipreames.com>
>     To: Nemanja Ivanovic <nemanjai at ca.ibm.com>
>     Cc: akuegel at google.com, benny.kra at gmail.com,
>     llvm-commits at lists.llvm.org, llvmlistbot at llvm.org, LLVM on Power
>     <powerllvm at ca.ibm.com>
>     Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
>     Unconditionally branch from middle to scalar preheader if the
>     scalar loop must execute"
>     Date: Wed, May 19, 2021 2:15 PM
>
>     Looking at the various failures, I see both LE and BE bots
>     failing.  The BE bot shows a miscompile in one of the test suite
>     benchmarks.  The LE bots appear to be showing a crash while using
>     a stage1 build clang to build stage2 clang.
>
>     LE example: https://lab.llvm.org/buildbot#builders/19/builds/4236
>     BE example: https://lab.llvm.org/buildbot#builders/100/builds/5762
>
>     So both are showing miscompiles, just with different symptoms.
>     Frankly, the BE looks easier to debug (much less code making it
>     into the miscompiled binary).
>
>     Oddly, pretty much only PPC bots are failing.  The one exception
>     is a stage2 failure on AArch64
>     (https://lab.llvm.org/buildbot/#/builders/111/builds/2027) which
>     looks similar to the LE failure above.
>
>     Given this appears to be target specific, I am *guessing* there's
>     some vectorizer hook which is causing a different codepath to be
>     executed.  Before we start trying to get me access to hardware, do
>     you have any guesses on what that hook might be?  If you can give
>     me a good hint on where to look, I suspect I can probably find the
>     issue that way.
>
>     Philip
>
>     On 5/18/21 3:07 AM, Nemanja Ivanovic wrote:
>>     Hi Philip,
>>     I am not sure what happened with your first attempt to contact us
>>     and how we missed it. We would be more than happy to help you
>>     debug this issue. Do you know if this only affects the big endian
>>     bot or if it also fails on little endian bots? In the latter
>>     case, we can certainly provide access to a little endian machine
>>     hosted at OSU/OSL. In the former case, we don't have a machine
>>     available and we'll have to do the debugging and report to you
>>     (which might take a bit longer).
>>     Nemanja Ivanovic
>>     LLVM PPC Backend Development
>>     IBM Toronto Lab
>>     Email: nemanjai at ca.ibm.com
>>     Phone: 905-413-3388
>>
>>         ----- Original message -----
>>         From: Philip Reames <listmail at philipreames.com>
>>         To: Adrian Kuegel <akuegel at google.com>
>>         Cc: Benjamin Kramer <benny.kra at gmail.com>, Adrian Kuegel
>>         <llvmlistbot at llvm.org>, llvm-commits
>>         <llvm-commits at lists.llvm.org>, powerllvm at ca.ibm.com
>>         Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
>>         Unconditionally branch from middle to scalar preheader if the
>>         scalar loop must execute"
>>         Date: Mon, May 17, 2021 11:59 PM
>>
>>         I tried another cycle to see if this had been resolved, but
>>         am still seeing build bot failures, nearly exclusively on
>>         PPC.  (And one arm self host bot.)
>>
>>         @PPC Bot Owner - I need help reducing a test case for the
>>         failure seen here:
>>         https://lab.llvm.org/buildbot#builders/100/builds/5762. I
>>         am stuck, and have been unable to make progress for nearly
>>         3 months now.  I would greatly appreciate help.
>>
>>         Philip
>>
>>         On 5/5/21 11:59 PM, Adrian Kuegel wrote:
>>>         Sounds good to me, and thanks for sending the heads up :)
>>>         On Wed, May 5, 2021 at 10:01 PM Philip Reames
>>>         <listmail at philipreames.com> wrote:
>>>
>>>             FYI, I'm going to try recommitting this without changes
>>>             in a day or so.
>>>
>>>             I never heard back from a PPC bot owner, and I don't
>>>             have enough
>>>             information to really debug anything from the buildbot
>>>             log.  I did run
>>>             across a latent issue which this patch may very well
>>>             have exposed at
>>>             much higher frequency; the previous patch in this series
>>>             (which is much
>>>             more restrictive) does appear to have increased
>>>             frequency.  That was
>>>             worked around in 80e80250.  My educated guess is that
>>>             same issue
>>>             triggered the miscompile seen on the ppc bot, but that
>>>             is more of a
>>>             guess than I'd really prefer.
>>>
>>>             I'm going to submit this during off hours, and watch the
>>>             bots fairly
>>>             closely after submit.  Hopefully this either cycles
>>>             clean or I get a
>>>             better clue as to what the root issue is.
>>>
>>>             Philip
>>>
>>>             On 2/8/21 9:09 PM, Philip Reames via llvm-commits wrote:
>>>             > Ben,
>>>             >
>>>             > Thanks for the clarification.  The log does not make
>>>             the fact this is
>>>             > an execution failure obvious.
>>>             >
>>>             > No, I don't have access to a PPC machine.
>>>             >
>>>             > I am going to need some assistance from the bot owner
>>>             on this. At a
>>>             > minimum, IR for the test in question (before
>>>             optimization, but on
>>>             > target platform) seems like a reasonable ask.
>>>             >
>>>             > I strongly suspect this change is simply exposing
>>>             another latent
>>>             > issue.  Or at least, I've reviewed the change and
>>>             don't see anything
>>>             > likely to cause runtime crashes w/o also tripping
>>>             compiler asserts.
>>>             >
>>>             > Philip
>>>             >
>>>             >
>>>             > On 2/8/21 5:21 AM, Benjamin Kramer wrote:
>>>             >> `execution_time` failures mean that the bot succeeded
>>>             building a test
>>>             >> but it failed when running it. I'm relatively certain
>>>             that this is the
>>>             >> same issue Adrian is seeing -- binaries segfaulting
>>>             early on PPC.
>>>             >>
>>>             >> The bot log output isn't helpful at all for
>>>             investigating why this is
>>>             >> happening. Do you happen to have access to a PPC machine?
>>>             >>
>>>             >> On Fri, Feb 5, 2021 at 6:05 PM Philip Reames via
>>>             llvm-commits
>>>             >> <llvm-commits at lists.llvm.org> wrote:
>>>             >>> Adrian,
>>>             >>>
>>>             >>> I'm going to need you to provide a bit more
>>>             information here. The test
>>>             >>> failure in stage1 was fixed at the time you reverted
>>>             this patch.  The
>>>             >>> remaining failure in the bot is very unclear.  What
>>>             is an execution_time
>>>             >>> failure? From the log output, the "failing" run
>>>             finished in 0.5
>>>             >>> seconds,
>>>             >>> whereas the previous "succeeding" run finished in 11
>>>             seconds. Without
>>>             >>> further context, I'd say that's no failure.
>>>             >>>
>>>             >>> I'll also note that I did not receive email from
>>>             this bot.  I received
>>>             >>> notice from the various other bots and fixed the ARM
>>>             test issue, but
>>>             >>> unless I missed it in with the others, this bot is
>>>             not notifying.
>>>             >>>
>>>             >>> In general, I'm a fan of fast reverts, but I have to
>>>             admit, this one
>>>             >>> appears borderline at the moment.
>>>             >>>
>>>             >>> Philip
>>>             >>>
>>>             >>> On 2/5/21 3:53 AM, Adrian Kuegel via llvm-commits wrote:
>>>             >>>> Author: Adrian Kuegel
>>>             >>>> Date: 2021-02-05T12:51:03+01:00
>>>             >>>> New Revision: 7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>             >>>>
>>>             >>>> URL:
>>>             >>>>
>>>             https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>             >>>> DIFF:
>>>             >>>>
>>>             https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c.diff
>>>             >>>>
>>>             >>>> LOG: Revert "[LV] Unconditionally branch from
>>>             middle to scalar
>>>             >>>> preheader if the scalar loop must execute"
>>>             >>>>
>>>             >>>> This reverts commit
>>>             3e5ce49e5371ce4feadbf97dd5c2b652d9db3d1d.
>>>             >>>>
>>>             >>>> Tests started failing on PPC, for example:
>>>             >>>> http://lab.llvm.org:8011/#/builders/105/builds/5569
>>>             >>>>
>>>             >>>> Added:
>>>             >>>>
>>>             >>>>
>>>             >>>> Modified:
>>>             >>>> llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>             >>>> llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>             >>>>
>>>             llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>             >>>>
>>>             llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>             >>>> llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>             >>>>
>>>             >>>> Removed:
>>>             >>>>
>>>             >>>>
>>>             >>>>
>>>             >>>>
>>>             ################################################################################
>>>             >>>>
>>>             >>>> diff  --git
>>>             a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>             >>>> b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>             >>>> index 8a89158788cf..de4fb446fdf2 100644
>>>             >>>> --- a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>             >>>> +++ b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>             >>>> @@ -44,11 +44,11 @@
>>>             LoopVersioning::LoopVersioning(const
>>>             >>>> LoopAccessInfo &LAI,
>>>             >>>> AliasChecks(Checks.begin(), Checks.end()),
>>>             >>>> Preds(LAI.getPSE().getUnionPredicate()), LAI(LAI),
>>>             LI(LI),
>>>             >>>> DT(DT),
>>>             >>>>          SE(SE) {
>>>             >>>> + assert(L->getUniqueExitBlock() && "No single exit
>>>             block");
>>>             >>>>    }
>>>             >>>>
>>>             >>>>    void LoopVersioning::versionLoop(
>>>             >>>>        const SmallVectorImpl<Instruction *>
>>>             &DefsUsedOutside) {
>>>             >>>> - assert(VersionedLoop->getUniqueExitBlock() && "No
>>>             single exit
>>>             >>>> block");
>>>             >>>> assert(VersionedLoop->isLoopSimplifyForm() &&
>>>             >>>>             "Loop is not in loop-simplify form");
>>>             >>>>
>>>             >>>>
>>>             >>>> diff  --git
>>>             a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>             >>>> b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>             >>>> index 3277842edbfe..6bce0caeb36f 100644
>>>             >>>> --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>             >>>> +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>             >>>> @@ -852,7 +852,7 @@ class InnerLoopVectorizer {
>>>             >>>>      /// Middle Block between the vector and the
>>>             scalar.
>>>             >>>>      BasicBlock *LoopMiddleBlock;
>>>             >>>>
>>>             >>>> -  /// The unique ExitBlock of the scalar loop if
>>>             one exists.  Note
>>>             >>>> that
>>>             >>>> +  /// The (unique) ExitBlock of the scalar loop. 
>>>             Note that
>>>             >>>>      /// there can be multiple exiting edges
>>>             reaching this block.
>>>             >>>>      BasicBlock *LoopExitBlock;
>>>             >>>>
>>>             >>>> @@ -3147,13 +3147,9 @@ void
>>>             >>>>
>>>             InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
>>>             >>>> DT->getNode(Bypass)->getIDom()) &&
>>>             >>>>             "TC check is expected to dominate Bypass");
>>>             >>>>
>>>             >>>> -  // Update dominator for Bypass & LoopExit (if
>>>             needed).
>>>             >>>> +  // Update dominator for Bypass & LoopExit.
>>>             >>>> DT->changeImmediateDominator(Bypass, TCCheckBlock);
>>>             >>>> -  if (!Cost->requiresScalarEpilogue())
>>>             >>>> -    // If there is an epilogue which must run,
>>>             there's no edge
>>>             >>>> from the
>>>             >>>> -    // middle block to exit blocks  and thus no
>>>             need to update the
>>>             >>>> immediate
>>>             >>>> -    // dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             TCCheckBlock);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             TCCheckBlock);
>>>             >>>>
>>>             >>>>      ReplaceInstWithInst(
>>>             >>>> TCCheckBlock->getTerminator(),
>>>             >>>> @@ -3192,11 +3188,7 @@ void
>>>             >>>> InnerLoopVectorizer::emitSCEVChecks(Loop *L,
>>>             BasicBlock *Bypass) {
>>>             >>>>      // Update dominator only if this is first RT
>>>             check.
>>>             >>>>      if (LoopBypassBlocks.empty()) {
>>>             >>>> DT->changeImmediateDominator(Bypass, SCEVCheckBlock);
>>>             >>>> -    if (!Cost->requiresScalarEpilogue())
>>>             >>>> -      // If there is an epilogue which must run,
>>>             there's no edge
>>>             >>>> from the
>>>             >>>> -      // middle block to exit blocks  and thus no
>>>             need to update
>>>             >>>> the immediate
>>>             >>>> -      // dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             SCEVCheckBlock);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             SCEVCheckBlock);
>>>             >>>>      }
>>>             >>>>
>>>             >>>>      ReplaceInstWithInst(
>>>             >>>> @@ -3252,11 +3244,7 @@ void
>>>             >>>> InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L,
>>>             BasicBlock
>>>             >>>> *Bypass) {
>>>             >>>>      // Update dominator only if this is first RT
>>>             check.
>>>             >>>>      if (LoopBypassBlocks.empty()) {
>>>             >>>> DT->changeImmediateDominator(Bypass, MemCheckBlock);
>>>             >>>> -    if (!Cost->requiresScalarEpilogue())
>>>             >>>> -      // If there is an epilogue which must run,
>>>             there's no edge
>>>             >>>> from the
>>>             >>>> -      // middle block to exit blocks  and thus no
>>>             need to update
>>>             >>>> the immediate
>>>             >>>> -      // dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             MemCheckBlock);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             MemCheckBlock);
>>>             >>>>      }
>>>             >>>>
>>>             >>>>      Instruction *FirstCheckInst;
>>>             >>>> @@ -3381,10 +3369,9 @@ Value
>>>             >>>> *InnerLoopVectorizer::emitTransformedIndex(
>>>             >>>>    Loop
>>>             *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>             >>>> Prefix) {
>>>             >>>>      LoopScalarBody = OrigLoop->getHeader();
>>>             >>>>      LoopVectorPreHeader =
>>>             OrigLoop->getLoopPreheader();
>>>             >>>> +  LoopExitBlock = OrigLoop->getUniqueExitBlock();
>>>             >>>> +  assert(LoopExitBlock && "Must have an exit block");
>>>             >>>>      assert(LoopVectorPreHeader && "Invalid loop
>>>             structure");
>>>             >>>> -  LoopExitBlock = OrigLoop->getUniqueExitBlock();
>>>             // may be nullptr
>>>             >>>> -  assert((LoopExitBlock ||
>>>             Cost->requiresScalarEpilogue()) &&
>>>             >>>> -         "multiple exit loop without required
>>>             epilogue?");
>>>             >>>>
>>>             >>>>      LoopMiddleBlock =
>>>             >>>> SplitBlock(LoopVectorPreHeader,
>>>             >>>> LoopVectorPreHeader->getTerminator(), DT,
>>>             >>>> @@ -3393,20 +3380,12 @@ Loop
>>>             >>>>
>>>             *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>             Prefix) {
>>>             >>>> SplitBlock(LoopMiddleBlock,
>>>             >>>> LoopMiddleBlock->getTerminator(), DT, LI,
>>>             >>>>                     nullptr, Twine(Prefix) +
>>>             "scalar.ph");
>>>             >>>>
>>>             >>>> +  // Set up branch from middle block to the exit
>>>             and scalar
>>>             >>>> preheader blocks.
>>>             >>>> +  // completeLoopSkeleton will update the
>>>             condition to use an
>>>             >>>> iteration check,
>>>             >>>> +  // if required to decide whether to execute the
>>>             remainder.
>>>             >>>> +  BranchInst *BrInst =
>>>             >>>> + BranchInst::Create(LoopExitBlock,
>>>             LoopScalarPreHeader,
>>>             >>>> Builder.getTrue());
>>>             >>>>      auto *ScalarLatchTerm =
>>>             >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>             >>>> -
>>>             >>>> -  // Set up the middle block terminator.  Two cases:
>>>             >>>> -  // 1) If we know that we must execute the scalar
>>>             epilogue, emit an
>>>             >>>> -  //    unconditional branch.
>>>             >>>> -  // 2) Otherwise, we must have a single unique
>>>             exit block (due to
>>>             >>>> how we
>>>             >>>> -  //    implement the multiple exit case).  In
>>>             this case, set up a
>>>             >>>> conditonal
>>>             >>>> -  //    branch from the middle block to the loop
>>>             scalar preheader,
>>>             >>>> and the
>>>             >>>> -  //    exit block. completeLoopSkeleton will
>>>             update the
>>>             >>>> condition to use an
>>>             >>>> -  //    iteration check, if required to decide
>>>             whether to execute
>>>             >>>> the remainder.
>>>             >>>> -  BranchInst *BrInst =
>>>             Cost->requiresScalarEpilogue() ?
>>>             >>>> - BranchInst::Create(LoopScalarPreHeader) :
>>>             >>>> - BranchInst::Create(LoopExitBlock,
>>>             LoopScalarPreHeader,
>>>             >>>> - Builder.getTrue());
>>>             >>>> BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
>>>             >>>>
>>>             ReplaceInstWithInst(LoopMiddleBlock->getTerminator(),
>>>             BrInst);
>>>             >>>>
>>>             >>>> @@ -3418,11 +3397,7 @@ Loop
>>>             >>>>
>>>             *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>             Prefix) {
>>>             >>>>                     nullptr, nullptr, Twine(Prefix)
>>>             + "vector.body");
>>>             >>>>
>>>             >>>>      // Update dominator for loop exit.
>>>             >>>> -  if (!Cost->requiresScalarEpilogue())
>>>             >>>> -    // If there is an epilogue which must run,
>>>             there's no edge
>>>             >>>> from the
>>>             >>>> -    // middle block to exit blocks  and thus no
>>>             need to update the
>>>             >>>> immediate
>>>             >>>> -    // dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             LoopMiddleBlock);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             LoopMiddleBlock);
>>>             >>>>
>>>             >>>>      // Create and register the new vector loop.
>>>             >>>>      Loop *Lp = LI->AllocateLoop();
>>>             >>>> @@ -3519,14 +3494,10 @@ BasicBlock
>>>             >>>> *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
>>>             >>>>      auto *ScalarLatchTerm =
>>>             >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>             >>>>
>>>             >>>>      // Add a check in the middle block to see if
>>>             we have completed
>>>             >>>> -  // all of the iterations in the first vector
>>>             loop.  Three cases:
>>>             >>>> -  // 1) If we require a scalar epilogue, there is
>>>             no conditional
>>>             >>>> branch as
>>>             >>>> -  //    we unconditionally branch to the scalar
>>>             preheader. Do
>>>             >>>> nothing.
>>>             >>>> -  // 2) If (N - N%VF) == N, then we *don't* need
>>>             to run the
>>>             >>>> remainder.
>>>             >>>> -  //    Thus if tail is to be folded, we know we
>>>             don't need to run
>>>             >>>> the
>>>             >>>> -  //    remainder and we can use the previous
>>>             value for the
>>>             >>>> condition (true).
>>>             >>>> -  // 3) Otherwise, construct a runtime check.
>>>             >>>> -  if (!Cost->requiresScalarEpilogue() &&
>>>             >>>> !Cost->foldTailByMasking()) {
>>>             >>>> +  // all of the iterations in the first vector loop.
>>>             >>>> +  // If (N - N%VF) == N, then we *don't* need to
>>>             run the remainder.
>>>             >>>> +  // If tail is to be folded, we know we don't
>>>             need to run the
>>>             >>>> remainder.
>>>             >>>> +  if (!Cost->foldTailByMasking()) {
>>>             >>>>        Instruction *CmpN =
>>>             CmpInst::Create(Instruction::ICmp,
>>>             >>>> CmpInst::ICMP_EQ,
>>>             >>>> Count, VectorTripCount,
>>>             >>>> "cmp.n",
>>>             >>>> LoopMiddleBlock->getTerminator());
>>>             >>>> @@ -3590,17 +3561,17 @@ BasicBlock
>>>             >>>> *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
>>>             >>>>      |    [  ]_|   <-- vector loop.
>>>             >>>>      |     |
>>>             >>>>      |     v
>>>             >>>> -  \   -[ ]   <--- middle-block.
>>>             >>>> -   \/   |
>>>             >>>> -   /\   v
>>>             >>>> -   | ->[ ]     <--- new preheader.
>>>             >>>> +  |   -[ ]   <--- middle-block.
>>>             >>>> +  |  /  |
>>>             >>>> +  | /   v
>>>             >>>> +  -|- >[ ]     <--- new preheader.
>>>             >>>>       |    |
>>>             >>>> - (opt)  v      <-- edge from middle to exit iff
>>>             epilogue is not
>>>             >>>> required.
>>>             >>>> +   |    v
>>>             >>>>       |   [ ] \
>>>             >>>> -   |   [ ]_|   <-- old scalar loop to handle
>>>             remainder (scalar
>>>             >>>> epilogue).
>>>             >>>> +   |   [ ]_|   <-- old scalar loop to handle
>>>             remainder.
>>>             >>>>        \   |
>>>             >>>>         \  v
>>>             >>>> -      >[ ]     <-- exit block(s).
>>>             >>>> +      >[ ]     <-- exit block.
>>>             >>>>       ...
>>>             >>>>       */
>>>             >>>>
>>>             >>>> @@ -4021,18 +3992,13 @@ void
>>>             >>>> InnerLoopVectorizer::fixVectorizedLoop() {
>>>             >>>>      // Forget the original basic block.
>>>             >>>> PSE.getSE()->forgetLoop(OrigLoop);
>>>             >>>>
>>>             >>>> -  // If we inserted an edge from the middle block
>>>             to the unique
>>>             >>>> exit block,
>>>             >>>> -  // update uses outside the loop (phis) to
>>>             account for the newly
>>>             >>>> inserted
>>>             >>>> -  // edge.
>>>             >>>> -  if (!Cost->requiresScalarEpilogue()) {
>>>             >>>> -    // Fix-up external users of the induction
>>>             variables.
>>>             >>>> -    for (auto &Entry : Legal->getInductionVars())
>>>             >>>> - fixupIVUsers(Entry.first, Entry.second,
>>>             >>>> -
>>>             getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>             >>>> - IVEndValues[Entry.first], LoopMiddleBlock);
>>>             >>>> +  // Fix-up external users of the induction variables.
>>>             >>>> +  for (auto &Entry : Legal->getInductionVars())
>>>             >>>> +    fixupIVUsers(Entry.first, Entry.second,
>>>             >>>> +
>>>             getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>             >>>> + IVEndValues[Entry.first], LoopMiddleBlock);
>>>             >>>>
>>>             >>>> -    fixLCSSAPHIs();
>>>             >>>> -  }
>>>             >>>> +  fixLCSSAPHIs();
>>>             >>>>      for (Instruction *PI : PredicatedInstructions)
>>>             >>>> sinkScalarOperands(&*PI);
>>>             >>>>
>>>             >>>> @@ -4250,13 +4216,12 @@ void
>>>             >>>>
>>>             InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {
>>>             >>>>      // recurrence in the exit block, and then add
>>>             an edge for the
>>>             >>>> middle block.
>>>             >>>>      // Note that LCSSA does not imply single entry
>>>             when the
>>>             >>>> original scalar loop
>>>             >>>>      // had multiple exiting edges (as we always
>>>             run the last
>>>             >>>> iteration in the
>>>             >>>> -  // scalar epilogue); in that case, there is no
>>>             edge from middle
>>>             >>>> to exit and
>>>             >>>> -  // and thus no phis which needed updated.
>>>             >>>> -  if (!Cost->requiresScalarEpilogue())
>>>             >>>> -    for (PHINode &LCSSAPhi : LoopExitBlock->phis())
>>>             >>>> -      if (any_of(LCSSAPhi.incoming_values(),
>>>             >>>> -                 [Phi](Value *V) { return V ==
>>>             Phi; }))
>>>             >>>> - LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>             >>>> LoopMiddleBlock);
>>>             >>>> +  // scalar epilogue); in that case, the exiting
>>>             path through
>>>             >>>> middle will be
>>>             >>>> +  // dynamically dead and the value picked for the
>>>             phi doesn't
>>>             >>>> matter.
>>>             >>>> +  for (PHINode &LCSSAPhi : LoopExitBlock->phis())
>>>             >>>> +    if (any_of(LCSSAPhi.incoming_values(),
>>>             >>>> +               [Phi](Value *V) { return V == Phi; }))
>>>             >>>> + LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>             >>>> LoopMiddleBlock);
>>>             >>>>    }
>>>             >>>>
>>>             >>>>    void InnerLoopVectorizer::fixReduction(PHINode
>>>             *Phi) {
>>>             >>>> @@ -4421,11 +4386,10 @@ void
>>>             >>>> InnerLoopVectorizer::fixReduction(PHINode *Phi) {
>>>             >>>>      // We know that the loop is in LCSSA form. We
>>>             need to update
>>>             >>>> the PHI nodes
>>>             >>>>      // in the exit blocks. See comment on
>>>             analogous loop in
>>>             >>>>      // fixFirstOrderRecurrence for a more complete
>>>             explaination of
>>>             >>>> the logic.
>>>             >>>> -  if (!Cost->requiresScalarEpilogue())
>>>             >>>> -    for (PHINode &LCSSAPhi : LoopExitBlock->phis())
>>>             >>>> -      if (any_of(LCSSAPhi.incoming_values(),
>>>             >>>> - [LoopExitInst](Value *V) { return V ==
>>>             >>>> LoopExitInst; }))
>>>             >>>> - LCSSAPhi.addIncoming(ReducedPartRdx,
>>>             LoopMiddleBlock);
>>>             >>>> +  for (PHINode &LCSSAPhi : LoopExitBlock->phis())
>>>             >>>> +    if (any_of(LCSSAPhi.incoming_values(),
>>>             >>>> + [LoopExitInst](Value *V) { return V ==
>>>             >>>> LoopExitInst; }))
>>>             >>>> + LCSSAPhi.addIncoming(ReducedPartRdx,
>>>             LoopMiddleBlock);
>>>             >>>>
>>>             >>>>      // Fix the scalar loop reduction variable with
>>>             the incoming
>>>             >>>> reduction sum
>>>             >>>>      // from the vector body and from the backedge
>>>             value.
>>>             >>>> @@ -8074,11 +8038,7 @@ BasicBlock
>>>             >>>>
>>>             *EpilogueVectorizerMainLoop::emitMinimumIterationCountCheck(
>>>             >>>>
>>>             >>>>        // Update dominator for Bypass & LoopExit.
>>>             >>>> DT->changeImmediateDominator(Bypass, TCCheckBlock);
>>>             >>>> -    if (!Cost->requiresScalarEpilogue())
>>>             >>>> -      // For loops with multiple exits, there's no
>>>             edge from the
>>>             >>>> middle block
>>>             >>>> -      // to exit blocks (as the epilogue must run)
>>>             and thus no
>>>             >>>> need to update
>>>             >>>> -      // the immediate dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             TCCheckBlock);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             TCCheckBlock);
>>>             >>>>
>>>             >>>> LoopBypassBlocks.push_back(TCCheckBlock);
>>>             >>>>
>>>             >>>> @@ -8142,12 +8102,7 @@
>>>             >>>>
>>>             EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton()
>>>             >>>> {
>>>             >>>>
>>>             >>>> DT->changeImmediateDominator(LoopScalarPreHeader,
>>>             >>>> EPI.EpilogueIterationCountCheck);
>>>             >>>> -  if (!Cost->requiresScalarEpilogue())
>>>             >>>> -    // If there is an epilogue which must run,
>>>             there's no edge
>>>             >>>> from the
>>>             >>>> -    // middle block to exit blocks  and thus no
>>>             need to update the
>>>             >>>> immediate
>>>             >>>> -    // dominator of the exit blocks.
>>>             >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>             >>>> - EPI.EpilogueIterationCountCheck);
>>>             >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>             >>>> EPI.EpilogueIterationCountCheck);
>>>             >>>>
>>>             >>>>      // Keep track of bypass blocks, as they feed
>>>             start values to
>>>             >>>> the induction
>>>             >>>>      // phis in the scalar loop preheader.
>>>             >>>>
>>>             >>>> diff  --git
>>>             >>>>
>>>             a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>             >>>>
>>>             b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>             >>>> index ec280bf5d5e4..7d4a3c5c9935 100644
>>>             >>>> ---
>>>             >>>>
>>>             a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>             >>>> +++
>>>             >>>>
>>>             b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>             >>>> @@ -471,9 +471,10 @@ define i16 @multiple_exit(i16*
>>>             %p, i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP15]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP2]], [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] =
>>>             extractelement
>>>             >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>             >>>>    ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>             >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi
>>>             i16 [ 0,
>>>             >>>> [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]],
>>>             [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>             >>>> @@ -485,14 +486,14 @@ define i16
>>>             @multiple_exit(i16* %p, i32 %n) {
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>>    ; CHECK-NEXT: [[REC_NEXT]] = load i16, i16*
>>>             [[B]], align 2
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT:    store i16 [[SCALAR_RECUR]],
>>>             i16* [[B]], align 4
>>>             >>>>    ; CHECK-NEXT:    [[INC]] = add nsw i32 [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]],
>>>             2096
>>>             >>>>    ; CHECK-NEXT:    br i1 [[CMP2]], label
>>>             [[FOR_COND]], label
>>>             >>>> [[IF_END]], [[LOOP7:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       if.end:
>>>             >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [
>>>             [[SCALAR_RECUR]],
>>>             >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]], [[FOR_COND]] ]
>>>             >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [
>>>             [[SCALAR_RECUR]],
>>>             >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]], [[FOR_COND]] ], [
>>>             >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT:    ret i16 [[REC_LCSSA]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>> @@ -557,9 +558,10 @@ define i16
>>>             @multiple_exit2(i16* %p, i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP15]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP2]], [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] =
>>>             extractelement
>>>             >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>             >>>>    ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>             >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi
>>>             i16 [ 0,
>>>             >>>> [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]],
>>>             [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>             >>>> @@ -571,14 +573,14 @@ define i16
>>>             @multiple_exit2(i16* %p, i32 %n) {
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>>    ; CHECK-NEXT: [[REC_NEXT]] = load i16, i16*
>>>             [[B]], align 2
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT:    store i16 [[SCALAR_RECUR]],
>>>             i16* [[B]], align 4
>>>             >>>>    ; CHECK-NEXT:    [[INC]] = add nsw i32 [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]],
>>>             2096
>>>             >>>>    ; CHECK-NEXT:    br i1 [[CMP2]], label
>>>             [[FOR_COND]], label
>>>             >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       if.end:
>>>             >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [
>>>             [[SCALAR_RECUR]],
>>>             >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ]
>>>             >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [
>>>             [[SCALAR_RECUR]],
>>>             >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ], [
>>>             >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT:    ret i16 [[REC_LCSSA]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>>
>>>             >>>> diff  --git
>>>             >>>>
>>>             a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>             >>>>
>>>             b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>             >>>> index f0ba677348ab..0d4bdf0ecac3 100644
>>>             >>>> ---
>>>             a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>             >>>> +++
>>>             b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>             >>>> @@ -447,7 +447,7 @@ define void
>>>             @even_load_static_tc(i32* noalias
>>>             >>>> nocapture readonly %A, i32* noalia
>>>             >>>>    ; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64
>>>             [[INDEX_NEXT]], 508
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP6]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_COND_CLEANUP:%.*]],
>>>             >>>> label [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             1016,
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>>>             >>>> @@ -463,7 +463,7 @@ define void
>>>             @even_load_static_tc(i32* noalias
>>>             >>>> nocapture readonly %A, i32* noalia
>>>             >>>>    ; CHECK-NEXT:    store i32 [[MUL]], i32*
>>>             [[ARRAYIDX2]], align 4
>>>             >>>>    ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64
>>>             >>>> [[INDVARS_IV]], 2
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i64
>>>             [[INDVARS_IV]], 1022
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_COND_CLEANUP:%.*]], [[LOOP13:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_COND_CLEANUP]], [[LOOP13:!llvm.loop !.*]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>>      br label %for.body
>>>             >>>> @@ -528,7 +528,7 @@ define void
>>>             @even_load_dynamic_tc(i32* noalias
>>>             >>>> nocapture readonly %A, i32* noali
>>>             >>>>    ; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP12]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_COND_CLEANUP:%.*]],
>>>             >>>> label [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             [[IND_END]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>>>             >>>> @@ -544,7 +544,7 @@ define void
>>>             @even_load_dynamic_tc(i32* noalias
>>>             >>>> nocapture readonly %A, i32* noali
>>>             >>>>    ; CHECK-NEXT:    store i32 [[MUL]], i32*
>>>             [[ARRAYIDX2]], align 4
>>>             >>>>    ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64
>>>             >>>> [[INDVARS_IV]], 2
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i64
>>>             [[INDVARS_IV_NEXT]],
>>>             >>>> [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_COND_CLEANUP:%.*]], [[LOOP15:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_COND_CLEANUP]], [[LOOP15:!llvm.loop !.*]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>>      br label %for.body
>>>             >>>> @@ -973,7 +973,7 @@ define void
>>>             @PR27626_0(%pair.i32 *%p, i32 %z,
>>>             >>>> i64 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP19]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP24:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>>>             >>>> @@ -985,7 +985,7 @@ define void
>>>             @PR27626_0(%pair.i32 *%p, i32 %z,
>>>             >>>> i64 %n) {
>>>             >>>>    ; CHECK-NEXT:    store i32 [[Z]], i32*
>>>             [[P_I_Y]], align 4
>>>             >>>>    ; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64
>>>             [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt i64
>>>             [[I_NEXT]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END:%.*]], [[LOOP25:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END]], [[LOOP25:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       for.end:
>>>             >>>>    ; CHECK-NEXT:    ret void
>>>             >>>>    ;
>>>             >>>> @@ -1066,7 +1066,7 @@ define i32
>>>             @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>             >>>>    ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector
>>>             <4 x i32>
>>>             >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1,
>>>             i32 undef, i32
>>>             >>>> undef, i32 undef>
>>>             >>>>    ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32>
>>>             [[BIN_RDX]],
>>>             >>>> [[RDX_SHUF3]]
>>>             >>>>    ; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4
>>>             x i32>
>>>             >>>> [[BIN_RDX4]], i32 0
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [
>>>             [[TMP19]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>             >>>> @@ -1081,9 +1081,10 @@ define i32
>>>             @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>             >>>>    ; CHECK-NEXT:    [[TMP21]] = add nsw i32
>>>             [[TMP20]], [[S]]
>>>             >>>>    ; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64
>>>             [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt i64
>>>             [[I_NEXT]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END:%.*]], [[LOOP27:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END]], [[LOOP27:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       for.end:
>>>             >>>> -; CHECK-NEXT:    ret i32 [[TMP21]]
>>>             >>>> +; CHECK-NEXT:    [[TMP22:%.*]] = phi i32 [
>>>             [[TMP21]], [[FOR_BODY]]
>>>             >>>> ], [ [[TMP19]], [[MIDDLE_BLOCK]] ]
>>>             >>>> +; CHECK-NEXT:    ret i32 [[TMP22]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>>      br label %for.body
>>>             >>>> @@ -1162,7 +1163,7 @@ define void
>>>             @PR27626_2(%pair.i32 *%p, i64 %n,
>>>             >>>> i32 %z) {
>>>             >>>>    ; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP20]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP28:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>>>             >>>> @@ -1176,7 +1177,7 @@ define void
>>>             @PR27626_2(%pair.i32 *%p, i64 %n,
>>>             >>>> i32 %z) {
>>>             >>>>    ; CHECK-NEXT:    store i32 [[TMP21]], i32*
>>>             [[P_I_Y]], align 4
>>>             >>>>    ; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64
>>>             [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt i64
>>>             [[I_NEXT]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END:%.*]], [[LOOP29:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END]], [[LOOP29:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       for.end:
>>>             >>>>    ; CHECK-NEXT:    ret void
>>>             >>>>    ;
>>>             >>>> @@ -1263,7 +1264,7 @@ define i32
>>>             @PR27626_3(%pair.i32 *%p, i64 %n,
>>>             >>>> i32 %z) {
>>>             >>>>    ; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector
>>>             <4 x i32>
>>>             >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1,
>>>             i32 undef, i32
>>>             >>>> undef, i32 undef>
>>>             >>>>    ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32>
>>>             [[BIN_RDX]],
>>>             >>>> [[RDX_SHUF3]]
>>>             >>>>    ; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4
>>>             x i32>
>>>             >>>> [[BIN_RDX4]], i32 0
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    br i1 false, label
>>>             [[FOR_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [
>>>             [[TMP22]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>             >>>> @@ -1281,9 +1282,10 @@ define i32
>>>             @PR27626_3(%pair.i32 *%p, i64 %n,
>>>             >>>> i32 %z) {
>>>             >>>>    ; CHECK-NEXT:    [[TMP25]] = add nsw i32
>>>             [[TMP24]], [[S]]
>>>             >>>>    ; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64
>>>             [[I]], 1
>>>             >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt i64
>>>             [[I_NEXT]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END:%.*]], [[LOOP31:!llvm.loop !.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[FOR_END]], [[LOOP31:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       for.end:
>>>             >>>> -; CHECK-NEXT:    ret i32 [[TMP25]]
>>>             >>>> +; CHECK-NEXT:    [[TMP26:%.*]] = phi i32 [
>>>             [[TMP25]], [[FOR_BODY]]
>>>             >>>> ], [ [[TMP22]], [[MIDDLE_BLOCK]] ]
>>>             >>>> +; CHECK-NEXT:    ret i32 [[TMP26]]
>>>             >>>>    ;
>>>             >>>>    entry:
>>>             >>>>      br label %for.body
>>>             >>>>
>>>             >>>> diff  --git
>>>             a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>             >>>> b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>             >>>> index f32002fae2b6..91780789088b 100644
>>>             >>>> --- a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>             >>>> +++ b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>             >>>> @@ -146,14 +146,15 @@ define void @early_exit(i16*
>>>             %p, i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP10]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP1]], [[N_VEC]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
>>>             >>>>    ; CHECK:       for.cond:
>>>             >>>>    ; CHECK-NEXT:    [[I:%.*]] = phi i32 [
>>>             [[BC_RESUME_VAL]],
>>>             >>>> [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>> @@ -285,14 +286,15 @@ define void
>>>             @multiple_unique_exit(i16* %p,
>>>             >>>> i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP11]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP2]], [[N_VEC]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
>>>             >>>>    ; CHECK:       for.cond:
>>>             >>>>    ; CHECK-NEXT:    [[I:%.*]] = phi i32 [
>>>             [[BC_RESUME_VAL]],
>>>             >>>> [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>> @@ -372,14 +374,17 @@ define i32
>>>             @multiple_unique_exit2(i16* %p,
>>>             >>>> i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP11]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP2]], [[N_VEC]]
>>>             >>>> +; CHECK-NEXT: [[IND_ESCAPE:%.*]] = sub i32
>>>             [[N_VEC]], 1
>>>             >>>> +; CHECK-NEXT: [[IND_ESCAPE1:%.*]] = sub i32
>>>             [[N_VEC]], 1
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
>>>             >>>>    ; CHECK:       for.cond:
>>>             >>>>    ; CHECK-NEXT:    [[I:%.*]] = phi i32 [
>>>             [[BC_RESUME_VAL]],
>>>             >>>> [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>> @@ -388,7 +393,7 @@ define i32
>>>             @multiple_unique_exit2(i16* %p, i32
>>>             >>>> %n) {
>>>             >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]],
>>>             2096
>>>             >>>>    ; CHECK-NEXT:    br i1 [[CMP2]], label
>>>             [[FOR_COND]], label
>>>             >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       if.end:
>>>             >>>> -; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]],
>>>             [[FOR_BODY]]
>>>             >>>> ], [ [[I]], [[FOR_COND]] ]
>>>             >>>> +; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]],
>>>             [[FOR_BODY]]
>>>             >>>> ], [ [[I]], [[FOR_COND]] ], [ [[IND_ESCAPE1]],
>>>             [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT:    ret i32 [[I_LCSSA]]
>>>             >>>>    ;
>>>             >>>>    ; TAILFOLD-LABEL: @multiple_unique_exit2(
>>>             >>>> @@ -461,14 +466,15 @@ define i32
>>>             @multiple_unique_exit3(i16* %p,
>>>             >>>> i32 %n) {
>>>             >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32
>>>             [[INDEX_NEXT]],
>>>             >>>> [[N_VEC]]
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP11]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP10:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32
>>>             [[TMP2]], [[N_VEC]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[IF_END:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [
>>>             [[N_VEC]],
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
>>>             >>>>    ; CHECK:       for.cond:
>>>             >>>>    ; CHECK-NEXT:    [[I:%.*]] = phi i32 [
>>>             [[BC_RESUME_VAL]],
>>>             >>>> [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32
>>>             [[I]], [[N]]
>>>             >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label
>>>             >>>> [[IF_END:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>             [[FOR_BODY]], label [[IF_END]]
>>>             >>>>    ; CHECK:       for.body:
>>>             >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
>>>             >>>>    ; CHECK-NEXT:    [[B:%.*]] = getelementptr
>>>             inbounds i16, i16*
>>>             >>>> [[P]], i64 [[IPROM]]
>>>             >>>> @@ -477,7 +483,7 @@ define i32
>>>             @multiple_unique_exit3(i16* %p, i32
>>>             >>>> %n) {
>>>             >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]],
>>>             2096
>>>             >>>>    ; CHECK-NEXT:    br i1 [[CMP2]], label
>>>             [[FOR_COND]], label
>>>             >>>> [[IF_END]], [[LOOP11:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       if.end:
>>>             >>>> -; CHECK-NEXT:    [[EXIT:%.*]] = phi i32 [ 0,
>>>             [[FOR_COND]] ], [ 1,
>>>             >>>> [[FOR_BODY]] ]
>>>             >>>> +; CHECK-NEXT:    [[EXIT:%.*]] = phi i32 [ 0,
>>>             [[FOR_COND]] ], [ 1,
>>>             >>>> [[FOR_BODY]] ], [ 0, [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT:    ret i32 [[EXIT]]
>>>             >>>>    ;
>>>             >>>>    ; TAILFOLD-LABEL: @multiple_unique_exit3(
>>>             >>>> @@ -994,7 +1000,8 @@ define void
>>>             @scalar_predication(float* %addr) {
>>>             >>>>    ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64
>>>             [[INDEX_NEXT]], 200
>>>             >>>>    ; CHECK-NEXT:    br i1 [[TMP10]], label
>>>             [[MIDDLE_BLOCK:%.*]],
>>>             >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       middle.block:
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 201, 200
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[EXIT:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200,
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
>>>             >>>> @@ -1002,7 +1009,7 @@ define void
>>>             @scalar_predication(float* %addr) {
>>>             >>>>    ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [
>>>             [[BC_RESUME_VAL]],
>>>             >>>> [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]],
>>>             [[LOOP_LATCH:%.*]] ]
>>>             >>>>    ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr
>>>             float, float*
>>>             >>>> [[ADDR]], i64 [[IV]]
>>>             >>>>    ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64
>>>             [[IV]], 200
>>>             >>>> -; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label
>>>             [[EXIT:%.*]], label
>>>             >>>> [[LOOP_BODY:%.*]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label
>>>             [[EXIT]], label
>>>             >>>> [[LOOP_BODY:%.*]]
>>>             >>>>    ; CHECK:       loop.body:
>>>             >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = load float, float*
>>>             [[GEP]],
>>>             >>>> align 4
>>>             >>>>    ; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float
>>>             [[TMP11]],
>>>             >>>> 0.000000e+00
>>>             >>>> @@ -1088,7 +1095,8 @@ define i32 @me_reduction(i32*
>>>             %addr) {
>>>             >>>>    ; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector
>>>             <2 x i32>
>>>             >>>> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32
>>>             undef>
>>>             >>>>    ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i32>
>>>             [[TMP5]],
>>>             >>>> [[RDX_SHUF]]
>>>             >>>>    ; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x
>>>             i32>
>>>             >>>> [[BIN_RDX]], i32 0
>>>             >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>             >>>> +; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 201, 200
>>>             >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>             [[EXIT:%.*]], label
>>>             >>>> [[SCALAR_PH]]
>>>             >>>>    ; CHECK:       scalar.ph:
>>>             >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200,
>>>             >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>             >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [
>>>             0, [[ENTRY]]
>>>             >>>> ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
>>>             >>>> @@ -1098,7 +1106,7 @@ define i32 @me_reduction(i32*
>>>             %addr) {
>>>             >>>>    ; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [
>>>             [[BC_MERGE_RDX]],
>>>             >>>> [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]
>>>             >>>>    ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr
>>>             i32, i32* [[ADDR]],
>>>             >>>> i64 [[IV]]
>>>             >>>>    ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64
>>>             [[IV]], 200
>>>             >>>> -; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label
>>>             [[EXIT:%.*]], label
>>>             >>>> [[LOOP_LATCH]]
>>>             >>>> +; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label
>>>             [[EXIT]], label
>>>             >>>> [[LOOP_LATCH]]
>>>             >>>>    ; CHECK:       loop.latch:
>>>             >>>>    ; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32*
>>>             [[GEP]], align 4
>>>             >>>>    ; CHECK-NEXT: [[ACCUM_NEXT]] = add i32
>>>             [[ACCUM]], [[TMP8]]
>>>             >>>> @@ -1106,7 +1114,7 @@ define i32 @me_reduction(i32*
>>>             %addr) {
>>>             >>>>    ; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] = icmp eq
>>>             i64 [[IV]], 400
>>>             >>>>    ; CHECK-NEXT:    br i1 [[EXITCOND2_NOT]], label
>>>             [[EXIT]], label
>>>             >>>> [[LOOP_HEADER]], [[LOOP15:!llvm.loop !.*]]
>>>             >>>>    ; CHECK:       exit:
>>>             >>>> -; CHECK-NEXT:    [[LCSSA:%.*]] = phi i32 [ 0,
>>>             [[LOOP_HEADER]] ], [
>>>             >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ]
>>>             >>>> +; CHECK-NEXT:    [[LCSSA:%.*]] = phi i32 [ 0,
>>>             [[LOOP_HEADER]] ], [
>>>             >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ], [ [[TMP7]],
>>>             [[MIDDLE_BLOCK]] ]
>>>             >>>>    ; CHECK-NEXT:    ret i32 [[LCSSA]]
>>>             >>>>    ;
>>>             >>>>    ; TAILFOLD-LABEL: @me_reduction(
>>>             >>>>
>>>             >>>>
>>>             >>>>
>>>             >>>> _______________________________________________
>>>             >>>> llvm-commits mailing list
>>>             >>>> llvm-commits at lists.llvm.org
>>>             >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>