[llvm] 7fe41ac - Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"

Thu May 27 15:01:58 PDT 2021

Thank you, thank you, thank you!

This should be enough for me to find and address the issue.

Thank you for the help!

Philip

On 5/27/21 11:29 AM, Stefan Pintilie wrote:
> Hi Philip,
> We have managed to track down more information on the issue causing 
> the failure with the LV patch.
> It seems that Cost->requiresScalarEpilogue() sometimes returns true 
> when the scalar epilogue is actually not required.
> One of the failing tests is:
> test-suite/SingleSource/Benchmarks/Misc/oourafft.c
> That test has a loop that runs a fixed 1024 times. If the max vector 
> interleave is a factor of 1024 then Cost->requiresScalarEpilogue() 
> still returns true even though the scalar epilogue is not actually 
> required.
> The following examples were run on a Little Endian machine:
> FAIL - clang -DNDEBUG -fuse-ld=ld  -O3 -DNDEBUG  -w -Werror=date-time 
> -mllvm -force-vector-interleave=4 
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
> FAIL - clang -DNDEBUG -fuse-ld=ld  -O3 -DNDEBUG  -w -Werror=date-time 
> -mllvm -force-vector-interleave=8 
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
> PASS - clang -DNDEBUG -fuse-ld=ld  -O3 -DNDEBUG  -w -Werror=date-time 
> -mllvm -force-vector-interleave=12 
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
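The FAIL/PASS pattern above lines up with whether the interleave count divides the trip count of 1024. The following is a minimal sketch of that arithmetic only, not the vectorizer's code: `scalarRemainder` and `coversAllIterations` are hypothetical helper names, and VF is assumed to be 1 because the message suggests the loop was interleaved but not vectorized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): iterations left over for the scalar
// loop once the main loop has consumed whole steps of VF * IC iterations.
std::uint64_t scalarRemainder(std::uint64_t tripCount, std::uint64_t vf,
                              std::uint64_t ic) {
  assert(vf > 0 && ic > 0 && "step must be non-zero");
  return tripCount % (vf * ic);
}

// True when the main loop covers every iteration, so on arithmetic grounds
// alone no scalar remainder loop would be needed.
bool coversAllIterations(std::uint64_t tripCount, std::uint64_t vf,
                         std::uint64_t ic) {
  return scalarRemainder(tripCount, vf, ic) == 0;
}
```

With a trip count of 1024 and VF assumed to be 1, interleave counts of 4 and 8 leave no remainder, while 12 leaves 4 iterations (1024 = 85 * 12 + 4). That matches the two FAIL cases and the one PASS case, although it does not by itself show where the flag is computed incorrectly.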
> Looking at the IR immediately following LV we see this.
> Working IR:
> middle.block:
>   %cmp.n = icmp eq i64 1024, 1024
>   br i1 %cmp.n, label %for.end, label %scalar.ph
> Failing IR:
> middle.block:
>   br label %scalar.ph
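The two forms of middle.block above can be modelled as a small decision function. This is a sketch under stated assumptions rather than the patch itself: `middleBlockSuccessor` is a hypothetical name, the block labels are taken from the IR quoted above, and the boolean stands in for the value returned by Cost->requiresScalarEpilogue().

```cpp
#include <cstdint>
#include <string>

// Hypothetical model (not LLVM code) of the middle-block terminator.
// Returns the label of the block that control reaches after middle.block.
std::string middleBlockSuccessor(bool requiresScalarEpilogue,
                                 std::uint64_t tripCount,
                                 std::uint64_t vectorTripCount) {
  // Failing IR shape: `br label %scalar.ph`, no comparison at all.
  if (requiresScalarEpilogue)
    return "scalar.ph";
  // Working IR shape: `%cmp.n = icmp eq i64 N, VTC` followed by a
  // conditional branch to for.end (equal) or scalar.ph (not equal).
  return tripCount == vectorTripCount ? "for.end" : "scalar.ph";
}
```

For the values in the quoted IR (both operands are 1024), the conditional form goes to for.end, while the unconditional form enters scalar.ph even though no iterations remain. That would be consistent with a miscompile if the scalar loop then executes at least one more iteration.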
>
> We traced the problem to InterleavedAccessInfo::analyzeInterleaving 
> where we set RequiresScalarEpilogue to true.
> We probably shouldn't be setting that to TRUE for this test 
> case. Perhaps it matters that the loop wasn't actually vectorized in 
> this case (VF == 1). That may impact the computation of 
> RequiresScalarEpilogue.
> Hope this helps,
> Stefan
>
>     ----- Original message -----
>     From: Stefan Pintilie via llvm-commits <llvm-commits at lists.llvm.org>
>     Sent by: "llvm-commits" <llvm-commits-bounces at lists.llvm.org>
>     To: listmail at philipreames.com
>     Cc: llvmlistbot at llvm.org, llvm-commits at lists.llvm.org,
>     benny.kra at gmail.com, Nemanja Ivanovic <nemanjai at ca.ibm.com>,
>     akuegel at google.com, LLVM on Power <powerllvm at ca.ibm.com>
>     Subject: [EXTERNAL] RE: [llvm] 7fe41ac - Revert "[LV]
>     Unconditionally branch from middle to scalar preheader if the
>     scalar loop must execute"
>     Date: Tue, May 25, 2021 7:53 PM
>     Hi Philip,
>
>     I have run a -print-after-all and I'm going to send you the IR
>     before and after the Loop Vectorize pass. Hopefully this will help.
>     Let me know if you need more information or if you want me to run
>     something on my end.
>     Best,
>     Stefan
>
>         ----- Original message -----
>         From: Philip Reames <listmail at philipreames.com>
>         To: Nemanja Ivanovic <nemanjai at ca.ibm.com>
>         Cc: akuegel at google.com, benny.kra at gmail.com,
>         llvm-commits at lists.llvm.org, llvmlistbot at llvm.org, LLVM on
>         Power <powerllvm at ca.ibm.com>, Stefan Pintilie <stefanp at ca.ibm.com>
>         Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
>         Unconditionally branch from middle to scalar preheader if the
>         scalar loop must execute"
>         Date: Tue, May 25, 2021 5:38 PM
>
>         If you have the before and after IR, that's really all I
>         probably need.  If you share those, I can take the
>         investigation from there.
>
>         Philip
>
>         On 5/25/21 2:12 PM, Nemanja Ivanovic wrote:
>>         Hi Philip,
>>         I am sorry about the late reply - this was a 4 day weekend
>>         here in Toronto. We kind of started with looking at the
>>         difference in the post-LV IR with and without your patch for
>>         the SingleSource test case. Stefan is currently looking at it
>>         and we're hoping to figure out what's going on soon. Would it
>>         help you if we shared the two IR files with you to get your
>>         opinion on what might be happening as well?
>>         Nemanja Ivanovic
>>         LLVM PPC Backend Development
>>         IBM Toronto Lab
>>         Email: nemanjai at ca.ibm.com <mailto:nemanjai at ca.ibm.com>
>>         Phone: 905-413-3388
>>
>>             ----- Original message -----
>>             From: Philip Reames <listmail at philipreames.com>
>>             <mailto:listmail at philipreames.com>
>>             To: Nemanja Ivanovic <nemanjai at ca.ibm.com>
>>             <mailto:nemanjai at ca.ibm.com>
>>             Cc: akuegel at google.com <mailto:akuegel at google.com>,
>>             benny.kra at gmail.com <mailto:benny.kra at gmail.com>,
>>             llvm-commits at lists.llvm.org
>>             <mailto:llvm-commits at lists.llvm.org>,
>>             llvmlistbot at llvm.org <mailto:llvmlistbot at llvm.org>, LLVM
>>             on Power <powerllvm at ca.ibm.com> <mailto:powerllvm at ca.ibm.com>
>>             Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
>>             Unconditionally branch from middle to scalar preheader if
>>             the scalar loop must execute"
>>             Date: Wed, May 19, 2021 2:15 PM
>>
>>             Looking at the various failures, I see both LE and BE
>>             bots failing.  The BE bot shows a miscompile in one of
>>             the test suite benchmarks. The LE bots appear to be
>>             showing a crash while using a stage1 build clang to build
>>             stage2 clang.
>>
>>             LE example:
>>             https://lab.llvm.org/buildbot#builders/19/builds/4236
>>             <https://lab.llvm.org/buildbot#builders/19/builds/4236>
>>             BE example:
>>             https://lab.llvm.org/buildbot#builders/100/builds/5762
>>             <https://lab.llvm.org/buildbot#builders/100/builds/5762>
>>
>>             So both are showing miscompiles, just with different
>>             symptoms.  Frankly, the BE looks easier to debug (much
>>             less code making it into the miscompiled binary.)
>>
>>             Oddly, pretty much only PPC bots are failing. The one
>>             exception is a stage2 failure on AArch64
>>             (https://lab.llvm.org/buildbot/#/builders/111/builds/2027
>>             <https://lab.llvm.org/buildbot/#/builders/111/builds/2027>)
>>             which looks similar to the LE failure above.
>>
>>             Given this appears to be target specific, I am *guessing*
>>             there's some vectorizer hook which is causing a different
>>             codepath to be executed. Before we start trying to get me
>>             access to hardware, do you have any guesses on what that
>>             hook might be?  If you can give me a good hint on where
>>             to look, I suspect I can probably find the issue that way.
>>
>>             Philip
>>
>>             On 5/18/21 3:07 AM, Nemanja Ivanovic wrote:
>>>             Hi Philip,
>>>             I am not sure what happened with your first attempt to
>>>             contact us and how we missed it. We would be more than
>>>             happy to help you debug this issue. Do you know if this
>>>             only affects the big endian bot or if it also fails on
>>>             little endian bots? In the latter case, we can certainly
>>>             provide access to a little endian machine hosted at
>>>             OSU/OSL. In the former case, we don't have a machine
>>>             available and we'll have to do the debugging and report
>>>             to you (which might take a bit longer).
>>>             Nemanja Ivanovic
>>>             LLVM PPC Backend Development
>>>             IBM Toronto Lab
>>>             Email: nemanjai at ca.ibm.com <mailto:nemanjai at ca.ibm.com>
>>>             Phone: 905-413-3388
>>>
>>>                 ----- Original message -----
>>>                 From: Philip Reames <listmail at philipreames.com>
>>>                 <mailto:listmail at philipreames.com>
>>>                 To: Adrian Kuegel <akuegel at google.com>
>>>                 <mailto:akuegel at google.com>
>>>                 Cc: Benjamin Kramer <benny.kra at gmail.com>
>>>                 <mailto:benny.kra at gmail.com>, Adrian Kuegel
>>>                 <llvmlistbot at llvm.org>
>>>                 <mailto:llvmlistbot at llvm.org>, llvm-commits
>>>                 <llvm-commits at lists.llvm.org>
>>>                 <mailto:llvm-commits at lists.llvm.org>,
>>>                 powerllvm at ca.ibm.com <mailto:powerllvm at ca.ibm.com>
>>>                 Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert
>>>                 "[LV] Unconditionally branch from middle to scalar
>>>                 preheader if the scalar loop must execute"
>>>                 Date: Mon, May 17, 2021 11:59 PM
>>>
>>>                 I tried another cycle to see if this had been
>>>                 resolved, but am still seeing build bot failures,
>>>                 nearly exclusively on PPC.  (And one arm self host
>>>                 bot.)
>>>
>>>                 @PPC Bot Owner - I need help reducing a test case
>>>                 for the failure seen here:
>>>                 https://lab.llvm.org/buildbot#builders/100/builds/5762
>>>                 <https://lab.llvm.org/buildbot#builders/100/builds/5762>.
>>>                 I am stuck, and unable to make progress for nearly 3
>>>                 months now.  I would greatly appreciate help.
>>>
>>>                 Philip
>>>
>>>                 On 5/5/21 11:59 PM, Adrian Kuegel wrote:
>>>>                 Sounds good to me, and thanks for sending the heads
>>>>                 up :)
>>>>                 On Wed, May 5, 2021 at 10:01 PM Philip Reames
>>>>                 <listmail at philipreames.com
>>>>                 <mailto:listmail at philipreames.com>> wrote:
>>>>
>>>>                     FYI, I'm going to try recommitting this without
>>>>                     changes in a day or so.
>>>>
>>>>                     I never heard back from a PPC bot owner, and I
>>>>                     don't have enough
>>>>                     information to really debug anything from the
>>>>                     buildbot log.  I did run
>>>>                     across a latent issue which this patch may very
>>>>                     well have exposed at
>>>>                     much higher frequency; the previous patch in
>>>>                     this series (which is much
>>>>                     more restrictive) does appear to have increased
>>>>                     frequency.  That was
>>>>                     worked around in 80e80250.  My educated guess
>>>>                     is that same issue
>>>>                     triggered the miscompile seen on the ppc bot,
>>>>                     but that is more of a
>>>>                     guess than I'd really prefer.
>>>>
>>>>                     I'm going to submit this during off hours, and
>>>>                     watch the bots fairly
>>>>                     closely after submit.  Hopefully this either
>>>>                     cycles clean or I get a
>>>>                     better clue as to what the root issue is.
>>>>
>>>>                     Philip
>>>>
>>>>                     On 2/8/21 9:09 PM, Philip Reames via
>>>>                     llvm-commits wrote:
>>>>                     > Ben,
>>>>                     >
>>>>                     > Thanks for the clarification. The log does
>>>>                     not make the fact this is
>>>>                     > an execution failure obvious.
>>>>                     >
>>>>                     > No, I don't have access to a PPC machine.
>>>>                     >
>>>>                     > I am going to need some assistance from the
>>>>                     bot owner on this. At a
>>>>                     > minimum, IR for the test in question (before
>>>>                     optimization, but on
>>>>                     > target platform) seems like a reasonable ask.
>>>>                     >
>>>>                     > I strongly suspect this change is simply
>>>>                     exposing another latent
>>>>                     > issue.  Or at least, I've reviewed the change
>>>>                     and don't see anything
>>>>                     > likely to cause runtime crashes w/o also
>>>>                     tripping compiler asserts.
>>>>                     >
>>>>                     > Philip
>>>>                     >
>>>>                     >
>>>>                     > On 2/8/21 5:21 AM, Benjamin Kramer wrote:
>>>>                     >> `execution_time` failures mean that the bot
>>>>                     succeeded building a test
>>>>                     >> but it failed when running it. I'm
>>>>                     relatively certain that this is the
>>>>                     >> same issue Adrian is seeing -- binaries
>>>>                     segfaulting early on PPC.
>>>>                     >>
>>>>                     >> The bot log output isn't helpful at all for
>>>>                     investigating why this is
>>>>                     >> happening. Do you happen to have access to a
>>>>                     PPC machine?
>>>>                     >>
>>>>                     >> On Fri, Feb 5, 2021 at 6:05 PM Philip Reames
>>>>                     via llvm-commits
>>>>                     >> <llvm-commits at lists.llvm.org
>>>>                     <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>>                     >>> Adrian,
>>>>                     >>>
>>>>                     >>> I'm going to need you to provide a bit more
>>>>                     information here. The test
>>>>                     >>> failure in stage1 was fixed at the time you
>>>>                     reverted this patch.  The
>>>>                     >>> remaining failure in the bot is very
>>>>                     unclear.  What is an execution_time
>>>>                     >>> failure? From the log output, the "failing"
>>>>                     run finished in 0.5
>>>>                     >>> seconds,
>>>>                     >>> whereas the previous "succeeding" run
>>>>                     finished in 11 seconds. Without
>>>>                     >>> further context, I'd say that's no failure.
>>>>                     >>>
>>>>                     >>> I'll also note that I did not receive email
>>>>                     from this bot.  I received
>>>>                     >>> notice from the various other bots and
>>>>                     fixed the ARM test issue, but
>>>>                     >>> unless I missed it in with the others, this
>>>>                     bot is not notifying.
>>>>                     >>>
>>>>                     >>> In general, I'm a fan of fast reverts, but
>>>>                     I have to admit, this one
>>>>                     >>> appears borderline at the moment.
>>>>                     >>>
>>>>                     >>> Philip
>>>>                     >>>
>>>>                     >>> On 2/5/21 3:53 AM, Adrian Kuegel via
>>>>                     llvm-commits wrote:
>>>>                     >>>> Author: Adrian Kuegel
>>>>                     >>>> Date: 2021-02-05T12:51:03+01:00
>>>>                     >>>> New Revision:
>>>>                     7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>>                     >>>>
>>>>                     >>>> URL:
>>>>                     >>>>
>>>>                     https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>>                     <https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c>
>>>>                     >>>> DIFF:
>>>>                     >>>>
>>>>                     https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c.diff
>>>>                     <https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c.diff>
>>>>                     >>>>
>>>>                     >>>> LOG: Revert "[LV] Unconditionally branch
>>>>                     from middle to scalar
>>>>                     >>>> preheader if the scalar loop must execute"
>>>>                     >>>>
>>>>                     >>>> This reverts commit
>>>>                     3e5ce49e5371ce4feadbf97dd5c2b652d9db3d1d.
>>>>                     >>>>
>>>>                     >>>> Tests started failing on PPC, for example:
>>>>                     >>>>
>>>>                     http://lab.llvm.org:8011/#/builders/105/builds/5569
>>>>                     <http://lab.llvm.org:8011/#/builders/105/builds/5569>
>>>>                     >>>>
>>>>                     >>>> Added:
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>> Modified:
>>>>                     >>>> llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>>                     >>>>
>>>>                     llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>                     >>>>
>>>>                     llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>>                     >>>>
>>>>                     llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>>                     >>>>
>>>>                     llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>>                     >>>>
>>>>                     >>>> Removed:
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>>
>>>>                     ################################################################################
>>>>                     >>>>
>>>>                     >>>> diff  --git
>>>>                     a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>>                     >>>> b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>>                     >>>> index 8a89158788cf..de4fb446fdf2 100644
>>>>                     >>>> ---
>>>>                     a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>>                     >>>> +++
>>>>                     b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>>                     >>>> @@ -44,11 +44,11 @@
>>>>                     LoopVersioning::LoopVersioning(const
>>>>                     >>>> LoopAccessInfo &LAI,
>>>>                     >>>> AliasChecks(Checks.begin(), Checks.end()),
>>>>                     >>>> Preds(LAI.getPSE().getUnionPredicate()),
>>>>                     LAI(LAI), LI(LI),
>>>>                     >>>> DT(DT),
>>>>                     >>>>          SE(SE) {
>>>>                     >>>> + assert(L->getUniqueExitBlock() && "No
>>>>                     single exit block");
>>>>                     >>>>    }
>>>>                     >>>>
>>>>                     >>>>    void LoopVersioning::versionLoop(
>>>>                     >>>>        const SmallVectorImpl<Instruction
>>>>                     *> &DefsUsedOutside) {
>>>>                     >>>> -
>>>>                     assert(VersionedLoop->getUniqueExitBlock() &&
>>>>                     "No single exit
>>>>                     >>>> block");
>>>>                     >>>> assert(VersionedLoop->isLoopSimplifyForm() &&
>>>>                     >>>>             "Loop is not in loop-simplify
>>>>                     form");
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>> diff  --git
>>>>                     a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>                     >>>>
>>>>                     b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>                     >>>> index 3277842edbfe..6bce0caeb36f 100644
>>>>                     >>>> ---
>>>>                     a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>                     >>>> +++
>>>>                     b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>                     >>>> @@ -852,7 +852,7 @@ class
>>>>                     InnerLoopVectorizer {
>>>>                     >>>>      /// Middle Block between the vector
>>>>                     and the scalar.
>>>>                     >>>>      BasicBlock *LoopMiddleBlock;
>>>>                     >>>>
>>>>                     >>>> -  /// The unique ExitBlock of the scalar
>>>>                     loop if one exists.  Note
>>>>                     >>>> that
>>>>                     >>>> +  /// The (unique) ExitBlock of the
>>>>                     scalar loop.  Note that
>>>>                     >>>>      /// there can be multiple exiting
>>>>                     edges reaching this block.
>>>>                     >>>>      BasicBlock *LoopExitBlock;
>>>>                     >>>>
>>>>                     >>>> @@ -3147,13 +3147,9 @@ void
>>>>                     >>>>
>>>>                     InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop
>>>>                     *L,
>>>>                     >>>> DT->getNode(Bypass)->getIDom()) &&
>>>>                     >>>>             "TC check is expected to
>>>>                     dominate Bypass");
>>>>                     >>>>
>>>>                     >>>> -  // Update dominator for Bypass &
>>>>                     LoopExit (if needed).
>>>>                     >>>> +  // Update dominator for Bypass & LoopExit.
>>>>                     >>>> DT->changeImmediateDominator(Bypass,
>>>>                     TCCheckBlock);
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -    // If there is an epilogue which must
>>>>                     run, there's no edge
>>>>                     >>>> from the
>>>>                     >>>> -    // middle block to exit blocks  and
>>>>                     thus no need to update the
>>>>                     >>>> immediate
>>>>                     >>>> -    // dominator of the exit blocks.
>>>>                     >>>> -
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     TCCheckBlock);
>>>>                     >>>> +
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     TCCheckBlock);
>>>>                     >>>>
>>>>                     >>>> ReplaceInstWithInst(
>>>>                     >>>> TCCheckBlock->getTerminator(),
>>>>                     >>>> @@ -3192,11 +3188,7 @@ void
>>>>                     >>>> InnerLoopVectorizer::emitSCEVChecks(Loop
>>>>                     *L, BasicBlock *Bypass) {
>>>>                     >>>>      // Update dominator only if this is
>>>>                     first RT check.
>>>>                     >>>>      if (LoopBypassBlocks.empty()) {
>>>>                     >>>> DT->changeImmediateDominator(Bypass,
>>>>                     SCEVCheckBlock);
>>>>                     >>>> -    if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -      // If there is an epilogue which
>>>>                     must run, there's no edge
>>>>                     >>>> from the
>>>>                     >>>> -      // middle block to exit blocks  and
>>>>                     thus no need to update
>>>>                     >>>> the immediate
>>>>                     >>>> -      // dominator of the exit blocks.
>>>>                     >>>> -
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     SCEVCheckBlock);
>>>>                     >>>> +
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     SCEVCheckBlock);
>>>>                     >>>>      }
>>>>                     >>>>
>>>>                     >>>> ReplaceInstWithInst(
>>>>                     >>>> @@ -3252,11 +3244,7 @@ void
>>>>                     >>>>
>>>>                     InnerLoopVectorizer::emitMemRuntimeChecks(Loop
>>>>                     *L, BasicBlock
>>>>                     >>>> *Bypass) {
>>>>                     >>>>      // Update dominator only if this is
>>>>                     first RT check.
>>>>                     >>>>      if (LoopBypassBlocks.empty()) {
>>>>                     >>>> DT->changeImmediateDominator(Bypass,
>>>>                     MemCheckBlock);
>>>>                     >>>> -    if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -      // If there is an epilogue which
>>>>                     must run, there's no edge
>>>>                     >>>> from the
>>>>                     >>>> -      // middle block to exit blocks  and
>>>>                     thus no need to update
>>>>                     >>>> the immediate
>>>>                     >>>> -      // dominator of the exit blocks.
>>>>                     >>>> -
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     MemCheckBlock);
>>>>                     >>>> +
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     MemCheckBlock);
>>>>                     >>>>      }
>>>>                     >>>>
>>>>                     >>>>      Instruction *FirstCheckInst;
>>>>                     >>>> @@ -3381,10 +3369,9 @@ Value
>>>>                     >>>> *InnerLoopVectorizer::emitTransformedIndex(
>>>>                     >>>>    Loop
>>>>                     *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>>                     >>>> Prefix) {
>>>>                     >>>>      LoopScalarBody = OrigLoop->getHeader();
>>>>                     >>>> LoopVectorPreHeader =
>>>>                     OrigLoop->getLoopPreheader();
>>>>                     >>>> +  LoopExitBlock =
>>>>                     OrigLoop->getUniqueExitBlock();
>>>>                     >>>> + assert(LoopExitBlock && "Must have an
>>>>                     exit block");
>>>>                     >>>> assert(LoopVectorPreHeader && "Invalid
>>>>                     loop structure");
>>>>                     >>>> -  LoopExitBlock =
>>>>                     OrigLoop->getUniqueExitBlock(); // may be nullptr
>>>>                     >>>> - assert((LoopExitBlock ||
>>>>                     Cost->requiresScalarEpilogue()) &&
>>>>                     >>>> -         "multiple exit loop without
>>>>                     required epilogue?");
>>>>                     >>>>
>>>>                     >>>>      LoopMiddleBlock =
>>>>                     >>>> SplitBlock(LoopVectorPreHeader,
>>>>                     >>>> LoopVectorPreHeader->getTerminator(), DT,
>>>>                     >>>> @@ -3393,20 +3380,12 @@ Loop
>>>>                     >>>>
>>>>                     *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>>                     Prefix) {
>>>>                     >>>> SplitBlock(LoopMiddleBlock,
>>>>                     >>>> LoopMiddleBlock->getTerminator(), DT, LI,
>>>>                     >>>> nullptr, Twine(Prefix) + "scalar.ph");
>>>>                     >>>>
>>>>                     >>>> +  // Set up branch from middle block to
>>>>                     the exit and scalar
>>>>                     >>>> preheader blocks.
>>>>                     >>>> +  // completeLoopSkeleton will update the
>>>>                     condition to use an
>>>>                     >>>> iteration check,
>>>>                     >>>> +  // if required to decide whether to
>>>>                     execute the remainder.
>>>>                     >>>> +  BranchInst *BrInst =
>>>>                     >>>> + BranchInst::Create(LoopExitBlock,
>>>>                     LoopScalarPreHeader,
>>>>                     >>>> Builder.getTrue());
>>>>                     >>>>      auto *ScalarLatchTerm =
>>>>                     >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>>                     >>>> -
>>>>                     >>>> -  // Set up the middle block terminator. 
>>>>                     Two cases:
>>>>                     >>>> -  // 1) If we know that we must execute
>>>>                     the scalar epilogue, emit an
>>>>                     >>>> -  // unconditional branch.
>>>>                     >>>> -  // 2) Otherwise, we must have a single
>>>>                     unique exit block (due to
>>>>                     >>>> how we
>>>>                     >>>> -  //    implement the multiple exit
>>>>                     case).  In this case, set up a
>>>>                     >>>> conditonal
>>>>                     >>>> -  //    branch from the middle block to
>>>>                     the loop scalar preheader,
>>>>                     >>>> and the
>>>>                     >>>> -  //    exit block. completeLoopSkeleton
>>>>                     will update the
>>>>                     >>>> condition to use an
>>>>                     >>>> -  //    iteration check, if required to
>>>>                     decide whether to execute
>>>>                     >>>> the remainder.
>>>>                     >>>> -  BranchInst *BrInst =
>>>>                     Cost->requiresScalarEpilogue() ?
>>>>                     >>>> - BranchInst::Create(LoopScalarPreHeader) :
>>>>                     >>>> - BranchInst::Create(LoopExitBlock,
>>>>                     LoopScalarPreHeader,
>>>>                     >>>> - Builder.getTrue());
>>>>                     >>>>
>>>>                     BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
>>>>                     >>>>
>>>>                     ReplaceInstWithInst(LoopMiddleBlock->getTerminator(),
>>>>                     BrInst);
>>>>                     >>>>
>>>>                     >>>> @@ -3418,11 +3397,7 @@ Loop
>>>>                     >>>>
>>>>                     *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>>                     Prefix) {
>>>>                     >>>> nullptr, nullptr, Twine(Prefix) +
>>>>                     "vector.body");
>>>>                     >>>>
>>>>                     >>>>      // Update dominator for loop exit.
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -    // If there is an epilogue which must
>>>>                     run, there's no edge
>>>>                     >>>> from the
>>>>                     >>>> -    // middle block to exit blocks  and
>>>>                     thus no need to update the
>>>>                     >>>> immediate
>>>>                     >>>> -    // dominator of the exit blocks.
>>>>                     >>>> -
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     LoopMiddleBlock);
>>>>                     >>>> +
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     LoopMiddleBlock);
>>>>                     >>>>
>>>>                     >>>>      // Create and register the new vector
>>>>                     loop.
>>>>                     >>>>      Loop *Lp = LI->AllocateLoop();
>>>>                     >>>> @@ -3519,14 +3494,10 @@ BasicBlock
>>>>                     >>>>
>>>>                     *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
>>>>                     >>>>      auto *ScalarLatchTerm =
>>>>                     >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>>                     >>>>
>>>>                     >>>>      // Add a check in the middle block to
>>>>                     see if we have completed
>>>>                     >>>> -  // all of the iterations in the first
>>>>                     vector loop. Three cases:
>>>>                     >>>> -  // 1) If we require a scalar epilogue,
>>>>                     there is no conditional
>>>>                     >>>> branch as
>>>>                     >>>> -  //    we unconditionally branch to the
>>>>                     scalar preheader. Do
>>>>                     >>>> nothing.
>>>>                     >>>> -  // 2) If (N - N%VF) == N, then we
>>>>                     *don't* need to run the
>>>>                     >>>> remainder.
>>>>                     >>>> -  //    Thus if tail is to be folded, we
>>>>                     know we don't need to run
>>>>                     >>>> the
>>>>                     >>>> -  //    remainder and we can use the
>>>>                     previous value for the
>>>>                     >>>> condition (true).
>>>>                     >>>> -  // 3) Otherwise, construct a runtime check.
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue() &&
>>>>                     >>>> !Cost->foldTailByMasking()) {
>>>>                     >>>> +  // all of the iterations in the first
>>>>                     vector loop.
>>>>                     >>>> +  // If (N - N%VF) == N, then we *don't*
>>>>                     need to run the remainder.
>>>>                     >>>> +  // If tail is to be folded, we know we
>>>>                     don't need to run the
>>>>                     >>>> remainder.
>>>>                     >>>> +  if (!Cost->foldTailByMasking()) {
>>>>                     >>>>        Instruction *CmpN =
>>>>                     CmpInst::Create(Instruction::ICmp,
>>>>                     >>>> CmpInst::ICMP_EQ,
>>>>                     >>>> Count, VectorTripCount,
>>>>                     >>>> "cmp.n",
>>>>                     >>>> LoopMiddleBlock->getTerminator());
>>>>                     >>>> @@ -3590,17 +3561,17 @@ BasicBlock
>>>>                     >>>>
>>>>                     *InnerLoopVectorizer::createVectorizedLoopSkeleton()
>>>>                     {
>>>>                     >>>>      |    [  ]_| <-- vector loop.
>>>>                     >>>>      |     |
>>>>                     >>>>      |     v
>>>>                     >>>> -  \   -[ ]   <--- middle-block.
>>>>                     >>>> -   \/   |
>>>>                     >>>> -   /\   v
>>>>                     >>>> -   | ->[ ] <--- new preheader.
>>>>                     >>>> +  |   -[ ]   <--- middle-block.
>>>>                     >>>> +  |  /  |
>>>>                     >>>> +  | /   v
>>>>                     >>>> +  -|- >[ ] <--- new preheader.
>>>>                     >>>>       |    |
>>>>                     >>>> - (opt)  v <-- edge from middle to exit
>>>>                     iff epilogue is not
>>>>                     >>>> required.
>>>>                     >>>> +   |    v
>>>>                     >>>>       |   [ ] \
>>>>                     >>>> -   |   [ ]_| <-- old scalar loop to
>>>>                     handle remainder (scalar
>>>>                     >>>> epilogue).
>>>>                     >>>> +   |   [ ]_| <-- old scalar loop to
>>>>                     handle remainder.
>>>>                     >>>>        \   |
>>>>                     >>>>         \  v
>>>>                     >>>> -      >[ ] <-- exit block(s).
>>>>                     >>>> +      >[ ] <-- exit block.
>>>>                     >>>>       ...
>>>>                     >>>>       */
>>>>                     >>>>
>>>>                     >>>> @@ -4021,18 +3992,13 @@ void
>>>>                     >>>> InnerLoopVectorizer::fixVectorizedLoop() {
>>>>                     >>>>      // Forget the original basic block.
>>>>                     >>>> PSE.getSE()->forgetLoop(OrigLoop);
>>>>                     >>>>
>>>>                     >>>> -  // If we inserted an edge from the
>>>>                     middle block to the unique
>>>>                     >>>> exit block,
>>>>                     >>>> -  // update uses outside the loop (phis)
>>>>                     to account for the newly
>>>>                     >>>> inserted
>>>>                     >>>> -  // edge.
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue()) {
>>>>                     >>>> -    // Fix-up external users of the
>>>>                     induction variables.
>>>>                     >>>> -    for (auto &Entry :
>>>>                     Legal->getInductionVars())
>>>>                     >>>> - fixupIVUsers(Entry.first, Entry.second,
>>>>                     >>>> -
>>>>                     getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>>                     >>>> - IVEndValues[Entry.first], LoopMiddleBlock);
>>>>                     >>>> +  // Fix-up external users of the
>>>>                     induction variables.
>>>>                     >>>> +  for (auto &Entry :
>>>>                     Legal->getInductionVars())
>>>>                     >>>> + fixupIVUsers(Entry.first, Entry.second,
>>>>                     >>>> +
>>>>                     getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>>                     >>>> + IVEndValues[Entry.first], LoopMiddleBlock);
>>>>                     >>>>
>>>>                     >>>> -    fixLCSSAPHIs();
>>>>                     >>>> -  }
>>>>                     >>>> +  fixLCSSAPHIs();
>>>>                     >>>>      for (Instruction *PI :
>>>>                     PredicatedInstructions)
>>>>                     >>>> sinkScalarOperands(&*PI);
>>>>                     >>>>
>>>>                     >>>> @@ -4250,13 +4216,12 @@ void
>>>>                     >>>>
>>>>                     InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode
>>>>                     *Phi) {
>>>>                     >>>>      // recurrence in the exit block, and
>>>>                     then add an edge for the
>>>>                     >>>> middle block.
>>>>                     >>>>      // Note that LCSSA does not imply
>>>>                     single entry when the
>>>>                     >>>> original scalar loop
>>>>                     >>>>      // had multiple exiting edges (as we
>>>>                     always run the last
>>>>                     >>>> iteration in the
>>>>                     >>>> -  // scalar epilogue); in that case,
>>>>                     there is no edge from middle
>>>>                     >>>> to exit and
>>>>                     >>>> -  // and thus no phis which needed updated.
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -    for (PHINode &LCSSAPhi :
>>>>                     LoopExitBlock->phis())
>>>>                     >>>> -      if (any_of(LCSSAPhi.incoming_values(),
>>>>                     >>>> - [Phi](Value *V) { return V == Phi; }))
>>>>                     >>>> -
>>>>                     LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>>                     >>>> LoopMiddleBlock);
>>>>                     >>>> +  // scalar epilogue); in that case, the
>>>>                     exiting path through
>>>>                     >>>> middle will be
>>>>                     >>>> +  // dynamically dead and the value
>>>>                     picked for the phi doesn't
>>>>                     >>>> matter.
>>>>                     >>>> +  for (PHINode &LCSSAPhi :
>>>>                     LoopExitBlock->phis())
>>>>                     >>>> +    if (any_of(LCSSAPhi.incoming_values(),
>>>>                     >>>> + [Phi](Value *V) { return V == Phi; }))
>>>>                     >>>> +
>>>>                     LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>>                     >>>> LoopMiddleBlock);
>>>>                     >>>>    }
>>>>                     >>>>
>>>>                     >>>>    void
>>>>                     InnerLoopVectorizer::fixReduction(PHINode *Phi) {
>>>>                     >>>> @@ -4421,11 +4386,10 @@ void
>>>>                     >>>> InnerLoopVectorizer::fixReduction(PHINode
>>>>                     *Phi) {
>>>>                     >>>>      // We know that the loop is in LCSSA
>>>>                     form. We need to update
>>>>                     >>>> the PHI nodes
>>>>                     >>>>      // in the exit blocks.  See comment
>>>>                     on analogous loop in
>>>>                     >>>>      // fixFirstOrderRecurrence for a more
>>>>                     complete explaination of
>>>>                     >>>> the logic.
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -    for (PHINode &LCSSAPhi :
>>>>                     LoopExitBlock->phis())
>>>>                     >>>> -      if (any_of(LCSSAPhi.incoming_values(),
>>>>                     >>>> - [LoopExitInst](Value *V) { return V ==
>>>>                     >>>> LoopExitInst; }))
>>>>                     >>>> - LCSSAPhi.addIncoming(ReducedPartRdx,
>>>>                     LoopMiddleBlock);
>>>>                     >>>> +  for (PHINode &LCSSAPhi :
>>>>                     LoopExitBlock->phis())
>>>>                     >>>> +    if (any_of(LCSSAPhi.incoming_values(),
>>>>                     >>>> + [LoopExitInst](Value *V) { return V ==
>>>>                     >>>> LoopExitInst; }))
>>>>                     >>>> + LCSSAPhi.addIncoming(ReducedPartRdx,
>>>>                     LoopMiddleBlock);
>>>>                     >>>>
>>>>                     >>>>      // Fix the scalar loop reduction
>>>>                     variable with the incoming
>>>>                     >>>> reduction sum
>>>>                     >>>>      // from the vector body and from the
>>>>                     backedge value.
>>>>                     >>>> @@ -8074,11 +8038,7 @@ BasicBlock
>>>>                     >>>>
>>>>                     *EpilogueVectorizerMainLoop::emitMinimumIterationCountCheck(
>>>>                     >>>>
>>>>                     >>>>        // Update dominator for Bypass &
>>>>                     LoopExit.
>>>>                     >>>> DT->changeImmediateDominator(Bypass,
>>>>                     TCCheckBlock);
>>>>                     >>>> -    if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -      // For loops with multiple exits,
>>>>                     there's no edge from the
>>>>                     >>>> middle block
>>>>                     >>>> -      // to exit blocks (as the epilogue
>>>>                     must run) and thus no
>>>>                     >>>> need to update
>>>>                     >>>> -      // the immediate dominator of the
>>>>                     exit blocks.
>>>>                     >>>> -
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     TCCheckBlock);
>>>>                     >>>> +
>>>>                     DT->changeImmediateDominator(LoopExitBlock,
>>>>                     TCCheckBlock);
>>>>                     >>>>
>>>>                     >>>> LoopBypassBlocks.push_back(TCCheckBlock);
>>>>                     >>>>
>>>>                     >>>> @@ -8142,12 +8102,7 @@
>>>>                     >>>>
>>>>                     EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton()
>>>>                     >>>> {
>>>>                     >>>>
>>>>                     >>>>
>>>>                     DT->changeImmediateDominator(LoopScalarPreHeader,
>>>>                     >>>> EPI.EpilogueIterationCountCheck);
>>>>                     >>>> -  if (!Cost->requiresScalarEpilogue())
>>>>                     >>>> -    // If there is an epilogue which must
>>>>                     run, there's no edge
>>>>                     >>>> from the
>>>>                     >>>> -    // middle block to exit blocks  and
>>>>                     thus no need to update the
>>>>                     >>>> immediate
>>>>                     >>>> -    // dominator of the exit blocks.
>>>>                     >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>>                     >>>> - EPI.EpilogueIterationCountCheck);
>>>>                     >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>>                     >>>> EPI.EpilogueIterationCountCheck);
>>>>                     >>>>
>>>>                     >>>>      // Keep track of bypass blocks, as
>>>>                     they feed start values to
>>>>                     >>>> the induction
>>>>                     >>>>      // phis in the scalar loop preheader.
>>>>                     >>>>
>>>>                     >>>> diff  --git
>>>>                     >>>>
>>>>                     a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>>                     >>>>
>>>>                     b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>>                     >>>> index ec280bf5d5e4..7d4a3c5c9935 100644
>>>>                     >>>> ---
>>>>                     >>>>
>>>>                     a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>>                     >>>> +++
>>>>                     >>>>
>>>>                     b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>>                     >>>> @@ -471,9 +471,10 @@ define i16
>>>>                     @multiple_exit(i16* %p, i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP15]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP2]], [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT:
>>>>                     [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement
>>>>                     >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>>                     >>>>    ; CHECK-NEXT:
>>>>                     [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>>                     >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]]
>>>>                     = phi i16 [ 0,
>>>>                     >>>> [[ENTRY:%.*]] ], [
>>>>                     [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>>                     >>>> @@ -485,14 +486,14 @@ define i16
>>>>                     @multiple_exit(i16* %p, i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>>    ; CHECK-NEXT: [[REC_NEXT]] = load i16,
>>>>                     i16* [[B]], align 2
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: store i16
>>>>                     [[SCALAR_RECUR]], i16* [[B]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[INC]] = add nsw i32
>>>>                     [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>>                     i32 [[I]], 2096
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[CMP2]], label
>>>>                     [[FOR_COND]], label
>>>>                     >>>> [[IF_END]], [[LOOP7:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: if.end:
>>>>                     >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>>                     [ [[SCALAR_RECUR]],
>>>>                     >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]],
>>>>                     [[FOR_COND]] ]
>>>>                     >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>>                     [ [[SCALAR_RECUR]],
>>>>                     >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]],
>>>>                     [[FOR_COND]] ], [
>>>>                     >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]],
>>>>                     [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: ret i16 [[REC_LCSSA]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>> @@ -557,9 +558,10 @@ define i16
>>>>                     @multiple_exit2(i16* %p, i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP15]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP2]], [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT:
>>>>                     [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement
>>>>                     >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>>                     >>>>    ; CHECK-NEXT:
>>>>                     [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>>                     >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]]
>>>>                     = phi i16 [ 0,
>>>>                     >>>> [[ENTRY:%.*]] ], [
>>>>                     [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>>                     >>>> @@ -571,14 +573,14 @@ define i16
>>>>                     @multiple_exit2(i16* %p, i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>>    ; CHECK-NEXT: [[REC_NEXT]] = load i16,
>>>>                     i16* [[B]], align 2
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: store i16
>>>>                     [[SCALAR_RECUR]], i16* [[B]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[INC]] = add nsw i32
>>>>                     [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>>                     i32 [[I]], 2096
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[CMP2]], label
>>>>                     [[FOR_COND]], label
>>>>                     >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: if.end:
>>>>                     >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>>                     [ [[SCALAR_RECUR]],
>>>>                     >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ]
>>>>                     >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>>                     [ [[SCALAR_RECUR]],
>>>>                     >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ], [
>>>>                     >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]],
>>>>                     [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: ret i16 [[REC_LCSSA]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>>
>>>>                     >>>> diff  --git
>>>>                     >>>>
>>>>                     a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>>                     >>>>
>>>>                     b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>>                     >>>> index f0ba677348ab..0d4bdf0ecac3 100644
>>>>                     >>>> ---
>>>>                     a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>>                     >>>> +++
>>>>                     b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>>                     >>>> @@ -447,7 +447,7 @@ define void
>>>>                     @even_load_static_tc(i32* noalias
>>>>                     >>>> nocapture readonly %A, i32* noalia
>>>>                     >>>>    ; CHECK-NEXT: [[TMP6:%.*]] = icmp eq
>>>>                     i64 [[INDEX_NEXT]], 508
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP6]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_COND_CLEANUP:%.*]],
>>>>                     >>>> label [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ 1016,
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>>                     >>>> @@ -463,7 +463,7 @@ define void
>>>>                     @even_load_static_tc(i32* noalias
>>>>                     >>>> nocapture readonly %A, i32* noalia
>>>>                     >>>>    ; CHECK-NEXT: store i32 [[MUL]], i32*
>>>>                     [[ARRAYIDX2]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add
>>>>                     nuw nsw i64
>>>>                     >>>> [[INDVARS_IV]], 2
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp ult
>>>>                     i64 [[INDVARS_IV]], 1022
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_COND_CLEANUP:%.*]],
>>>>                     [[LOOP13:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_COND_CLEANUP]], [[LOOP13:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>>      br label %for.body
>>>>                     >>>> @@ -528,7 +528,7 @@ define void
>>>>                     @even_load_dynamic_tc(i32* noalias
>>>>                     >>>> nocapture readonly %A, i32* noali
>>>>                     >>>>    ; CHECK-NEXT: [[TMP12:%.*]] = icmp eq
>>>>                     i64 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP12]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP14:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_COND_CLEANUP:%.*]],
>>>>                     >>>> label [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ [[IND_END]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>>                     >>>> @@ -544,7 +544,7 @@ define void
>>>>                     @even_load_dynamic_tc(i32* noalias
>>>>                     >>>> nocapture readonly %A, i32* noali
>>>>                     >>>>    ; CHECK-NEXT: store i32 [[MUL]], i32*
>>>>                     [[ARRAYIDX2]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add
>>>>                     nuw nsw i64
>>>>                     >>>> [[INDVARS_IV]], 2
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp ult
>>>>                     i64 [[INDVARS_IV_NEXT]],
>>>>                     >>>> [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_COND_CLEANUP:%.*]],
>>>>                     [[LOOP15:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_COND_CLEANUP]], [[LOOP15:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>>      br label %for.body
>>>>                     >>>> @@ -973,7 +973,7 @@ define void
>>>>                     @PR27626_0(%pair.i32 *%p, i32 %z,
>>>>                     >>>> i64 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP19:%.*]] = icmp eq
>>>>                     i64 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP19]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP24:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>>                     >>>> @@ -985,7 +985,7 @@ define void
>>>>                     @PR27626_0(%pair.i32 *%p, i32 %z,
>>>>                     >>>> i64 %n) {
>>>>                     >>>>    ; CHECK-NEXT: store i32 [[Z]], i32*
>>>>                     [[P_I_Y]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>>                     i64 [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>>                     i64 [[I_NEXT]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END:%.*]], [[LOOP25:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END]], [[LOOP25:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: for.end:
>>>>                     >>>>    ; CHECK-NEXT: ret void
>>>>                     >>>>    ;
>>>>                     >>>> @@ -1066,7 +1066,7 @@ define i32
>>>>                     @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[RDX_SHUF3:%.*]] =
>>>>                     shufflevector <4 x i32>
>>>>                     >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32>
>>>>                     <i32 1, i32 undef, i32
>>>>                     >>>> undef, i32 undef>
>>>>                     >>>>    ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4
>>>>                     x i32> [[BIN_RDX]],
>>>>                     >>>> [[RDX_SHUF3]]
>>>>                     >>>>    ; CHECK-NEXT: [[TMP19:%.*]] =
>>>>                     extractelement <4 x i32>
>>>>                     >>>> [[BIN_RDX4]], i32 0
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>>                     phi i32 [ [[TMP19]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>>                     >>>> @@ -1081,9 +1081,10 @@ define i32
>>>>                     @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP21]] = add nsw i32
>>>>                     [[TMP20]], [[S]]
>>>>                     >>>>    ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>>                     i64 [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>>                     i64 [[I_NEXT]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END:%.*]], [[LOOP27:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END]], [[LOOP27:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: for.end:
>>>>                     >>>> -; CHECK-NEXT:    ret i32 [[TMP21]]
>>>>                     >>>> +; CHECK-NEXT: [[TMP22:%.*]] = phi i32 [
>>>>                     [[TMP21]], [[FOR_BODY]]
>>>>                     >>>> ], [ [[TMP19]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>> +; CHECK-NEXT:    ret i32 [[TMP22]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>>      br label %for.body
>>>>                     >>>> @@ -1162,7 +1163,7 @@ define void
>>>>                     @PR27626_2(%pair.i32 *%p, i64 %n,
>>>>                     >>>> i32 %z) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP20:%.*]] = icmp eq
>>>>                     i64 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP20]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP28:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>>                     >>>> @@ -1176,7 +1177,7 @@ define void
>>>>                     @PR27626_2(%pair.i32 *%p, i64 %n,
>>>>                     >>>> i32 %z) {
>>>>                     >>>>    ; CHECK-NEXT: store i32 [[TMP21]], i32*
>>>>                     [[P_I_Y]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>>                     i64 [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>>                     i64 [[I_NEXT]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END:%.*]], [[LOOP29:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END]], [[LOOP29:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: for.end:
>>>>                     >>>>    ; CHECK-NEXT: ret void
>>>>                     >>>>    ;
>>>>                     >>>> @@ -1263,7 +1264,7 @@ define i32
>>>>                     @PR27626_3(%pair.i32 *%p, i64 %n,
>>>>                     >>>> i32 %z) {
>>>>                     >>>>    ; CHECK-NEXT: [[RDX_SHUF3:%.*]] =
>>>>                     shufflevector <4 x i32>
>>>>                     >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32>
>>>>                     <i32 1, i32 undef, i32
>>>>                     >>>> undef, i32 undef>
>>>>                     >>>>    ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4
>>>>                     x i32> [[BIN_RDX]],
>>>>                     >>>> [[RDX_SHUF3]]
>>>>                     >>>>    ; CHECK-NEXT: [[TMP22:%.*]] =
>>>>                     extractelement <4 x i32>
>>>>                     >>>> [[BIN_RDX4]], i32 0
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 false, label
>>>>                     [[FOR_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>>                     phi i32 [ [[TMP22]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>>                     >>>> @@ -1281,9 +1282,10 @@ define i32
>>>>                     @PR27626_3(%pair.i32 *%p, i64 %n,
>>>>                     >>>> i32 %z) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP25]] = add nsw i32
>>>>                     [[TMP24]], [[S]]
>>>>                     >>>>    ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>>                     i64 [[I]], 1
>>>>                     >>>>    ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>>                     i64 [[I_NEXT]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END:%.*]], [[LOOP31:!llvm.loop !.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[COND]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[FOR_END]], [[LOOP31:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: for.end:
>>>>                     >>>> -; CHECK-NEXT:    ret i32 [[TMP25]]
>>>>                     >>>> +; CHECK-NEXT: [[TMP26:%.*]] = phi i32 [
>>>>                     [[TMP25]], [[FOR_BODY]]
>>>>                     >>>> ], [ [[TMP22]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>> +; CHECK-NEXT:    ret i32 [[TMP26]]
>>>>                     >>>>    ;
>>>>                     >>>>    entry:
>>>>                     >>>>      br label %for.body
>>>>                     >>>>
>>>>                     >>>> diff  --git
>>>>                     a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>>                     >>>>
>>>>                     b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>>                     >>>> index f32002fae2b6..91780789088b 100644
>>>>                     >>>> ---
>>>>                     a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>>                     >>>> +++
>>>>                     b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>>                     >>>> @@ -146,14 +146,15 @@ define void
>>>>                     @early_exit(i16* %p, i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP10]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP4:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP1]], [[N_VEC]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>>                     >>>>    ; CHECK: for.cond:
>>>>                     >>>>    ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>>                     [[BC_RESUME_VAL]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>>                     [[FOR_BODY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>>                     [[I]] to i64
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>> @@ -285,14 +286,15 @@ define void
>>>>                     @multiple_unique_exit(i16* %p,
>>>>                     >>>> i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP11]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP2]], [[N_VEC]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>>                     >>>>    ; CHECK: for.cond:
>>>>                     >>>>    ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>>                     [[BC_RESUME_VAL]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>>                     [[FOR_BODY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>>                     [[I]] to i64
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>> @@ -372,14 +374,17 @@ define i32
>>>>                     @multiple_unique_exit2(i16* %p,
>>>>                     >>>> i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP11]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP2]], [[N_VEC]]
>>>>                     >>>> +; CHECK-NEXT: [[IND_ESCAPE:%.*]] = sub
>>>>                     i32 [[N_VEC]], 1
>>>>                     >>>> +; CHECK-NEXT: [[IND_ESCAPE1:%.*]] = sub
>>>>                     i32 [[N_VEC]], 1
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>>                     >>>>    ; CHECK: for.cond:
>>>>                     >>>>    ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>>                     [[BC_RESUME_VAL]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>>                     [[FOR_BODY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>>                     [[I]] to i64
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>> @@ -388,7 +393,7 @@ define i32
>>>>                     @multiple_unique_exit2(i16* %p, i32
>>>>                     >>>> %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>>                     i32 [[I]], 2096
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[CMP2]], label
>>>>                     [[FOR_COND]], label
>>>>                     >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: if.end:
>>>>                     >>>> -; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [
>>>>                     [[I]], [[FOR_BODY]]
>>>>                     >>>> ], [ [[I]], [[FOR_COND]] ]
>>>>                     >>>> +; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [
>>>>                     [[I]], [[FOR_BODY]]
>>>>                     >>>> ], [ [[I]], [[FOR_COND]] ], [
>>>>                     [[IND_ESCAPE1]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: ret i32 [[I_LCSSA]]
>>>>                     >>>>    ;
>>>>                     >>>>    ; TAILFOLD-LABEL: @multiple_unique_exit2(
>>>>                     >>>> @@ -461,14 +466,15 @@ define i32
>>>>                     @multiple_unique_exit3(i16* %p,
>>>>                     >>>> i32 %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>>                     i32 [[INDEX_NEXT]],
>>>>                     >>>> [[N_VEC]]
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP11]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP10:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>>                     [[TMP2]], [[N_VEC]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[IF_END:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i32 [ [[N_VEC]],
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>>                     >>>>    ; CHECK: for.cond:
>>>>                     >>>>    ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>>                     [[BC_RESUME_VAL]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>>                     [[FOR_BODY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>>                     i32 [[I]], [[N]]
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label
>>>>                     >>>> [[IF_END:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP]], label
>>>>                     [[FOR_BODY]], label [[IF_END]]
>>>>                     >>>>    ; CHECK: for.body:
>>>>                     >>>>    ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>>                     [[I]] to i64
>>>>                     >>>>    ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>>                     inbounds i16, i16*
>>>>                     >>>> [[P]], i64 [[IPROM]]
>>>>                     >>>> @@ -477,7 +483,7 @@ define i32
>>>>                     @multiple_unique_exit3(i16* %p, i32
>>>>                     >>>> %n) {
>>>>                     >>>>    ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>>                     i32 [[I]], 2096
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[CMP2]], label
>>>>                     [[FOR_COND]], label
>>>>                     >>>> [[IF_END]], [[LOOP11:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: if.end:
>>>>                     >>>> -; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0,
>>>>                     [[FOR_COND]] ], [ 1,
>>>>                     >>>> [[FOR_BODY]] ]
>>>>                     >>>> +; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0,
>>>>                     [[FOR_COND]] ], [ 1,
>>>>                     >>>> [[FOR_BODY]] ], [ 0, [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: ret i32 [[EXIT]]
>>>>                     >>>>    ;
>>>>                     >>>>    ; TAILFOLD-LABEL: @multiple_unique_exit3(
>>>>                     >>>> @@ -994,7 +1000,8 @@ define void
>>>>                     @scalar_predication(float* %addr) {
>>>>                     >>>>    ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq
>>>>                     i64 [[INDEX_NEXT]], 200
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[TMP10]], label
>>>>                     [[MIDDLE_BLOCK:%.*]],
>>>>                     >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop
>>>>                     !.*]]
>>>>                     >>>>    ; CHECK: middle.block:
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64
>>>>                     201, 200
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[EXIT:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ 200,
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
>>>>                     >>>> @@ -1002,7 +1009,7 @@ define void
>>>>                     @scalar_predication(float* %addr) {
>>>>                     >>>>    ; CHECK-NEXT: [[IV:%.*]] = phi i64 [
>>>>                     [[BC_RESUME_VAL]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]],
>>>>                     [[LOOP_LATCH:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[GEP:%.*]] =
>>>>                     getelementptr float, float*
>>>>                     >>>> [[ADDR]], i64 [[IV]]
>>>>                     >>>>    ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] =
>>>>                     icmp eq i64 [[IV]], 200
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[EXITCOND_NOT]],
>>>>                     label [[EXIT:%.*]], label
>>>>                     >>>> [[LOOP_BODY:%.*]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[EXITCOND_NOT]],
>>>>                     label [[EXIT]], label
>>>>                     >>>> [[LOOP_BODY:%.*]]
>>>>                     >>>>    ; CHECK: loop.body:
>>>>                     >>>>    ; CHECK-NEXT: [[TMP11:%.*]] = load
>>>>                     float, float* [[GEP]],
>>>>                     >>>> align 4
>>>>                     >>>>    ; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq
>>>>                     float [[TMP11]],
>>>>                     >>>> 0.000000e+00
>>>>                     >>>> @@ -1088,7 +1095,8 @@ define i32
>>>>                     @me_reduction(i32* %addr) {
>>>>                     >>>>    ; CHECK-NEXT: [[RDX_SHUF:%.*]] =
>>>>                     shufflevector <2 x i32>
>>>>                     >>>> [[TMP5]], <2 x i32> poison, <2 x i32> <i32
>>>>                     1, i32 undef>
>>>>                     >>>>    ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2
>>>>                     x i32> [[TMP5]],
>>>>                     >>>> [[RDX_SHUF]]
>>>>                     >>>>    ; CHECK-NEXT: [[TMP7:%.*]] =
>>>>                     extractelement <2 x i32>
>>>>                     >>>> [[BIN_RDX]], i32 0
>>>>                     >>>> -; CHECK-NEXT:    br label [[SCALAR_PH]]
>>>>                     >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64
>>>>                     201, 200
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[CMP_N]], label
>>>>                     [[EXIT:%.*]], label
>>>>                     >>>> [[SCALAR_PH]]
>>>>                     >>>>    ; CHECK: scalar.ph:
>>>>                     >>>>    ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>>                     phi i64 [ 200,
>>>>                     >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>>                     phi i32 [ 0, [[ENTRY]]
>>>>                     >>>> ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>> @@ -1098,7 +1106,7 @@ define i32
>>>>                     @me_reduction(i32* %addr) {
>>>>                     >>>>    ; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [
>>>>                     [[BC_MERGE_RDX]],
>>>>                     >>>> [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]],
>>>>                     [[LOOP_LATCH]] ]
>>>>                     >>>>    ; CHECK-NEXT: [[GEP:%.*]] =
>>>>                     getelementptr i32, i32* [[ADDR]],
>>>>                     >>>> i64 [[IV]]
>>>>                     >>>>    ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] =
>>>>                     icmp eq i64 [[IV]], 200
>>>>                     >>>> -; CHECK-NEXT:    br i1 [[EXITCOND_NOT]],
>>>>                     label [[EXIT:%.*]], label
>>>>                     >>>> [[LOOP_LATCH]]
>>>>                     >>>> +; CHECK-NEXT:    br i1 [[EXITCOND_NOT]],
>>>>                     label [[EXIT]], label
>>>>                     >>>> [[LOOP_LATCH]]
>>>>                     >>>>    ; CHECK: loop.latch:
>>>>                     >>>>    ; CHECK-NEXT: [[TMP8:%.*]] = load i32,
>>>>                     i32* [[GEP]], align 4
>>>>                     >>>>    ; CHECK-NEXT: [[ACCUM_NEXT]] = add i32
>>>>                     [[ACCUM]], [[TMP8]]
>>>>                     >>>> @@ -1106,7 +1114,7 @@ define i32
>>>>                     @me_reduction(i32* %addr) {
>>>>                     >>>>    ; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] =
>>>>                     icmp eq i64 [[IV]], 400
>>>>                     >>>>    ; CHECK-NEXT: br i1 [[EXITCOND2_NOT]],
>>>>                     label [[EXIT]], label
>>>>                     >>>> [[LOOP_HEADER]], [[LOOP15:!llvm.loop !.*]]
>>>>                     >>>>    ; CHECK: exit:
>>>>                     >>>> -; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [
>>>>                     0, [[LOOP_HEADER]] ], [
>>>>                     >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ]
>>>>                     >>>> +; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [
>>>>                     0, [[LOOP_HEADER]] ], [
>>>>                     >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ], [
>>>>                     [[TMP7]], [[MIDDLE_BLOCK]] ]
>>>>                     >>>>    ; CHECK-NEXT: ret i32 [[LCSSA]]
>>>>                     >>>>    ;
>>>>                     >>>>    ; TAILFOLD-LABEL: @me_reduction(
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>>
>>>>                     >>>>
>>>>                     _______________________________________________
>>>>                     >>>> llvm-commits mailing list
>>>>                     >>>> llvm-commits at lists.llvm.org
>>>>                     <mailto:llvm-commits at lists.llvm.org>
>>>>                     >>>>
>>>>                     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>                     <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>>>