[llvm] 7fe41ac - Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu May 27 15:01:58 PDT 2021
Thank you, thank you, thank you!
This should be enough for me to find and address the issue.
Thank you for the help!
Philip
On 5/27/21 11:29 AM, Stefan Pintilie wrote:
> Hi Philip,
> We have managed to track down more information on the issue causing
> the failure with the LV patch.
> It seems that Cost->requiresScalarEpilogue() sometimes returns true
> when the scalar epilogue is actually not required.
> One of the failing tests is:
> test-suite/SingleSource/Benchmarks/Misc/oourafft.c
> That test has a loop that runs a fixed 1024 times. If the max vector
> interleave is a factor of 1024 then Cost->requiresScalarEpilogue()
> still returns true even though the scalar epilogue is not actually
> required.
> The following examples were run on a Little Endian machine:
> FAIL - clang -DNDEBUG -fuse-ld=ld -O3 -DNDEBUG -w -Werror=date-time
> -mllvm -force-vector-interleave=4
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
> FAIL - clang -DNDEBUG -fuse-ld=ld -O3 -DNDEBUG -w -Werror=date-time
> -mllvm -force-vector-interleave=8
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
> PASS - clang -DNDEBUG -fuse-ld=ld -O3 -DNDEBUG -w -Werror=date-time
> -mllvm -force-vector-interleave=12
> ${TESTSUITE}/SingleSource/Benchmarks/Misc/oourafft.c -lm
> Looking at the IR immediately following LV we see this.
> Working IR:
> middle.block:
> %cmp.n = icmp eq i64 1024, 1024
> br i1 %cmp.n, label %for.end, label %scalar.ph
> Failing IR:
> middle.block:
> br label %scalar.ph
>
> We traced the problem to InterleavedAccessInfo::analyzeInterleaving
> where we set RequiresScalarEpilogue to true.
> We probably shouldn't be setting that to TRUE for this test
> case. Perhaps it matters that the loop wasn't actually vectorized in
> this case (VF == 1). That may impact the computation of
> RequiresScalarEpilogue.
> Hope this helps,
> Stefan
>
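The arithmetic behind the FAIL/PASS pattern reported above can be sketched in a few lines. This is a hypothetical standalone model, not LLVM code: the names vectorTripCount and vectorLoopCoversAll are invented for illustration, and the step is assumed to be VF * interleave count (with VF == 1, as the report suggests, that is just the forced interleave count). It only shows which forced interleave factors leave a remainder for a trip count of 1024, using the N - N%Step formula from the comments in the quoted diff.

```cpp
#include <cstdint>

// Hypothetical model, not LLVM source. `step` stands for VF * interleave
// count; with VF == 1 it is simply the forced interleave count.
constexpr std::uint64_t vectorTripCount(std::uint64_t tripCount,
                                        std::uint64_t step) {
  // Iterations covered by the vector loop: N - N % Step.
  return tripCount - tripCount % step;
}

// Models the `cmp.n` test in middle.block: true when the vector loop covered
// every iteration, so the scalar remainder loop has nothing left to do.
constexpr bool vectorLoopCoversAll(std::uint64_t tripCount,
                                   std::uint64_t step) {
  return vectorTripCount(tripCount, step) == tripCount;
}

// The oourafft.c loop from the report has a fixed trip count of 1024.
static_assert(vectorLoopCoversAll(1024, 4), "interleave=4: no remainder");
static_assert(vectorLoopCoversAll(1024, 8), "interleave=8: no remainder");
static_assert(!vectorLoopCoversAll(1024, 12), "interleave=12: 4 left over");
static_assert(vectorTripCount(1024, 12) == 1020, "12 * 85 = 1020");
```

Compiling this as a single translation unit with a C++11 or later compiler is enough to check it, since every check is a static_assert. Under these assumptions, interleave factors 4 and 8 are exactly the cases where the "Working IR" check `icmp eq i64 1024, 1024` is true and the scalar loop has no iterations left, while 12 is the case where a remainder exists and branching to scalar.ph is harmless.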
> ----- Original message -----
> From: Stefan Pintilie via llvm-commits <llvm-commits at lists.llvm.org>
> Sent by: "llvm-commits" <llvm-commits-bounces at lists.llvm.org>
> To: listmail at philipreames.com
> Cc: llvmlistbot at llvm.org, llvm-commits at lists.llvm.org,
> benny.kra at gmail.com, Nemanja Ivanovic <nemanjai at ca.ibm.com>,
> akuegel at google.com, LLVM on Power <powerllvm at ca.ibm.com>
> Subject: [EXTERNAL] RE: [llvm] 7fe41ac - Revert "[LV]
> Unconditionally branch from middle to scalar preheader if the
> scalar loop must execute"
> Date: Tue, May 25, 2021 7:53 PM
> Hi Philip,
>
> I have run a -print-after-all and I'm going to send you the IR
> before and after the Loop Vectorize pass. Hopefully this will help.
> Let me know if you need more information or if you want me to run
> something on my end.
> Best,
> Stefan
>
> ----- Original message -----
> From: Philip Reames <listmail at philipreames.com>
> To: Nemanja Ivanovic <nemanjai at ca.ibm.com>
> Cc: akuegel at google.com, benny.kra at gmail.com,
> llvm-commits at lists.llvm.org, llvmlistbot at llvm.org, LLVM on
> Power <powerllvm at ca.ibm.com>, Stefan Pintilie <stefanp at ca.ibm.com>
> Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
> Unconditionally branch from middle to scalar preheader if the
> scalar loop must execute"
> Date: Tue, May 25, 2021 5:38 PM
>
> If you have the before and after IR, that's really all I
> probably need. If you share those, I can take the
> investigation from there.
>
> Philip
>
> On 5/25/21 2:12 PM, Nemanja Ivanovic wrote:
>> Hi Philip,
>> I am sorry about the late reply - this was a 4 day weekend
>> here in Toronto. We kind of started with looking at the
>> difference in the post-LV IR with and without your patch for
>> the SingleSource test case. Stefan is currently looking at it
>> and we're hoping to figure out what's going on soon. Would it
>> help you if we shared the two IR files with you to get your
>> opinion on what might be happening as well?
>> Nemanja Ivanovic
>> LLVM PPC Backend Development
>> IBM Toronto Lab
>> Email: nemanjai at ca.ibm.com
>> Phone: 905-413-3388
>>
>> ----- Original message -----
>> From: Philip Reames <listmail at philipreames.com>
>> To: Nemanja Ivanovic <nemanjai at ca.ibm.com>
>> Cc: akuegel at google.com, benny.kra at gmail.com,
>> llvm-commits at lists.llvm.org, llvmlistbot at llvm.org,
>> LLVM on Power <powerllvm at ca.ibm.com>
>> Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert "[LV]
>> Unconditionally branch from middle to scalar preheader if
>> the scalar loop must execute"
>> Date: Wed, May 19, 2021 2:15 PM
>>
>> Looking at the various failures, I see both LE and BE
>> bots failing. The BE bot shows a miscompile in one of
>> the test suite benchmarks. The LE bots appear to be
>> showing a crash while using a stage1 build clang to build
>> stage2 clang.
>>
>> LE example:
>> https://lab.llvm.org/buildbot#builders/19/builds/4236
>> BE example:
>> https://lab.llvm.org/buildbot#builders/100/builds/5762
>>
>> So both are showing miscompiles, just with different
>> symptoms. Frankly, the BE looks easier to debug (much less
>> code making it into the miscompiled binary).
>>
>> Oddly, pretty much only PPC bots are failing. The one
>> exception is a stage2 failure on AArch64
>> (https://lab.llvm.org/buildbot/#/builders/111/builds/2027)
>> which looks similar to the LE failure above.
>>
>> Given this appears to be target specific, I am *guessing*
>> there's some vectorizer hook which is causing a different
>> codepath to be executed. Before we start trying to get me
>> access to hardware, do you have any guesses on what that
>> hook might be? If you can give me a good hint on where
>> to look, I suspect I can probably find the issue that way.
>>
>> Philip
>>
>> On 5/18/21 3:07 AM, Nemanja Ivanovic wrote:
>>> Hi Philip,
>>> I am not sure what happened with your first attempt to
>>> contact us and how we missed it. We would be more than
>>> happy to help you debug this issue. Do you know if this
>>> only affects the big endian bot or if it also fails on
>>> little endian bots? In the latter case, we can certainly
>>> provide access to a little endian machine hosted at
>>> OSU/OSL. In the former case, we don't have a machine
>>> available and we'll have to do the debugging and report
>>> to you (which might take a bit longer).
>>> Nemanja Ivanovic
>>> LLVM PPC Backend Development
>>> IBM Toronto Lab
>>> Email: nemanjai at ca.ibm.com
>>> Phone: 905-413-3388
>>>
>>> ----- Original message -----
>>> From: Philip Reames <listmail at philipreames.com>
>>> To: Adrian Kuegel <akuegel at google.com>
>>> Cc: Benjamin Kramer <benny.kra at gmail.com>, Adrian Kuegel
>>> <llvmlistbot at llvm.org>, llvm-commits
>>> <llvm-commits at lists.llvm.org>, powerllvm at ca.ibm.com
>>> Subject: [EXTERNAL] Re: [llvm] 7fe41ac - Revert
>>> "[LV] Unconditionally branch from middle to scalar
>>> preheader if the scalar loop must execute"
>>> Date: Mon, May 17, 2021 11:59 PM
>>>
>>> I tried another cycle to see if this had been
>>> resolved, but am still seeing build bot failures,
>>> nearly exclusively on PPC. (And one arm self host
>>> bot.)
>>>
>>> @PPC Bot Owner - I need help reducing a test case
>>> for the failure seen here:
>>> https://lab.llvm.org/buildbot#builders/100/builds/5762.
>>> I am stuck, and unable to make progress for nearly 3
>>> months now. I would greatly appreciate help.
>>>
>>> Philip
>>>
>>> On 5/5/21 11:59 PM, Adrian Kuegel wrote:
>>>> Sounds good to me, and thanks for sending the heads
>>>> up :)
>>>> On Wed, May 5, 2021 at 10:01 PM Philip Reames
>>>> <listmail at philipreames.com> wrote:
>>>>
>>>> FYI, I'm going to try recommitting this without
>>>> changes in a day or so.
>>>>
>>>> I never heard back from a PPC bot owner, and I
>>>> don't have enough
>>>> information to really debug anything from the
>>>> buildbot log. I did run
>>>> across a latent issue which this patch may very
>>>> well have exposed at
>>>> much higher frequency; the previous patch in
>>>> this series (which is much
>>>> more restrictive) does appear to have increased
>>>> frequency. That was
>>>> worked around in 80e80250. My educated guess
>>>> is that same issue
>>>> triggered the miscompile seen on the ppc bot,
>>>> but that is more of a
>>>> guess than I'd really prefer.
>>>>
>>>> I'm going to submit this during off hours, and
>>>> watch the bots fairly
>>>> closely after submit. Hopefully this either
>>>> cycles clean or I get a
>>>> better clue as to what the root issue is.
>>>>
>>>> Philip
>>>>
>>>> On 2/8/21 9:09 PM, Philip Reames via
>>>> llvm-commits wrote:
>>>> > Ben,
>>>> >
>>>> > Thanks for the clarification. The log does
>>>> not make the fact this is
>>>> > an execution failure obvious.
>>>> >
>>>> > No, I don't have access to a PPC machine.
>>>> >
>>>> > I am going to need some assistance from the
>>>> bot owner on this. At a
>>>> > minimum, IR for the test in question (before
>>>> optimization, but on
>>>> > target platform) seems like a reasonable ask.
>>>> >
>>>> > I strongly suspect this change is simply
>>>> exposing another latent
>>>> > issue. Or at least, I've reviewed the change
>>>> and don't see anything
>>>> > likely to cause runtime crashes w/o also
>>>> tripping compiler asserts.
>>>> >
>>>> > Philip
>>>> >
>>>> >
>>>> > On 2/8/21 5:21 AM, Benjamin Kramer wrote:
>>>> >> `execution_time` failures mean that the bot
>>>> succeeded building a test
>>>> >> but it failed when running it. I'm
>>>> relatively certain that this is the
>>>> >> same issue Adrian is seeing -- binaries
>>>> segfaulting early on PPC.
>>>> >>
>>>> >> The bot log output isn't helpful at all for
>>>> investigating why this is
>>>> >> happening. Do you happen to have access to a
>>>> PPC machine?
>>>> >>
>>>> >> On Fri, Feb 5, 2021 at 6:05 PM Philip Reames
>>>> via llvm-commits
>>>> >> <llvm-commits at lists.llvm.org> wrote:
>>>> >>> Adrian,
>>>> >>>
>>>> >>> I'm going to need you to provide a bit more
>>>> information here. The test
>>>> >>> failure in stage1 was fixed at the time you
>>>> reverted this patch. The
>>>> >>> remaining failure in the bot is very
>>>> unclear. What is an execution_time
>>>> >>> failure? From the log output, the "failing"
>>>> run finished in 0.5
>>>> >>> seconds,
>>>> >>> whereas the previous "succeeding" run
>>>> finished in 11 seconds. Without
>>>> >>> further context, I'd say that's no failure.
>>>> >>>
>>>> >>> I'll also note that I did not receive email
>>>> from this bot. I received
>>>> >>> notice from the various other bots and
>>>> fixed the ARM test issue, but
>>>> >>> unless I missed it in with the others, this
>>>> bot is not notifying.
>>>> >>>
>>>> >>> In general, I'm a fan of fast reverts, but
>>>> I have to admit, this one
>>>> >>> appears borderline at the moment.
>>>> >>>
>>>> >>> Philip
>>>> >>>
>>>> >>> On 2/5/21 3:53 AM, Adrian Kuegel via
>>>> llvm-commits wrote:
>>>> >>>> Author: Adrian Kuegel
>>>> >>>> Date: 2021-02-05T12:51:03+01:00
>>>> >>>> New Revision:
>>>> 7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>> >>>>
>>>> >>>> URL:
>>>> >>>>
>>>> https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c
>>>> >>>> DIFF:
>>>> >>>>
>>>> https://github.com/llvm/llvm-project/commit/7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c.diff
>>>> >>>>
>>>> >>>> LOG: Revert "[LV] Unconditionally branch
>>>> from middle to scalar
>>>> >>>> preheader if the scalar loop must execute"
>>>> >>>>
>>>> >>>> This reverts commit
>>>> 3e5ce49e5371ce4feadbf97dd5c2b652d9db3d1d.
>>>> >>>>
>>>> >>>> Tests started failing on PPC, for example:
>>>> >>>>
>>>> http://lab.llvm.org:8011/#/builders/105/builds/5569
>>>> >>>>
>>>> >>>> Added:
>>>> >>>>
>>>> >>>>
>>>> >>>> Modified:
>>>> >>>> llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>> >>>>
>>>> llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>> >>>>
>>>> llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>> >>>>
>>>> llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>> >>>>
>>>> llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>> >>>>
>>>> >>>> Removed:
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> ################################################################################
>>>> >>>>
>>>> >>>> diff --git
>>>> a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>> >>>> b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>> >>>> index 8a89158788cf..de4fb446fdf2 100644
>>>> >>>> ---
>>>> a/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>> >>>> +++
>>>> b/llvm/lib/Transforms/Utils/LoopVersioning.cpp
>>>> >>>> @@ -44,11 +44,11 @@
>>>> LoopVersioning::LoopVersioning(const
>>>> >>>> LoopAccessInfo &LAI,
>>>> >>>> AliasChecks(Checks.begin(), Checks.end()),
>>>> >>>> Preds(LAI.getPSE().getUnionPredicate()),
>>>> LAI(LAI), LI(LI),
>>>> >>>> DT(DT),
>>>> >>>> SE(SE) {
>>>> >>>> + assert(L->getUniqueExitBlock() && "No
>>>> single exit block");
>>>> >>>> }
>>>> >>>>
>>>> >>>> void LoopVersioning::versionLoop(
>>>> >>>> const SmallVectorImpl<Instruction
>>>> *> &DefsUsedOutside) {
>>>> >>>> -
>>>> assert(VersionedLoop->getUniqueExitBlock() &&
>>>> "No single exit
>>>> >>>> block");
>>>> >>>> assert(VersionedLoop->isLoopSimplifyForm() &&
>>>> >>>> "Loop is not in loop-simplify
>>>> form");
>>>> >>>>
>>>> >>>>
>>>> >>>> diff --git
>>>> a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>> >>>>
>>>> b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>> >>>> index 3277842edbfe..6bce0caeb36f 100644
>>>> >>>> ---
>>>> a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>> >>>> +++
>>>> b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>> >>>> @@ -852,7 +852,7 @@ class
>>>> InnerLoopVectorizer {
>>>> >>>> /// Middle Block between the vector
>>>> and the scalar.
>>>> >>>> BasicBlock *LoopMiddleBlock;
>>>> >>>>
>>>> >>>> - /// The unique ExitBlock of the scalar
>>>> loop if one exists. Note
>>>> >>>> that
>>>> >>>> + /// The (unique) ExitBlock of the
>>>> scalar loop. Note that
>>>> >>>> /// there can be multiple exiting
>>>> edges reaching this block.
>>>> >>>> BasicBlock *LoopExitBlock;
>>>> >>>>
>>>> >>>> @@ -3147,13 +3147,9 @@ void
>>>> >>>>
>>>> InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop
>>>> *L,
>>>> >>>> DT->getNode(Bypass)->getIDom()) &&
>>>> >>>> "TC check is expected to
>>>> dominate Bypass");
>>>> >>>>
>>>> >>>> - // Update dominator for Bypass &
>>>> LoopExit (if needed).
>>>> >>>> + // Update dominator for Bypass & LoopExit.
>>>> >>>> DT->changeImmediateDominator(Bypass,
>>>> TCCheckBlock);
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // If there is an epilogue which must
>>>> run, there's no edge
>>>> >>>> from the
>>>> >>>> - // middle block to exit blocks and
>>>> thus no need to update the
>>>> >>>> immediate
>>>> >>>> - // dominator of the exit blocks.
>>>> >>>> -
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> TCCheckBlock);
>>>> >>>> +
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> TCCheckBlock);
>>>> >>>>
>>>> >>>> ReplaceInstWithInst(
>>>> >>>> TCCheckBlock->getTerminator(),
>>>> >>>> @@ -3192,11 +3188,7 @@ void
>>>> >>>> InnerLoopVectorizer::emitSCEVChecks(Loop
>>>> *L, BasicBlock *Bypass) {
>>>> >>>> // Update dominator only if this is
>>>> first RT check.
>>>> >>>> if (LoopBypassBlocks.empty()) {
>>>> >>>> DT->changeImmediateDominator(Bypass,
>>>> SCEVCheckBlock);
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // If there is an epilogue which
>>>> must run, there's no edge
>>>> >>>> from the
>>>> >>>> - // middle block to exit blocks and
>>>> thus no need to update
>>>> >>>> the immediate
>>>> >>>> - // dominator of the exit blocks.
>>>> >>>> -
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> SCEVCheckBlock);
>>>> >>>> +
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> SCEVCheckBlock);
>>>> >>>> }
>>>> >>>>
>>>> >>>> ReplaceInstWithInst(
>>>> >>>> @@ -3252,11 +3244,7 @@ void
>>>> >>>>
>>>> InnerLoopVectorizer::emitMemRuntimeChecks(Loop
>>>> *L, BasicBlock
>>>> >>>> *Bypass) {
>>>> >>>> // Update dominator only if this is
>>>> first RT check.
>>>> >>>> if (LoopBypassBlocks.empty()) {
>>>> >>>> DT->changeImmediateDominator(Bypass,
>>>> MemCheckBlock);
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // If there is an epilogue which
>>>> must run, there's no edge
>>>> >>>> from the
>>>> >>>> - // middle block to exit blocks and
>>>> thus no need to update
>>>> >>>> the immediate
>>>> >>>> - // dominator of the exit blocks.
>>>> >>>> -
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> MemCheckBlock);
>>>> >>>> +
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> MemCheckBlock);
>>>> >>>> }
>>>> >>>>
>>>> >>>> Instruction *FirstCheckInst;
>>>> >>>> @@ -3381,10 +3369,9 @@ Value
>>>> >>>> *InnerLoopVectorizer::emitTransformedIndex(
>>>> >>>> Loop
>>>> *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>> >>>> Prefix) {
>>>> >>>> LoopScalarBody = OrigLoop->getHeader();
>>>> >>>> LoopVectorPreHeader =
>>>> OrigLoop->getLoopPreheader();
>>>> >>>> + LoopExitBlock =
>>>> OrigLoop->getUniqueExitBlock();
>>>> >>>> + assert(LoopExitBlock && "Must have an
>>>> exit block");
>>>> >>>> assert(LoopVectorPreHeader && "Invalid
>>>> loop structure");
>>>> >>>> - LoopExitBlock =
>>>> OrigLoop->getUniqueExitBlock(); // may be nullptr
>>>> >>>> - assert((LoopExitBlock ||
>>>> Cost->requiresScalarEpilogue()) &&
>>>> >>>> - "multiple exit loop without
>>>> required epilogue?");
>>>> >>>>
>>>> >>>> LoopMiddleBlock =
>>>> >>>> SplitBlock(LoopVectorPreHeader,
>>>> >>>> LoopVectorPreHeader->getTerminator(), DT,
>>>> >>>> @@ -3393,20 +3380,12 @@ Loop
>>>> >>>>
>>>> *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>> Prefix) {
>>>> >>>> SplitBlock(LoopMiddleBlock,
>>>> >>>> LoopMiddleBlock->getTerminator(), DT, LI,
>>>> >>>> nullptr, Twine(Prefix) + "scalar.ph");
>>>> >>>>
>>>> >>>> + // Set up branch from middle block to
>>>> the exit and scalar
>>>> >>>> preheader blocks.
>>>> >>>> + // completeLoopSkeleton will update the
>>>> condition to use an
>>>> >>>> iteration check,
>>>> >>>> + // if required to decide whether to
>>>> execute the remainder.
>>>> >>>> + BranchInst *BrInst =
>>>> >>>> + BranchInst::Create(LoopExitBlock,
>>>> LoopScalarPreHeader,
>>>> >>>> Builder.getTrue());
>>>> >>>> auto *ScalarLatchTerm =
>>>> >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>> >>>> -
>>>> >>>> - // Set up the middle block terminator.
>>>> Two cases:
>>>> >>>> - // 1) If we know that we must execute
>>>> the scalar epilogue, emit an
>>>> >>>> - // unconditional branch.
>>>> >>>> - // 2) Otherwise, we must have a single
>>>> unique exit block (due to
>>>> >>>> how we
>>>> >>>> - // implement the multiple exit
>>>> case). In this case, set up a
>>>> >>>> conditonal
>>>> >>>> - // branch from the middle block to
>>>> the loop scalar preheader,
>>>> >>>> and the
>>>> >>>> - // exit block. completeLoopSkeleton
>>>> will update the
>>>> >>>> condition to use an
>>>> >>>> - // iteration check, if required to
>>>> decide whether to execute
>>>> >>>> the remainder.
>>>> >>>> - BranchInst *BrInst =
>>>> Cost->requiresScalarEpilogue() ?
>>>> >>>> - BranchInst::Create(LoopScalarPreHeader) :
>>>> >>>> - BranchInst::Create(LoopExitBlock,
>>>> LoopScalarPreHeader,
>>>> >>>> - Builder.getTrue());
>>>> >>>>
>>>> BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
>>>> >>>>
>>>> ReplaceInstWithInst(LoopMiddleBlock->getTerminator(),
>>>> BrInst);
>>>> >>>>
>>>> >>>> @@ -3418,11 +3397,7 @@ Loop
>>>> >>>>
>>>> *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef
>>>> Prefix) {
>>>> >>>> nullptr, nullptr, Twine(Prefix) +
>>>> "vector.body");
>>>> >>>>
>>>> >>>> // Update dominator for loop exit.
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // If there is an epilogue which must
>>>> run, there's no edge
>>>> >>>> from the
>>>> >>>> - // middle block to exit blocks and
>>>> thus no need to update the
>>>> >>>> immediate
>>>> >>>> - // dominator of the exit blocks.
>>>> >>>> -
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> LoopMiddleBlock);
>>>> >>>> +
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> LoopMiddleBlock);
>>>> >>>>
>>>> >>>> // Create and register the new vector
>>>> loop.
>>>> >>>> Loop *Lp = LI->AllocateLoop();
>>>> >>>> @@ -3519,14 +3494,10 @@ BasicBlock
>>>> >>>>
>>>> *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
>>>> >>>> auto *ScalarLatchTerm =
>>>> >>>> OrigLoop->getLoopLatch()->getTerminator();
>>>> >>>>
>>>> >>>> // Add a check in the middle block to
>>>> see if we have completed
>>>> >>>> - // all of the iterations in the first
>>>> vector loop. Three cases:
>>>> >>>> - // 1) If we require a scalar epilogue,
>>>> there is no conditional
>>>> >>>> branch as
>>>> >>>> - // we unconditionally branch to the
>>>> scalar preheader. Do
>>>> >>>> nothing.
>>>> >>>> - // 2) If (N - N%VF) == N, then we
>>>> *don't* need to run the
>>>> >>>> remainder.
>>>> >>>> - // Thus if tail is to be folded, we
>>>> know we don't need to run
>>>> >>>> the
>>>> >>>> - // remainder and we can use the
>>>> previous value for the
>>>> >>>> condition (true).
>>>> >>>> - // 3) Otherwise, construct a runtime check.
>>>> >>>> - if (!Cost->requiresScalarEpilogue() &&
>>>> >>>> !Cost->foldTailByMasking()) {
>>>> >>>> + // all of the iterations in the first
>>>> vector loop.
>>>> >>>> + // If (N - N%VF) == N, then we *don't*
>>>> need to run the remainder.
>>>> >>>> + // If tail is to be folded, we know we
>>>> don't need to run the
>>>> >>>> remainder.
>>>> >>>> + if (!Cost->foldTailByMasking()) {
>>>> >>>> Instruction *CmpN =
>>>> CmpInst::Create(Instruction::ICmp,
>>>> >>>> CmpInst::ICMP_EQ,
>>>> >>>> Count, VectorTripCount,
>>>> >>>> "cmp.n",
>>>> >>>> LoopMiddleBlock->getTerminator());
>>>> >>>> @@ -3590,17 +3561,17 @@ BasicBlock
>>>> >>>>
>>>> *InnerLoopVectorizer::createVectorizedLoopSkeleton()
>>>> {
>>>> >>>> | [ ]_| <-- vector loop.
>>>> >>>> | |
>>>> >>>> | v
>>>> >>>> - \ -[ ] <--- middle-block.
>>>> >>>> - \/ |
>>>> >>>> - /\ v
>>>> >>>> - | ->[ ] <--- new preheader.
>>>> >>>> + | -[ ] <--- middle-block.
>>>> >>>> + | / |
>>>> >>>> + | / v
>>>> >>>> + -|- >[ ] <--- new preheader.
>>>> >>>> | |
>>>> >>>> - (opt) v <-- edge from middle to exit
>>>> iff epilogue is not
>>>> >>>> required.
>>>> >>>> + | v
>>>> >>>> | [ ] \
>>>> >>>> - | [ ]_| <-- old scalar loop to
>>>> handle remainder (scalar
>>>> >>>> epilogue).
>>>> >>>> + | [ ]_| <-- old scalar loop to
>>>> handle remainder.
>>>> >>>> \ |
>>>> >>>> \ v
>>>> >>>> - >[ ] <-- exit block(s).
>>>> >>>> + >[ ] <-- exit block.
>>>> >>>> ...
>>>> >>>> */
>>>> >>>>
>>>> >>>> @@ -4021,18 +3992,13 @@ void
>>>> >>>> InnerLoopVectorizer::fixVectorizedLoop() {
>>>> >>>> // Forget the original basic block.
>>>> >>>> PSE.getSE()->forgetLoop(OrigLoop);
>>>> >>>>
>>>> >>>> - // If we inserted an edge from the
>>>> middle block to the unique
>>>> >>>> exit block,
>>>> >>>> - // update uses outside the loop (phis)
>>>> to account for the newly
>>>> >>>> inserted
>>>> >>>> - // edge.
>>>> >>>> - if (!Cost->requiresScalarEpilogue()) {
>>>> >>>> - // Fix-up external users of the
>>>> induction variables.
>>>> >>>> - for (auto &Entry :
>>>> Legal->getInductionVars())
>>>> >>>> - fixupIVUsers(Entry.first, Entry.second,
>>>> >>>> -
>>>> getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>> >>>> - IVEndValues[Entry.first], LoopMiddleBlock);
>>>> >>>> + // Fix-up external users of the
>>>> induction variables.
>>>> >>>> + for (auto &Entry :
>>>> Legal->getInductionVars())
>>>> >>>> + fixupIVUsers(Entry.first, Entry.second,
>>>> >>>> +
>>>> getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
>>>> >>>> + IVEndValues[Entry.first], LoopMiddleBlock);
>>>> >>>>
>>>> >>>> - fixLCSSAPHIs();
>>>> >>>> - }
>>>> >>>> + fixLCSSAPHIs();
>>>> >>>> for (Instruction *PI :
>>>> PredicatedInstructions)
>>>> >>>> sinkScalarOperands(&*PI);
>>>> >>>>
>>>> >>>> @@ -4250,13 +4216,12 @@ void
>>>> >>>>
>>>> InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode
>>>> *Phi) {
>>>> >>>> // recurrence in the exit block, and
>>>> then add an edge for the
>>>> >>>> middle block.
>>>> >>>> // Note that LCSSA does not imply
>>>> single entry when the
>>>> >>>> original scalar loop
>>>> >>>> // had multiple exiting edges (as we
>>>> always run the last
>>>> >>>> iteration in the
>>>> >>>> - // scalar epilogue); in that case,
>>>> there is no edge from middle
>>>> >>>> to exit and
>>>> >>>> - // and thus no phis which needed updated.
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - for (PHINode &LCSSAPhi :
>>>> LoopExitBlock->phis())
>>>> >>>> - if (any_of(LCSSAPhi.incoming_values(),
>>>> >>>> - [Phi](Value *V) { return V == Phi; }))
>>>> >>>> -
>>>> LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>> >>>> LoopMiddleBlock);
>>>> >>>> + // scalar epilogue); in that case, the
>>>> exiting path through
>>>> >>>> middle will be
>>>> >>>> + // dynamically dead and the value
>>>> picked for the phi doesn't
>>>> >>>> matter.
>>>> >>>> + for (PHINode &LCSSAPhi :
>>>> LoopExitBlock->phis())
>>>> >>>> + if (any_of(LCSSAPhi.incoming_values(),
>>>> >>>> + [Phi](Value *V) { return V == Phi; }))
>>>> >>>> +
>>>> LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop,
>>>> >>>> LoopMiddleBlock);
>>>> >>>> }
>>>> >>>>
>>>> >>>> void
>>>> InnerLoopVectorizer::fixReduction(PHINode *Phi) {
>>>> >>>> @@ -4421,11 +4386,10 @@ void
>>>> >>>> InnerLoopVectorizer::fixReduction(PHINode
>>>> *Phi) {
>>>> >>>> // We know that the loop is in LCSSA
>>>> form. We need to update
>>>> >>>> the PHI nodes
>>>> >>>> // in the exit blocks. See comment
>>>> on analogous loop in
>>>> >>>> // fixFirstOrderRecurrence for a more
>>>> complete explaination of
>>>> >>>> the logic.
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - for (PHINode &LCSSAPhi :
>>>> LoopExitBlock->phis())
>>>> >>>> - if (any_of(LCSSAPhi.incoming_values(),
>>>> >>>> - [LoopExitInst](Value *V) { return V ==
>>>> >>>> LoopExitInst; }))
>>>> >>>> - LCSSAPhi.addIncoming(ReducedPartRdx,
>>>> LoopMiddleBlock);
>>>> >>>> + for (PHINode &LCSSAPhi :
>>>> LoopExitBlock->phis())
>>>> >>>> + if (any_of(LCSSAPhi.incoming_values(),
>>>> >>>> + [LoopExitInst](Value *V) { return V ==
>>>> >>>> LoopExitInst; }))
>>>> >>>> + LCSSAPhi.addIncoming(ReducedPartRdx,
>>>> LoopMiddleBlock);
>>>> >>>>
>>>> >>>> // Fix the scalar loop reduction
>>>> variable with the incoming
>>>> >>>> reduction sum
>>>> >>>> // from the vector body and from the
>>>> backedge value.
>>>> >>>> @@ -8074,11 +8038,7 @@ BasicBlock
>>>> >>>>
>>>> *EpilogueVectorizerMainLoop::emitMinimumIterationCountCheck(
>>>> >>>>
>>>> >>>> // Update dominator for Bypass &
>>>> LoopExit.
>>>> >>>> DT->changeImmediateDominator(Bypass,
>>>> TCCheckBlock);
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // For loops with multiple exits,
>>>> there's no edge from the
>>>> >>>> middle block
>>>> >>>> - // to exit blocks (as the epilogue
>>>> must run) and thus no
>>>> >>>> need to update
>>>> >>>> - // the immediate dominator of the
>>>> exit blocks.
>>>> >>>> -
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> TCCheckBlock);
>>>> >>>> +
>>>> DT->changeImmediateDominator(LoopExitBlock,
>>>> TCCheckBlock);
>>>> >>>>
>>>> >>>> LoopBypassBlocks.push_back(TCCheckBlock);
>>>> >>>>
>>>> >>>> @@ -8142,12 +8102,7 @@
>>>> >>>>
>>>> EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton()
>>>> >>>> {
>>>> >>>>
>>>> >>>>
>>>> DT->changeImmediateDominator(LoopScalarPreHeader,
>>>> >>>> EPI.EpilogueIterationCountCheck);
>>>> >>>> - if (!Cost->requiresScalarEpilogue())
>>>> >>>> - // If there is an epilogue which must
>>>> run, there's no edge
>>>> >>>> from the
>>>> >>>> - // middle block to exit blocks and
>>>> thus no need to update the
>>>> >>>> immediate
>>>> >>>> - // dominator of the exit blocks.
>>>> >>>> - DT->changeImmediateDominator(LoopExitBlock,
>>>> >>>> - EPI.EpilogueIterationCountCheck);
>>>> >>>> + DT->changeImmediateDominator(LoopExitBlock,
>>>> >>>> EPI.EpilogueIterationCountCheck);
>>>> >>>>
>>>> >>>> // Keep track of bypass blocks, as
>>>> they feed start values to
>>>> >>>> the induction
>>>> >>>> // phis in the scalar loop preheader.
>>>> >>>>
>>>> >>>> diff --git
>>>> >>>>
>>>> a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>> >>>>
>>>> b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>> >>>> index ec280bf5d5e4..7d4a3c5c9935 100644
>>>> >>>> ---
>>>> >>>>
>>>> a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>> >>>> +++
>>>> >>>>
>>>> b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
>>>> >>>> @@ -471,9 +471,10 @@ define i16
>>>> @multiple_exit(i16* %p, i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP15]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP2]], [[N_VEC]]
>>>> >>>> ; CHECK-NEXT:
>>>> [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement
>>>> >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>> >>>> ; CHECK-NEXT:
>>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>> >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]]
>>>> = phi i16 [ 0,
>>>> >>>> [[ENTRY:%.*]] ], [
>>>> [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>> >>>> @@ -485,14 +486,14 @@ define i16
>>>> @multiple_exit(i16* %p, i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> ; CHECK-NEXT: [[REC_NEXT]] = load i16,
>>>> i16* [[B]], align 2
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: store i16
>>>> [[SCALAR_RECUR]], i16* [[B]], align 4
>>>> >>>> ; CHECK-NEXT: [[INC]] = add nsw i32
>>>> [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>> i32 [[I]], 2096
>>>> >>>> ; CHECK-NEXT: br i1 [[CMP2]], label
>>>> [[FOR_COND]], label
>>>> >>>> [[IF_END]], [[LOOP7:!llvm.loop !.*]]
>>>> >>>> ; CHECK: if.end:
>>>> >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>> [ [[SCALAR_RECUR]],
>>>> >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]],
>>>> [[FOR_COND]] ]
>>>> >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>> [ [[SCALAR_RECUR]],
>>>> >>>> [[FOR_BODY]] ], [ [[SCALAR_RECUR]],
>>>> [[FOR_COND]] ], [
>>>> >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]],
>>>> [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: ret i16 [[REC_LCSSA]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>> @@ -557,9 +558,10 @@ define i16
>>>> @multiple_exit2(i16* %p, i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP15:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP15]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP2]], [[N_VEC]]
>>>> >>>> ; CHECK-NEXT:
>>>> [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement
>>>> >>>> <4 x i16> [[WIDE_LOAD]], i32 3
>>>> >>>> ; CHECK-NEXT:
>>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] =
>>>> >>>> extractelement <4 x i16> [[WIDE_LOAD]], i32 2
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]]
>>>> = phi i16 [ 0,
>>>> >>>> [[ENTRY:%.*]] ], [
>>>> [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>> >>>> @@ -571,14 +573,14 @@ define i16
>>>> @multiple_exit2(i16* %p, i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> ; CHECK-NEXT: [[REC_NEXT]] = load i16,
>>>> i16* [[B]], align 2
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: store i16
>>>> [[SCALAR_RECUR]], i16* [[B]], align 4
>>>> >>>> ; CHECK-NEXT: [[INC]] = add nsw i32
>>>> [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>> i32 [[I]], 2096
>>>> >>>> ; CHECK-NEXT: br i1 [[CMP2]], label
>>>> [[FOR_COND]], label
>>>> >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>> >>>> ; CHECK: if.end:
>>>> >>>> -; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>> [ [[SCALAR_RECUR]],
>>>> >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ]
>>>> >>>> +; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16
>>>> [ [[SCALAR_RECUR]],
>>>> >>>> [[FOR_COND]] ], [ 10, [[FOR_BODY]] ], [
>>>> >>>> [[VECTOR_RECUR_EXTRACT_FOR_PHI]],
>>>> [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: ret i16 [[REC_LCSSA]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>>
>>>> >>>> diff --git
>>>> >>>>
>>>> a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>> >>>>
>>>> b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>> >>>> index f0ba677348ab..0d4bdf0ecac3 100644
>>>> >>>> ---
>>>> a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>> >>>> +++
>>>> b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
>>>> >>>> @@ -447,7 +447,7 @@ define void
>>>> @even_load_static_tc(i32* noalias
>>>> >>>> nocapture readonly %A, i32* noalia
>>>> >>>> ; CHECK-NEXT: [[TMP6:%.*]] = icmp eq
>>>> i64 [[INDEX_NEXT]], 508
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP6]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_COND_CLEANUP:%.*]],
>>>> >>>> label [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ 1016,
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>> >>>> @@ -463,7 +463,7 @@ define void
>>>> @even_load_static_tc(i32* noalias
>>>> >>>> nocapture readonly %A, i32* noalia
>>>> >>>> ; CHECK-NEXT: store i32 [[MUL]], i32*
>>>> [[ARRAYIDX2]], align 4
>>>> >>>> ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add
>>>> nuw nsw i64
>>>> >>>> [[INDVARS_IV]], 2
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp ult
>>>> i64 [[INDVARS_IV]], 1022
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_COND_CLEANUP:%.*]],
>>>> [[LOOP13:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_COND_CLEANUP]], [[LOOP13:!llvm.loop
>>>> !.*]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>> br label %for.body
>>>> >>>> @@ -528,7 +528,7 @@ define void
>>>> @even_load_dynamic_tc(i32* noalias
>>>> >>>> nocapture readonly %A, i32* noali
>>>> >>>> ; CHECK-NEXT: [[TMP12:%.*]] = icmp eq
>>>> i64 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP12]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP14:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_COND_CLEANUP:%.*]],
>>>> >>>> label [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ [[IND_END]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>> >>>> @@ -544,7 +544,7 @@ define void
>>>> @even_load_dynamic_tc(i32* noalias
>>>> >>>> nocapture readonly %A, i32* noali
>>>> >>>> ; CHECK-NEXT: store i32 [[MUL]], i32*
>>>> [[ARRAYIDX2]], align 4
>>>> >>>> ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add
>>>> nuw nsw i64
>>>> >>>> [[INDVARS_IV]], 2
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp ult
>>>> i64 [[INDVARS_IV_NEXT]],
>>>> >>>> [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_COND_CLEANUP:%.*]],
>>>> [[LOOP15:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_COND_CLEANUP]], [[LOOP15:!llvm.loop
>>>> !.*]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>> br label %for.body
>>>> >>>> @@ -973,7 +973,7 @@ define void
>>>> @PR27626_0(%pair.i32 *%p, i32 %z,
>>>> >>>> i64 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP19:%.*]] = icmp eq
>>>> i64 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP19]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP24:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>> >>>> @@ -985,7 +985,7 @@ define void
>>>> @PR27626_0(%pair.i32 *%p, i32 %z,
>>>> >>>> i64 %n) {
>>>> >>>> ; CHECK-NEXT: store i32 [[Z]], i32*
>>>> [[P_I_Y]], align 4
>>>> >>>> ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>> i64 [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>> i64 [[I_NEXT]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END:%.*]], [[LOOP25:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END]], [[LOOP25:!llvm.loop !.*]]
>>>> >>>> ; CHECK: for.end:
>>>> >>>> ; CHECK-NEXT: ret void
>>>> >>>> ;
>>>> >>>> @@ -1066,7 +1066,7 @@ define i32
>>>> @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>> >>>> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] =
>>>> shufflevector <4 x i32>
>>>> >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32>
>>>> <i32 1, i32 undef, i32
>>>> >>>> undef, i32 undef>
>>>> >>>> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4
>>>> x i32> [[BIN_RDX]],
>>>> >>>> [[RDX_SHUF3]]
>>>> >>>> ; CHECK-NEXT: [[TMP19:%.*]] =
>>>> extractelement <4 x i32>
>>>> >>>> [[BIN_RDX4]], i32 0
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>> phi i32 [ [[TMP19]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>> >>>> @@ -1081,9 +1081,10 @@ define i32
>>>> @PR27626_1(%pair.i32 *%p, i64 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP21]] = add nsw i32
>>>> [[TMP20]], [[S]]
>>>> >>>> ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>> i64 [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>> i64 [[I_NEXT]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END:%.*]], [[LOOP27:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END]], [[LOOP27:!llvm.loop !.*]]
>>>> >>>> ; CHECK: for.end:
>>>> >>>> -; CHECK-NEXT: ret i32 [[TMP21]]
>>>> >>>> +; CHECK-NEXT: [[TMP22:%.*]] = phi i32 [
>>>> [[TMP21]], [[FOR_BODY]]
>>>> >>>> ], [ [[TMP19]], [[MIDDLE_BLOCK]] ]
>>>> >>>> +; CHECK-NEXT: ret i32 [[TMP22]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>> br label %for.body
>>>> >>>> @@ -1162,7 +1163,7 @@ define void
>>>> @PR27626_2(%pair.i32 *%p, i64 %n,
>>>> >>>> i32 %z) {
>>>> >>>> ; CHECK-NEXT: [[TMP20:%.*]] = icmp eq
>>>> i64 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP20]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP28:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
>>>> >>>> @@ -1176,7 +1177,7 @@ define void
>>>> @PR27626_2(%pair.i32 *%p, i64 %n,
>>>> >>>> i32 %z) {
>>>> >>>> ; CHECK-NEXT: store i32 [[TMP21]], i32*
>>>> [[P_I_Y]], align 4
>>>> >>>> ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>> i64 [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>> i64 [[I_NEXT]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END:%.*]], [[LOOP29:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END]], [[LOOP29:!llvm.loop !.*]]
>>>> >>>> ; CHECK: for.end:
>>>> >>>> ; CHECK-NEXT: ret void
>>>> >>>> ;
>>>> >>>> @@ -1263,7 +1264,7 @@ define i32
>>>> @PR27626_3(%pair.i32 *%p, i64 %n,
>>>> >>>> i32 %z) {
>>>> >>>> ; CHECK-NEXT: [[RDX_SHUF3:%.*]] =
>>>> shufflevector <4 x i32>
>>>> >>>> [[BIN_RDX]], <4 x i32> poison, <4 x i32>
>>>> <i32 1, i32 undef, i32
>>>> >>>> undef, i32 undef>
>>>> >>>> ; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4
>>>> x i32> [[BIN_RDX]],
>>>> >>>> [[RDX_SHUF3]]
>>>> >>>> ; CHECK-NEXT: [[TMP22:%.*]] =
>>>> extractelement <4 x i32>
>>>> >>>> [[BIN_RDX4]], i32 0
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: br i1 false, label
>>>> [[FOR_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>> phi i32 [ [[TMP22]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
>>>> >>>> @@ -1281,9 +1282,10 @@ define i32
>>>> @PR27626_3(%pair.i32 *%p, i64 %n,
>>>> >>>> i32 %z) {
>>>> >>>> ; CHECK-NEXT: [[TMP25]] = add nsw i32
>>>> [[TMP24]], [[S]]
>>>> >>>> ; CHECK-NEXT: [[I_NEXT]] = add nuw nsw
>>>> i64 [[I]], 1
>>>> >>>> ; CHECK-NEXT: [[COND:%.*]] = icmp slt
>>>> i64 [[I_NEXT]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END:%.*]], [[LOOP31:!llvm.loop !.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[COND]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[FOR_END]], [[LOOP31:!llvm.loop !.*]]
>>>> >>>> ; CHECK: for.end:
>>>> >>>> -; CHECK-NEXT: ret i32 [[TMP25]]
>>>> >>>> +; CHECK-NEXT: [[TMP26:%.*]] = phi i32 [
>>>> [[TMP25]], [[FOR_BODY]]
>>>> >>>> ], [ [[TMP22]], [[MIDDLE_BLOCK]] ]
>>>> >>>> +; CHECK-NEXT: ret i32 [[TMP26]]
>>>> >>>> ;
>>>> >>>> entry:
>>>> >>>> br label %for.body
>>>> >>>>
>>>> >>>> diff --git
>>>> a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>> >>>>
>>>> b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>> >>>> index f32002fae2b6..91780789088b 100644
>>>> >>>> ---
>>>> a/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>> >>>> +++
>>>> b/llvm/test/Transforms/LoopVectorize/loop-form.ll
>>>> >>>> @@ -146,14 +146,15 @@ define void
>>>> @early_exit(i16* %p, i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP10]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP4:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP1]], [[N_VEC]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>> >>>> ; CHECK: for.cond:
>>>> >>>> ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>> [[BC_RESUME_VAL]],
>>>> >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>> [[FOR_BODY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>> [[I]] to i64
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> @@ -285,14 +286,15 @@ define void
>>>> @multiple_unique_exit(i16* %p,
>>>> >>>> i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP11]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP6:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP2]], [[N_VEC]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>> >>>> ; CHECK: for.cond:
>>>> >>>> ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>> [[BC_RESUME_VAL]],
>>>> >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>> [[FOR_BODY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>> [[I]] to i64
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> @@ -372,14 +374,17 @@ define i32
>>>> @multiple_unique_exit2(i16* %p,
>>>> >>>> i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP11]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP8:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP2]], [[N_VEC]]
>>>> >>>> +; CHECK-NEXT: [[IND_ESCAPE:%.*]] = sub
>>>> i32 [[N_VEC]], 1
>>>> >>>> +; CHECK-NEXT: [[IND_ESCAPE1:%.*]] = sub
>>>> i32 [[N_VEC]], 1
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>> >>>> ; CHECK: for.cond:
>>>> >>>> ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>> [[BC_RESUME_VAL]],
>>>> >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>> [[FOR_BODY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>> [[I]] to i64
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> @@ -388,7 +393,7 @@ define i32
>>>> @multiple_unique_exit2(i16* %p, i32
>>>> >>>> %n) {
>>>> >>>> ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>> i32 [[I]], 2096
>>>> >>>> ; CHECK-NEXT: br i1 [[CMP2]], label
>>>> [[FOR_COND]], label
>>>> >>>> [[IF_END]], [[LOOP9:!llvm.loop !.*]]
>>>> >>>> ; CHECK: if.end:
>>>> >>>> -; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [
>>>> [[I]], [[FOR_BODY]]
>>>> >>>> ], [ [[I]], [[FOR_COND]] ]
>>>> >>>> +; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [
>>>> [[I]], [[FOR_BODY]]
>>>> >>>> ], [ [[I]], [[FOR_COND]] ], [
>>>> [[IND_ESCAPE1]], [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: ret i32 [[I_LCSSA]]
>>>> >>>> ;
>>>> >>>> ; TAILFOLD-LABEL: @multiple_unique_exit2(
>>>> >>>> @@ -461,14 +466,15 @@ define i32
>>>> @multiple_unique_exit3(i16* %p,
>>>> >>>> i32 %n) {
>>>> >>>> ; CHECK-NEXT: [[TMP11:%.*]] = icmp eq
>>>> i32 [[INDEX_NEXT]],
>>>> >>>> [[N_VEC]]
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP11]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP10:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32
>>>> [[TMP2]], [[N_VEC]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[IF_END:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i32 [ [[N_VEC]],
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[FOR_COND:%.*]]
>>>> >>>> ; CHECK: for.cond:
>>>> >>>> ; CHECK-NEXT: [[I:%.*]] = phi i32 [
>>>> [[BC_RESUME_VAL]],
>>>> >>>> [[SCALAR_PH]] ], [ [[INC:%.*]],
>>>> [[FOR_BODY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[CMP:%.*]] = icmp slt
>>>> i32 [[I]], [[N]]
>>>> >>>> -; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label
>>>> >>>> [[IF_END:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP]], label
>>>> [[FOR_BODY]], label [[IF_END]]
>>>> >>>> ; CHECK: for.body:
>>>> >>>> ; CHECK-NEXT: [[IPROM:%.*]] = sext i32
>>>> [[I]] to i64
>>>> >>>> ; CHECK-NEXT: [[B:%.*]] = getelementptr
>>>> inbounds i16, i16*
>>>> >>>> [[P]], i64 [[IPROM]]
>>>> >>>> @@ -477,7 +483,7 @@ define i32
>>>> @multiple_unique_exit3(i16* %p, i32
>>>> >>>> %n) {
>>>> >>>> ; CHECK-NEXT: [[CMP2:%.*]] = icmp slt
>>>> i32 [[I]], 2096
>>>> >>>> ; CHECK-NEXT: br i1 [[CMP2]], label
>>>> [[FOR_COND]], label
>>>> >>>> [[IF_END]], [[LOOP11:!llvm.loop !.*]]
>>>> >>>> ; CHECK: if.end:
>>>> >>>> -; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0,
>>>> [[FOR_COND]] ], [ 1,
>>>> >>>> [[FOR_BODY]] ]
>>>> >>>> +; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0,
>>>> [[FOR_COND]] ], [ 1,
>>>> >>>> [[FOR_BODY]] ], [ 0, [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: ret i32 [[EXIT]]
>>>> >>>> ;
>>>> >>>> ; TAILFOLD-LABEL: @multiple_unique_exit3(
>>>> >>>> @@ -994,7 +1000,8 @@ define void
>>>> @scalar_predication(float* %addr) {
>>>> >>>> ; CHECK-NEXT: [[TMP10:%.*]] = icmp eq
>>>> i64 [[INDEX_NEXT]], 200
>>>> >>>> ; CHECK-NEXT: br i1 [[TMP10]], label
>>>> [[MIDDLE_BLOCK:%.*]],
>>>> >>>> label [[VECTOR_BODY]], [[LOOP12:!llvm.loop
>>>> !.*]]
>>>> >>>> ; CHECK: middle.block:
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64
>>>> 201, 200
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[EXIT:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ 200,
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
>>>> >>>> @@ -1002,7 +1009,7 @@ define void
>>>> @scalar_predication(float* %addr) {
>>>> >>>> ; CHECK-NEXT: [[IV:%.*]] = phi i64 [
>>>> [[BC_RESUME_VAL]],
>>>> >>>> [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]],
>>>> [[LOOP_LATCH:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[GEP:%.*]] =
>>>> getelementptr float, float*
>>>> >>>> [[ADDR]], i64 [[IV]]
>>>> >>>> ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] =
>>>> icmp eq i64 [[IV]], 200
>>>> >>>> -; CHECK-NEXT: br i1 [[EXITCOND_NOT]],
>>>> label [[EXIT:%.*]], label
>>>> >>>> [[LOOP_BODY:%.*]]
>>>> >>>> +; CHECK-NEXT: br i1 [[EXITCOND_NOT]],
>>>> label [[EXIT]], label
>>>> >>>> [[LOOP_BODY:%.*]]
>>>> >>>> ; CHECK: loop.body:
>>>> >>>> ; CHECK-NEXT: [[TMP11:%.*]] = load
>>>> float, float* [[GEP]],
>>>> >>>> align 4
>>>> >>>> ; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq
>>>> float [[TMP11]],
>>>> >>>> 0.000000e+00
>>>> >>>> @@ -1088,7 +1095,8 @@ define i32
>>>> @me_reduction(i32* %addr) {
>>>> >>>> ; CHECK-NEXT: [[RDX_SHUF:%.*]] =
>>>> shufflevector <2 x i32>
>>>> >>>> [[TMP5]], <2 x i32> poison, <2 x i32> <i32
>>>> 1, i32 undef>
>>>> >>>> ; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2
>>>> x i32> [[TMP5]],
>>>> >>>> [[RDX_SHUF]]
>>>> >>>> ; CHECK-NEXT: [[TMP7:%.*]] =
>>>> extractelement <2 x i32>
>>>> >>>> [[BIN_RDX]], i32 0
>>>> >>>> -; CHECK-NEXT: br label [[SCALAR_PH]]
>>>> >>>> +; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64
>>>> 201, 200
>>>> >>>> +; CHECK-NEXT: br i1 [[CMP_N]], label
>>>> [[EXIT:%.*]], label
>>>> >>>> [[SCALAR_PH]]
>>>> >>>> ; CHECK: scalar.ph:
>>>> >>>> ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] =
>>>> phi i64 [ 200,
>>>> >>>> [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
>>>> >>>> ; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] =
>>>> phi i32 [ 0, [[ENTRY]]
>>>> >>>> ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
>>>> >>>> @@ -1098,7 +1106,7 @@ define i32
>>>> @me_reduction(i32* %addr) {
>>>> >>>> ; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [
>>>> [[BC_MERGE_RDX]],
>>>> >>>> [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]],
>>>> [[LOOP_LATCH]] ]
>>>> >>>> ; CHECK-NEXT: [[GEP:%.*]] =
>>>> getelementptr i32, i32* [[ADDR]],
>>>> >>>> i64 [[IV]]
>>>> >>>> ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] =
>>>> icmp eq i64 [[IV]], 200
>>>> >>>> -; CHECK-NEXT: br i1 [[EXITCOND_NOT]],
>>>> label [[EXIT:%.*]], label
>>>> >>>> [[LOOP_LATCH]]
>>>> >>>> +; CHECK-NEXT: br i1 [[EXITCOND_NOT]],
>>>> label [[EXIT]], label
>>>> >>>> [[LOOP_LATCH]]
>>>> >>>> ; CHECK: loop.latch:
>>>> >>>> ; CHECK-NEXT: [[TMP8:%.*]] = load i32,
>>>> i32* [[GEP]], align 4
>>>> >>>> ; CHECK-NEXT: [[ACCUM_NEXT]] = add i32
>>>> [[ACCUM]], [[TMP8]]
>>>> >>>> @@ -1106,7 +1114,7 @@ define i32
>>>> @me_reduction(i32* %addr) {
>>>> >>>> ; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] =
>>>> icmp eq i64 [[IV]], 400
>>>> >>>> ; CHECK-NEXT: br i1 [[EXITCOND2_NOT]],
>>>> label [[EXIT]], label
>>>> >>>> [[LOOP_HEADER]], [[LOOP15:!llvm.loop !.*]]
>>>> >>>> ; CHECK: exit:
>>>> >>>> -; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [
>>>> 0, [[LOOP_HEADER]] ], [
>>>> >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ]
>>>> >>>> +; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [
>>>> 0, [[LOOP_HEADER]] ], [
>>>> >>>> [[ACCUM_NEXT]], [[LOOP_LATCH]] ], [
>>>> [[TMP7]], [[MIDDLE_BLOCK]] ]
>>>> >>>> ; CHECK-NEXT: ret i32 [[LCSSA]]
>>>> >>>> ;
>>>> >>>> ; TAILFOLD-LABEL: @me_reduction(
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> _______________________________________________
>>>> >>>> llvm-commits mailing list
>>>> >>>> llvm-commits at lists.llvm.org
>>>> <mailto:llvm-commits at lists.llvm.org>
>>>> >>>>
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>> <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>>>