[llvm] r265388 - Adds the ability to use an epilog remainder loop during loop unrolling and makes

Mon Jul 25 17:15:10 PDT 2016

> On Jul 25, 2016, at 3:53 PM, Michael Zolotukhin via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> 
>> On Jul 24, 2016, at 3:10 AM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>> 
>> Hi Michael,
> Hi Evgeny,
>> 
>> Ok. I'm on vacation right now. With limit access to testing. I'll back
>> in the middle of August, run the testing and revert to prologue.
>> Should we get one more vote to revert this to prologue or I can do
>> this right when I'm back from from vacation?
> I can switch it to the previous value (prologue) for you, and then you fix the test/LSR and switch it back (to epilogue). Does it sound good?
+1
I think this is the right approach dealing with big regressions.

>> 
>> Could you please point me to someone who is responsible for the
>> performance tests you are running? I'll address the issue.
>> Sorry for staying so long without addressing the issue. I've missed
>> the importance of the test and its performance.
> It’s not about that particular test, it’s about 90% regression.
> 
>> The regression is not compiler dependent its a "bad luck" caused by
>> test structure. I can prove this to the person who is responsible for
>> the test.
> If the test is flaky, then I’m completely fine with fixing the test. But otherwise the “bad luck” argument doesn’t convince me - what if a user found a similar regression from a previous compiler version on his code?
> 
>> My LSR patch is fixing only static addresses in the loop. The
>> regression stays unchanged. The root cause of the regression related
>> to code alignment.
> 
> Thanks,
> Michael
> 
> PS: Sorry to disrupt your vacation with this!
>> 
>> Thanks,
>> Evgeny
>> 
>> 
>> 
>> On Thu, Jul 21, 2016 at 11:57 AM, Mikhail Zolotukhin
>> <mzolotukhin at apple.com> wrote:
>>> 
>>>> On Jul 21, 2016, at 11:36 AM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>> 
>>>> Hi Michael,
>>> Hi Evgeny,
>>>> 
>>>> Turning back to default will decrease LSR positive effect.
>>> When the LSR changes are committed, we certainly can switch it back.
>>>> As for the test itself. It does not represent a real case.
>>> I won't argue that, though the "real case" is a very vague term. What matters is that this test is currently in the testsuite.
>>>> Global arrays with big amount of data are not ok.
>>>> If we switch to malloc instead, the regression disappears.
>>> The problem here is that we have a big regression with this change, and it's not addressed so far. It might be a test issue or an LSR issue, but whatever it is, we need to address it, and if it takes too long, turn it back off until it's fixed.
>>> 
>>> Thanks,
>>> Michael
>>>> I'd better modify the test or exclude it.
>>>> 
>>>> Thanks,
>>>> Evgeny
>>>> 
>>>> On Wed, Jul 20, 2016 at 1:06 PM, Michael Zolotukhin
>>>> <mzolotukhin at apple.com> wrote:
>>>>> Hi Evgeny,
>>>>> 
>>>>> Thanks for the detailed update! After a second thought - can we please switch the default to the previous value while you’re working on the patch? Generally, it’s bad to have such a big regression even though your original patch just exposed an issue in another place.
>>>>> 
>>>>> Thanks,
>>>>> Michael
>>>>>> On Jul 11, 2016, at 9:08 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>>>> 
>>>>>> Hi Michael,
>>>>>> 
>>>>>> The patch is under internal review. It changes a lot. I plan to send
>>>>>> it here this month or in August.
>>>>>> The patch fixes addressing, but not the regression.
>>>>>> We looked into the regression itself on simulator. It appears that the
>>>>>> big issue is branch misprediction.
>>>>>> 
>>>>>> Regarding addresses.
>>>>>> When we have prolog (with unroll on 2):
>>>>>> 
>>>>>> for_body
>>>>>> i = phi [i.next, i0]; //i0 is 0 or 1 depending on the prolog execution
>>>>>> x = *(array + i);
>>>>>> ...
>>>>>> cmp i, N
>>>>>> br for_body
>>>>>> 
>>>>>> LSR is trying to simplify ind vars to one.
>>>>>> lsr.i = phi [lsr.i.next, 0];
>>>>>> or
>>>>>> lsr.i = phi [lsr.i.next, -N];
>>>>>> That way array access will have base: “array + i0” or “array + i0 - N”
>>>>>> which is not static even if array is static.
>>>>>> 
>>>>>> x = *((array + i0) + lsr.i);
>>>>>> 
>>>>>> So it will require a register inside loop. Right now number of
>>>>>> registers has the highest priority in LSR when it chooses a solution.
>>>>>> 
>>>>>> When we have epilog, the same access is without “i0”
>>>>>> 
>>>>>> x = *(array + lsr.i);
>>>>>> 
>>>>>> If array is static we do not need an additional register.
>>>>>> In the test where regression occurred the array is static and N is constant.
>>>>>> 
>>>>>> However if we have enough registers we can use register instead of
>>>>>> constant address. The patch should address this.
>>>>>> 
>>>>>> Thanks,
>>>>>> Evgeny
>>>>>> 
>>>>>> On Mon, Jul 11, 2016 at 6:04 PM, Michael Zolotukhin
>>>>>> <mzolotukhin at apple.com> wrote:
>>>>>>> 
>>>>>>>> On Apr 22, 2016, at 11:35 AM, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Apr 20, 2016, at 4:43 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Yes if the test is that important. However I believe that the
>>>>>>>>> regression caused randomly, but not by epilog unrolling.
>>>>>>>>> On my Corei7 "I was able to reproduce it only in 32 bit mode, 64 bit
>>>>>>>>> mode epilog is ~30% faster.".
>>>>>>>>> The analysis showed no critical changes in the hottest loop.
>>>>>>>>> 
>>>>>>>>> As for LSR issue.
>>>>>>>>> Sorry for missing your request for posting PR. I'm preparing LSR patch
>>>>>>>>> fixing the issue. And issue itself is related to already filed PR23384
>>>>>>>>> (on an inefficient x86 LSR transformation).
>>>>>>> Hi Evgeny,
>>>>>>> 
>>>>>>> Is there any progress with the LSR patch? Or is there something blocking you?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Michael
>>>>>>>> Hi Evgeny,
>>>>>>>> 
>>>>>>>> It’s fine with me to keep it on, I just want to make sure we address the regression. So, if you’re working on the patch, there are no concerns from my side.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Michael
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Evgeny
>>>>>>>>> 
>>>>>>>>> On Wed, Apr 20, 2016 at 11:41 AM, Michael Zolotukhin
>>>>>>>>> <mzolotukhin at apple.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Could we change the default value to the original one until this 90%
>>>>>>>>>> regression is addressed?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Michael
>>>>>>>>>> 
>>>>>>>>>> On Apr 15, 2016, at 12:37 PM, Mikhail Zolotukhin <mzolotukhin at apple.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Apr 8, 2016, at 5:39 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Investigation showed that the regression is most likely LSR related,
>>>>>>>>>> but not loop unroll.
>>>>>>>>>> I was able to reproduce it only in 32 bit mode, 64 bit mode epilog is
>>>>>>>>>> ~30% faster.
>>>>>>>>>> 
>>>>>>>>>> Going back to the changes in the test. Epilog variant has 1 move more
>>>>>>>>>> in the hottest loop (which is redundant and should be deleted) and use
>>>>>>>>>> constant address in moves inside the loop. However manual deleting of
>>>>>>>>>> the move does not give any significant performance gain.
>>>>>>>>>> The different behavior of LSR comes from additional inductive variable
>>>>>>>>>> in unrolled loop. For Epilog case we add "k" (supposed to be
>>>>>>>>>> eliminated after LSR):
>>>>>>>>>> 
>>>>>>>>>> k = n - n % (unroll factor));
>>>>>>>>>> ...
>>>>>>>>>> access a[i];
>>>>>>>>>> i++;
>>>>>>>>>> k--;
>>>>>>>>>> if (!k) break;
>>>>>>>>>> 
>>>>>>>>>> Moving code of the loop modifying "palign" before loop showed up to 2
>>>>>>>>>> times perf difference. So most likely epilog unroll itself does not
>>>>>>>>>> influence on performance here.
>>>>>>>>>> However, the fact that LSR leave constant address inside the loop
>>>>>>>>>> moves (making them much longer), should be addressed to LSR. I'll
>>>>>>>>>> submit a bug report on this.
>>>>>>>>>> 
>>>>>>>>>> Hi Evgeny,
>>>>>>>>>> 
>>>>>>>>>> Could you please post the PR number here as well?
>>>>>>>>>> 
>>>>>>>>>> Michael
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Apr 8, 2016 at 2:09 PM, Evgeny Stupachenko <evstupac at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Michael,
>>>>>>>>>> 
>>>>>>>>>> Yes I do have the data.
>>>>>>>>>> There were ~10% improvements on some EEMBC tests; spec2000 performance
>>>>>>>>>> was almost flat (within 2%).
>>>>>>>>>> 
>>>>>>>>>> Let me look into BubbleSort case and come back with analysis and
>>>>>>>>>> possible improvement.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Evgeny
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Apr 8, 2016 at 1:23 PM, Michael Zolotukhin via llvm-commits
>>>>>>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Evgeny,
>>>>>>>>>> 
>>>>>>>>>> We’ve found several performance regressions on LLVM testsuite caused by this
>>>>>>>>>> patch. Do you have a data from your performance experiments to back-up the
>>>>>>>>>> decision to make epilogues the default strategy?
>>>>>>>>>> 
>>>>>>>>>> One of the biggest regressions we see is 89% on
>>>>>>>>>> SingleSource/Benchmarks/Stanford/Bubblesort (on x86), while the biggest gain
>>>>>>>>>> is only 30%.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Michael
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Apr 5, 2016, at 5:19 AM, David L Kreitzer via llvm-commits
>>>>>>>>>> <llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> Author: dlkreitz
>>>>>>>>>> Date: Tue Apr  5 07:19:35 2016
>>>>>>>>>> New Revision: 265388
>>>>>>>>>> 
>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=265388&view=rev
>>>>>>>>>> Log:
>>>>>>>>>> Adds the ability to use an epilog remainder loop during loop unrolling and
>>>>>>>>>> makes
>>>>>>>>>> this the default behavior.
>>>>>>>>>> 
>>>>>>>>>> Patch by Evgeny Stupachenko (evstupac at gmail.com).
>>>>>>>>>> 
>>>>>>>>>> Differential Revision: http://reviews.llvm.org/D18158
>>>>>>>>>> 
>>>>>>>>>> Modified:
>>>>>>>>>> llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h
>>>>>>>>>> llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp
>>>>>>>>>> llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/X86/mmx.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop2.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop4.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/unroll-cleanup.ll
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h (original)
>>>>>>>>>> +++ llvm/trunk/include/llvm/Transforms/Utils/UnrollLoop.h Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -34,10 +34,11 @@ bool UnrollLoop(Loop *L, unsigned Count,
>>>>>>>>>>         LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
>>>>>>>>>>         AssumptionCache *AC, bool PreserveLCSSA);
>>>>>>>>>> 
>>>>>>>>>> -bool UnrollRuntimeLoopProlog(Loop *L, unsigned Count,
>>>>>>>>>> -                             bool AllowExpensiveTripCount, LoopInfo *LI,
>>>>>>>>>> -                             ScalarEvolution *SE, DominatorTree *DT,
>>>>>>>>>> -                             bool PreserveLCSSA);
>>>>>>>>>> +bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
>>>>>>>>>> +                                bool AllowExpensiveTripCount,
>>>>>>>>>> +                                bool UseEpilogRemainder, LoopInfo *LI,
>>>>>>>>>> +                                ScalarEvolution *SE, DominatorTree *DT,
>>>>>>>>>> +                                bool PreserveLCSSA);
>>>>>>>>>> 
>>>>>>>>>> MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp (original)
>>>>>>>>>> +++ llvm/trunk/lib/Transforms/Utils/LoopUnroll.cpp Tue Apr  5 07:19:35 2016
>>>>>>>>>> @@ -44,6 +44,11 @@ using namespace llvm;
>>>>>>>>>> STATISTIC(NumCompletelyUnrolled, "Number of loops completely unrolled");
>>>>>>>>>> STATISTIC(NumUnrolled, "Number of loops unrolled (completely or
>>>>>>>>>> otherwise)");
>>>>>>>>>> 
>>>>>>>>>> +static cl::opt<bool>
>>>>>>>>>> +UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(true), cl::Hidden,
>>>>>>>>>> +                    cl::desc("Allow runtime unrolled loops to be unrolled "
>>>>>>>>>> +                             "with epilog instead of prolog."));
>>>>>>>>>> +
>>>>>>>>>> /// Convert the instruction operands from referencing the current values
>>>>>>>>>> into
>>>>>>>>>> /// those specified by VMap.
>>>>>>>>>> static inline void remapInstruction(Instruction *I,
>>>>>>>>>> @@ -288,12 +293,13 @@ bool llvm::UnrollLoop(Loop *L, unsigned
>>>>>>>>>>        "convergent "
>>>>>>>>>>        "operation.");
>>>>>>>>>> });
>>>>>>>>>> -  // Don't output the runtime loop prolog if Count is a multiple of
>>>>>>>>>> -  // TripMultiple.  Such a prolog is never needed, and is unsafe if the
>>>>>>>>>> loop
>>>>>>>>>> +  // Don't output the runtime loop remainder if Count is a multiple of
>>>>>>>>>> +  // TripMultiple.  Such a remainder is never needed, and is unsafe if the
>>>>>>>>>> loop
>>>>>>>>>> // contains a convergent instruction.
>>>>>>>>>> if (RuntimeTripCount && TripMultiple % Count != 0 &&
>>>>>>>>>> -      !UnrollRuntimeLoopProlog(L, Count, AllowExpensiveTripCount, LI, SE,
>>>>>>>>>> DT,
>>>>>>>>>> -                               PreserveLCSSA))
>>>>>>>>>> +      !UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,
>>>>>>>>>> +                                  UnrollRuntimeEpilog, LI, SE, DT,
>>>>>>>>>> +                                  PreserveLCSSA))
>>>>>>>>>> return false;
>>>>>>>>>> 
>>>>>>>>>> // Notify ScalarEvolution that the loop will be substantially changed,
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp (original)
>>>>>>>>>> +++ llvm/trunk/lib/Transforms/Utils/LoopUnrollRuntime.cpp Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -16,8 +16,8 @@
>>>>>>>>>> // case, we need to generate code to execute these 'left over' iterations.
>>>>>>>>>> //
>>>>>>>>>> // The current strategy generates an if-then-else sequence prior to the
>>>>>>>>>> -// unrolled loop to execute the 'left over' iterations.  Other strategies
>>>>>>>>>> -// include generate a loop before or after the unrolled loop.
>>>>>>>>>> +// unrolled loop to execute the 'left over' iterations before or after the
>>>>>>>>>> +// unrolled loop.
>>>>>>>>>> //
>>>>>>>>>> //===----------------------------------------------------------------------===//
>>>>>>>>>> 
>>>>>>>>>> @@ -60,33 +60,35 @@ STATISTIC(NumRuntimeUnrolled,
>>>>>>>>>> ///   than the unroll factor.
>>>>>>>>>> ///
>>>>>>>>>> static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
>>>>>>>>>> -                          BasicBlock *LastPrologBB, BasicBlock *PrologEnd,
>>>>>>>>>> -                          BasicBlock *OrigPH, BasicBlock *NewPH,
>>>>>>>>>> -                          ValueToValueMapTy &VMap, DominatorTree *DT,
>>>>>>>>>> -                          LoopInfo *LI, bool PreserveLCSSA) {
>>>>>>>>>> +                          BasicBlock *PrologExit, BasicBlock *PreHeader,
>>>>>>>>>> +                          BasicBlock *NewPreHeader, ValueToValueMapTy
>>>>>>>>>> &VMap,
>>>>>>>>>> +                          DominatorTree *DT, LoopInfo *LI, bool
>>>>>>>>>> PreserveLCSSA) {
>>>>>>>>>> BasicBlock *Latch = L->getLoopLatch();
>>>>>>>>>> assert(Latch && "Loop must have a latch");
>>>>>>>>>> +  BasicBlock *PrologLatch = cast<BasicBlock>(VMap[Latch]);
>>>>>>>>>> 
>>>>>>>>>> // Create a PHI node for each outgoing value from the original loop
>>>>>>>>>> // (which means it is an outgoing value from the prolog code too).
>>>>>>>>>> // The new PHI node is inserted in the prolog end basic block.
>>>>>>>>>> -  // The new PHI name is added as an operand of a PHI node in either
>>>>>>>>>> +  // The new PHI node value is added as an operand of a PHI node in either
>>>>>>>>>> // the loop header or the loop exit block.
>>>>>>>>>> -  for (succ_iterator SBI = succ_begin(Latch), SBE = succ_end(Latch);
>>>>>>>>>> -       SBI != SBE; ++SBI) {
>>>>>>>>>> -    for (BasicBlock::iterator BBI = (*SBI)->begin();
>>>>>>>>>> -         PHINode *PN = dyn_cast<PHINode>(BBI); ++BBI) {
>>>>>>>>>> -
>>>>>>>>>> +  for (BasicBlock *Succ : successors(Latch)) {
>>>>>>>>>> +    for (Instruction &BBI : *Succ) {
>>>>>>>>>> +      PHINode *PN = dyn_cast<PHINode>(&BBI);
>>>>>>>>>> +      // Exit when we passed all PHI nodes.
>>>>>>>>>> +      if (!PN)
>>>>>>>>>> +        break;
>>>>>>>>>> // Add a new PHI node to the prolog end block and add the
>>>>>>>>>> // appropriate incoming values.
>>>>>>>>>> -      PHINode *NewPN = PHINode::Create(PN->getType(), 2,
>>>>>>>>>> PN->getName()+".unr",
>>>>>>>>>> -                                       PrologEnd->getTerminator());
>>>>>>>>>> +      PHINode *NewPN = PHINode::Create(PN->getType(), 2, PN->getName() +
>>>>>>>>>> ".unr",
>>>>>>>>>> +                                       PrologExit->getFirstNonPHI());
>>>>>>>>>> // Adding a value to the new PHI node from the original loop preheader.
>>>>>>>>>> // This is the value that skips all the prolog code.
>>>>>>>>>> if (L->contains(PN)) {
>>>>>>>>>> -        NewPN->addIncoming(PN->getIncomingValueForBlock(NewPH), OrigPH);
>>>>>>>>>> +        NewPN->addIncoming(PN->getIncomingValueForBlock(NewPreHeader),
>>>>>>>>>> +                           PreHeader);
>>>>>>>>>> } else {
>>>>>>>>>> -        NewPN->addIncoming(UndefValue::get(PN->getType()), OrigPH);
>>>>>>>>>> +        NewPN->addIncoming(UndefValue::get(PN->getType()), PreHeader);
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> Value *V = PN->getIncomingValueForBlock(Latch);
>>>>>>>>>> @@ -97,22 +99,22 @@ static void ConnectProlog(Loop *L, Value
>>>>>>>>>> }
>>>>>>>>>> // Adding a value to the new PHI node from the last prolog block
>>>>>>>>>> // that was created.
>>>>>>>>>> -      NewPN->addIncoming(V, LastPrologBB);
>>>>>>>>>> +      NewPN->addIncoming(V, PrologLatch);
>>>>>>>>>> 
>>>>>>>>>> // Update the existing PHI node operand with the value from the
>>>>>>>>>> // new PHI node.  How this is done depends on if the existing
>>>>>>>>>> // PHI node is in the original loop block, or the exit block.
>>>>>>>>>> if (L->contains(PN)) {
>>>>>>>>>> -        PN->setIncomingValue(PN->getBasicBlockIndex(NewPH), NewPN);
>>>>>>>>>> +        PN->setIncomingValue(PN->getBasicBlockIndex(NewPreHeader), NewPN);
>>>>>>>>>> } else {
>>>>>>>>>> -        PN->addIncoming(NewPN, PrologEnd);
>>>>>>>>>> +        PN->addIncoming(NewPN, PrologExit);
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> // Create a branch around the original loop, which is taken if there are no
>>>>>>>>>> // iterations remaining to be executed after running the prologue.
>>>>>>>>>> -  Instruction *InsertPt = PrologEnd->getTerminator();
>>>>>>>>>> +  Instruction *InsertPt = PrologExit->getTerminator();
>>>>>>>>>> IRBuilder<> B(InsertPt);
>>>>>>>>>> 
>>>>>>>>>> assert(Count != 0 && "nonsensical Count!");
>>>>>>>>>> @@ -126,25 +128,152 @@ static void ConnectProlog(Loop *L, Value
>>>>>>>>>> BasicBlock *Exit = L->getUniqueExitBlock();
>>>>>>>>>> assert(Exit && "Loop must have a single exit block only");
>>>>>>>>>> // Split the exit to maintain loop canonicalization guarantees
>>>>>>>>>> -  SmallVector<BasicBlock*, 4> Preds(pred_begin(Exit), pred_end(Exit));
>>>>>>>>>> +  SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
>>>>>>>>>> SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,
>>>>>>>>>>                  PreserveLCSSA);
>>>>>>>>>> // Add the branch to the exit block (around the unrolled loop)
>>>>>>>>>> -  B.CreateCondBr(BrLoopExit, Exit, NewPH);
>>>>>>>>>> +  B.CreateCondBr(BrLoopExit, Exit, NewPreHeader);
>>>>>>>>>> +  InsertPt->eraseFromParent();
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +/// Connect the unrolling epilog code to the original loop.
>>>>>>>>>> +/// The unrolling epilog code contains code to execute the
>>>>>>>>>> +/// 'extra' iterations if the run-time trip count modulo the
>>>>>>>>>> +/// unroll count is non-zero.
>>>>>>>>>> +///
>>>>>>>>>> +/// This function performs the following:
>>>>>>>>>> +/// - Update PHI nodes at the unrolling loop exit and epilog loop exit
>>>>>>>>>> +/// - Create PHI nodes at the unrolling loop exit to combine
>>>>>>>>>> +///   values that exit the unrolling loop code and jump around it.
>>>>>>>>>> +/// - Update PHI operands in the epilog loop by the new PHI nodes
>>>>>>>>>> +/// - Branch around the epilog loop if extra iters (ModVal) is zero.
>>>>>>>>>> +///
>>>>>>>>>> +static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
>>>>>>>>>> +                          BasicBlock *Exit, BasicBlock *PreHeader,
>>>>>>>>>> +                          BasicBlock *EpilogPreHeader, BasicBlock
>>>>>>>>>> *NewPreHeader,
>>>>>>>>>> +                          ValueToValueMapTy &VMap, DominatorTree *DT,
>>>>>>>>>> +                          LoopInfo *LI, bool PreserveLCSSA)  {
>>>>>>>>>> +  BasicBlock *Latch = L->getLoopLatch();
>>>>>>>>>> +  assert(Latch && "Loop must have a latch");
>>>>>>>>>> +  BasicBlock *EpilogLatch = cast<BasicBlock>(VMap[Latch]);
>>>>>>>>>> +
>>>>>>>>>> +  // Loop structure should be the following:
>>>>>>>>>> +  //
>>>>>>>>>> +  // PreHeader
>>>>>>>>>> +  // NewPreHeader
>>>>>>>>>> +  //   Header
>>>>>>>>>> +  //   ...
>>>>>>>>>> +  //   Latch
>>>>>>>>>> +  // NewExit (PN)
>>>>>>>>>> +  // EpilogPreHeader
>>>>>>>>>> +  //   EpilogHeader
>>>>>>>>>> +  //   ...
>>>>>>>>>> +  //   EpilogLatch
>>>>>>>>>> +  // Exit (EpilogPN)
>>>>>>>>>> +
>>>>>>>>>> +  // Update PHI nodes at NewExit and Exit.
>>>>>>>>>> +  for (Instruction &BBI : *NewExit) {
>>>>>>>>>> +    PHINode *PN = dyn_cast<PHINode>(&BBI);
>>>>>>>>>> +    // Exit when we passed all PHI nodes.
>>>>>>>>>> +    if (!PN)
>>>>>>>>>> +      break;
>>>>>>>>>> +    // PN should be used in another PHI located in Exit block as
>>>>>>>>>> +    // Exit was split by SplitBlockPredecessors into Exit and NewExit
>>>>>>>>>> +    // Basicaly it should look like:
>>>>>>>>>> +    // NewExit:
>>>>>>>>>> +    //   PN = PHI [I, Latch]
>>>>>>>>>> +    // ...
>>>>>>>>>> +    // Exit:
>>>>>>>>>> +    //   EpilogPN = PHI [PN, EpilogPreHeader]
>>>>>>>>>> +    //
>>>>>>>>>> +    // There is EpilogPreHeader incoming block instead of NewExit as
>>>>>>>>>> +    // NewExit was spilt 1 more time to get EpilogPreHeader.
>>>>>>>>>> +    assert(PN->hasOneUse() && "The phi should have 1 use");
>>>>>>>>>> +    PHINode *EpilogPN = cast<PHINode> (PN->use_begin()->getUser());
>>>>>>>>>> +    assert(EpilogPN->getParent() == Exit && "EpilogPN should be in Exit
>>>>>>>>>> block");
>>>>>>>>>> +
>>>>>>>>>> +    // Add incoming PreHeader from branch around the Loop
>>>>>>>>>> +    PN->addIncoming(UndefValue::get(PN->getType()), PreHeader);
>>>>>>>>>> +
>>>>>>>>>> +    Value *V = PN->getIncomingValueForBlock(Latch);
>>>>>>>>>> +    Instruction *I = dyn_cast<Instruction>(V);
>>>>>>>>>> +    if (I && L->contains(I))
>>>>>>>>>> +      // If value comes from an instruction in the loop add VMap value.
>>>>>>>>>> +      V = VMap[I];
>>>>>>>>>> +    // For the instruction out of the loop, constant or undefined value
>>>>>>>>>> +    // insert value itself.
>>>>>>>>>> +    EpilogPN->addIncoming(V, EpilogLatch);
>>>>>>>>>> +
>>>>>>>>>> +    assert(EpilogPN->getBasicBlockIndex(EpilogPreHeader) >= 0 &&
>>>>>>>>>> +          "EpilogPN should have EpilogPreHeader incoming block");
>>>>>>>>>> +    // Change EpilogPreHeader incoming block to NewExit.
>>>>>>>>>> +
>>>>>>>>>> EpilogPN->setIncomingBlock(EpilogPN->getBasicBlockIndex(EpilogPreHeader),
>>>>>>>>>> +                               NewExit);
>>>>>>>>>> +    // Now PHIs should look like:
>>>>>>>>>> +    // NewExit:
>>>>>>>>>> +    //   PN = PHI [I, Latch], [undef, PreHeader]
>>>>>>>>>> +    // ...
>>>>>>>>>> +    // Exit:
>>>>>>>>>> +    //   EpilogPN = PHI [PN, NewExit], [VMap[I], EpilogLatch]
>>>>>>>>>> +  }
>>>>>>>>>> +
>>>>>>>>>> +  // Create PHI nodes at NewExit (from the unrolling loop Latch and
>>>>>>>>>> PreHeader).
>>>>>>>>>> +  // Update corresponding PHI nodes in epilog loop.
>>>>>>>>>> +  for (BasicBlock *Succ : successors(Latch)) {
>>>>>>>>>> +    // Skip this as we already updated phis in exit blocks.
>>>>>>>>>> +    if (!L->contains(Succ))
>>>>>>>>>> +      continue;
>>>>>>>>>> +    for (Instruction &BBI : *Succ) {
>>>>>>>>>> +      PHINode *PN = dyn_cast<PHINode>(&BBI);
>>>>>>>>>> +      // Exit when we passed all PHI nodes.
>>>>>>>>>> +      if (!PN)
>>>>>>>>>> +        break;
>>>>>>>>>> +      // Add new PHI nodes to the loop exit block and update epilog
>>>>>>>>>> +      // PHIs with the new PHI values.
>>>>>>>>>> +      PHINode *NewPN = PHINode::Create(PN->getType(), 2, PN->getName() +
>>>>>>>>>> ".unr",
>>>>>>>>>> +                                       NewExit->getFirstNonPHI());
>>>>>>>>>> +      // Adding a value to the new PHI node from the unrolling loop
>>>>>>>>>> preheader.
>>>>>>>>>> +      NewPN->addIncoming(PN->getIncomingValueForBlock(NewPreHeader),
>>>>>>>>>> PreHeader);
>>>>>>>>>> +      // Adding a value to the new PHI node from the unrolling loop latch.
>>>>>>>>>> +      NewPN->addIncoming(PN->getIncomingValueForBlock(Latch), Latch);
>>>>>>>>>> +
>>>>>>>>>> +      // Update the existing PHI node operand with the value from the new
>>>>>>>>>> PHI
>>>>>>>>>> +      // node.  Corresponding instruction in epilog loop should be PHI.
>>>>>>>>>> +      PHINode *VPN = cast<PHINode>(VMap[&BBI]);
>>>>>>>>>> +      VPN->setIncomingValue(VPN->getBasicBlockIndex(EpilogPreHeader),
>>>>>>>>>> NewPN);
>>>>>>>>>> +    }
>>>>>>>>>> +  }
>>>>>>>>>> +
>>>>>>>>>> +  Instruction *InsertPt = NewExit->getTerminator();
>>>>>>>>>> +  IRBuilder<> B(InsertPt);
>>>>>>>>>> +  Value *BrLoopExit = B.CreateIsNotNull(ModVal);
>>>>>>>>>> +  assert(Exit && "Loop must have a single exit block only");
>>>>>>>>>> +  // Split the exit to maintain loop canonicalization guarantees
>>>>>>>>>> +  SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
>>>>>>>>>> +  SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI,
>>>>>>>>>> +                         PreserveLCSSA);
>>>>>>>>>> +  // Add the branch to the exit block (around the unrolling loop)
>>>>>>>>>> +  B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit);
>>>>>>>>>> InsertPt->eraseFromParent();
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> /// Create a clone of the blocks in a loop and connect them together.
>>>>>>>>>> -/// If UnrollProlog is true, loop structure will not be cloned, otherwise a
>>>>>>>>>> new
>>>>>>>>>> -/// loop will be created including all cloned blocks, and the iterator of
>>>>>>>>>> it
>>>>>>>>>> -/// switches to count NewIter down to 0.
>>>>>>>>>> +/// If CreateRemainderLoop is false, loop structure will not be cloned,
>>>>>>>>>> +/// otherwise a new loop will be created including all cloned blocks, and
>>>>>>>>>> the
>>>>>>>>>> +/// iterator of it switches to count NewIter down to 0.
>>>>>>>>>> +/// The cloned blocks should be inserted between InsertTop and InsertBot.
>>>>>>>>>> +/// If loop structure is cloned InsertTop should be new preheader,
>>>>>>>>>> InsertBot
>>>>>>>>>> +/// new loop exit.
>>>>>>>>>> ///
>>>>>>>>>> -static void CloneLoopBlocks(Loop *L, Value *NewIter, const bool
>>>>>>>>>> UnrollProlog,
>>>>>>>>>> +static void CloneLoopBlocks(Loop *L, Value *NewIter,
>>>>>>>>>> +                            const bool CreateRemainderLoop,
>>>>>>>>>> +                            const bool UseEpilogRemainder,
>>>>>>>>>>                     BasicBlock *InsertTop, BasicBlock *InsertBot,
>>>>>>>>>> +                            BasicBlock *Preheader,
>>>>>>>>>>                     std::vector<BasicBlock *> &NewBlocks,
>>>>>>>>>>                     LoopBlocksDFS &LoopBlocks, ValueToValueMapTy
>>>>>>>>>> &VMap,
>>>>>>>>>>                     LoopInfo *LI) {
>>>>>>>>>> -  BasicBlock *Preheader = L->getLoopPreheader();
>>>>>>>>>> +  StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
>>>>>>>>>> BasicBlock *Header = L->getHeader();
>>>>>>>>>> BasicBlock *Latch = L->getLoopLatch();
>>>>>>>>>> Function *F = Header->getParent();
>>>>>>>>>> @@ -152,7 +281,7 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();
>>>>>>>>>> Loop *NewLoop = nullptr;
>>>>>>>>>> Loop *ParentLoop = L->getParentLoop();
>>>>>>>>>> -  if (!UnrollProlog) {
>>>>>>>>>> +  if (CreateRemainderLoop) {
>>>>>>>>>> NewLoop = new Loop();
>>>>>>>>>> if (ParentLoop)
>>>>>>>>>> ParentLoop->addChildLoop(NewLoop);
>>>>>>>>>> @@ -163,7 +292,7 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> // For each block in the original loop, create a new copy,
>>>>>>>>>> // and update the value map with the newly created values.
>>>>>>>>>> for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {
>>>>>>>>>> -    BasicBlock *NewBB = CloneBasicBlock(*BB, VMap, ".prol", F);
>>>>>>>>>> +    BasicBlock *NewBB = CloneBasicBlock(*BB, VMap, "." + suffix, F);
>>>>>>>>>> NewBlocks.push_back(NewBB);
>>>>>>>>>> 
>>>>>>>>>> if (NewLoop)
>>>>>>>>>> @@ -179,16 +308,17 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> if (Latch == *BB) {
>>>>>>>>>> -      // For the last block, if UnrollProlog is true, create a direct jump
>>>>>>>>>> to
>>>>>>>>>> -      // InsertBot. If not, create a loop back to cloned head.
>>>>>>>>>> +      // For the last block, if CreateRemainderLoop is false, create a
>>>>>>>>>> direct
>>>>>>>>>> +      // jump to InsertBot. If not, create a loop back to cloned head.
>>>>>>>>>> VMap.erase((*BB)->getTerminator());
>>>>>>>>>> BasicBlock *FirstLoopBB = cast<BasicBlock>(VMap[Header]);
>>>>>>>>>> BranchInst *LatchBR = cast<BranchInst>(NewBB->getTerminator());
>>>>>>>>>> IRBuilder<> Builder(LatchBR);
>>>>>>>>>> -      if (UnrollProlog) {
>>>>>>>>>> +      if (!CreateRemainderLoop) {
>>>>>>>>>> Builder.CreateBr(InsertBot);
>>>>>>>>>> } else {
>>>>>>>>>> -        PHINode *NewIdx = PHINode::Create(NewIter->getType(), 2,
>>>>>>>>>> "prol.iter",
>>>>>>>>>> +        PHINode *NewIdx = PHINode::Create(NewIter->getType(), 2,
>>>>>>>>>> +                                          suffix + ".iter",
>>>>>>>>>>                                   FirstLoopBB->getFirstNonPHI());
>>>>>>>>>> Value *IdxSub =
>>>>>>>>>>     Builder.CreateSub(NewIdx, ConstantInt::get(NewIdx->getType(), 1),
>>>>>>>>>> @@ -207,9 +337,15 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> // cloned loop.
>>>>>>>>>> for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
>>>>>>>>>> PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
>>>>>>>>>> -    if (UnrollProlog) {
>>>>>>>>>> -      VMap[&*I] = NewPHI->getIncomingValueForBlock(Preheader);
>>>>>>>>>> -      cast<BasicBlock>(VMap[Header])->getInstList().erase(NewPHI);
>>>>>>>>>> +    if (!CreateRemainderLoop) {
>>>>>>>>>> +      if (UseEpilogRemainder) {
>>>>>>>>>> +        unsigned idx = NewPHI->getBasicBlockIndex(Preheader);
>>>>>>>>>> +        NewPHI->setIncomingBlock(idx, InsertTop);
>>>>>>>>>> +        NewPHI->removeIncomingValue(Latch, false);
>>>>>>>>>> +      } else {
>>>>>>>>>> +        VMap[&*I] = NewPHI->getIncomingValueForBlock(Preheader);
>>>>>>>>>> +        cast<BasicBlock>(VMap[Header])->getInstList().erase(NewPHI);
>>>>>>>>>> +      }
>>>>>>>>>> } else {
>>>>>>>>>> unsigned idx = NewPHI->getBasicBlockIndex(Preheader);
>>>>>>>>>> NewPHI->setIncomingBlock(idx, InsertTop);
>>>>>>>>>> @@ -254,7 +390,7 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> -/// Insert code in the prolog code when unrolling a loop with a
>>>>>>>>>> +/// Insert code in the prolog/epilog code when unrolling a loop with a
>>>>>>>>>> /// run-time trip-count.
>>>>>>>>>> ///
>>>>>>>>>> /// This method assumes that the loop unroll factor is total number
>>>>>>>>>> @@ -266,6 +402,7 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> /// instruction in SimplifyCFG.cpp.  Then, the backend decides how code for
>>>>>>>>>> /// the switch instruction is generated.
>>>>>>>>>> ///
>>>>>>>>>> +/// ***Prolog case***
>>>>>>>>>> ///        extraiters = tripcount % loopfactor
>>>>>>>>>> ///        if (extraiters == 0) jump Loop:
>>>>>>>>>> ///        else jump Prol
>>>>>>>>>> @@ -277,17 +414,35 @@ static void CloneLoopBlocks(Loop *L, Val
>>>>>>>>>> /// ...
>>>>>>>>>> /// End:
>>>>>>>>>> ///
>>>>>>>>>> -bool llvm::UnrollRuntimeLoopProlog(Loop *L, unsigned Count,
>>>>>>>>>> -                                   bool AllowExpensiveTripCount, LoopInfo
>>>>>>>>>> *LI,
>>>>>>>>>> -                                   ScalarEvolution *SE, DominatorTree *DT,
>>>>>>>>>> -                                   bool PreserveLCSSA) {
>>>>>>>>>> -  // For now, only unroll loops that contain a single exit.
>>>>>>>>>> +/// ***Epilog case***
>>>>>>>>>> +///        extraiters = tripcount % loopfactor
>>>>>>>>>> +///        if (extraiters == tripcount) jump LoopExit:
>>>>>>>>>> +///        unroll_iters = tripcount - extraiters
>>>>>>>>>> +/// Loop:  LoopBody; (executes unroll_iter times);
>>>>>>>>>> +///        unroll_iter -= 1
>>>>>>>>>> +///        if (unroll_iter != 0) jump Loop:
>>>>>>>>>> +/// LoopExit:
>>>>>>>>>> +///        if (extraiters == 0) jump EpilExit:
>>>>>>>>>> +/// Epil:  LoopBody; (executes extraiters times)
>>>>>>>>>> +///        extraiters -= 1                 // Omitted if unroll factor is
>>>>>>>>>> 2.
>>>>>>>>>> +///        if (extraiters != 0) jump Epil: // Omitted if unroll factor is
>>>>>>>>>> 2.
>>>>>>>>>> +/// EpilExit:
>>>>>>>>>> +
>>>>>>>>>> +bool llvm::UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
>>>>>>>>>> +                                      bool AllowExpensiveTripCount,
>>>>>>>>>> +                                      bool UseEpilogRemainder,
>>>>>>>>>> +                                      LoopInfo *LI, ScalarEvolution *SE,
>>>>>>>>>> +                                      DominatorTree *DT, bool
>>>>>>>>>> PreserveLCSSA) {
>>>>>>>>>> +  // for now, only unroll loops that contain a single exit
>>>>>>>>>> if (!L->getExitingBlock())
>>>>>>>>>> return false;
>>>>>>>>>> 
>>>>>>>>>> // Make sure the loop is in canonical form, and there is a single
>>>>>>>>>> // exit block only.
>>>>>>>>>> -  if (!L->isLoopSimplifyForm() || !L->getUniqueExitBlock())
>>>>>>>>>> +  if (!L->isLoopSimplifyForm())
>>>>>>>>>> +    return false;
>>>>>>>>>> +  BasicBlock *Exit = L->getUniqueExitBlock(); // successor out of loop
>>>>>>>>>> +  if (!Exit)
>>>>>>>>>> return false;
>>>>>>>>>> 
>>>>>>>>>> // Use Scalar Evolution to compute the trip count. This allows more loops to
>>>>>>>>>> @@ -311,8 +466,8 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>>>>>> return false;
>>>>>>>>>> 
>>>>>>>>>> BasicBlock *Header = L->getHeader();
>>>>>>>>>> -  BasicBlock *PH = L->getLoopPreheader();
>>>>>>>>>> -  BranchInst *PreHeaderBR = cast<BranchInst>(PH->getTerminator());
>>>>>>>>>> +  BasicBlock *PreHeader = L->getLoopPreheader();
>>>>>>>>>> +  BranchInst *PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
>>>>>>>>>> const DataLayout &DL = Header->getModule()->getDataLayout();
>>>>>>>>>> SCEVExpander Expander(*SE, DL, "loop-unroll");
>>>>>>>>>> if (!AllowExpensiveTripCount &&
>>>>>>>>>> @@ -330,26 +485,75 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>>>>>> SE->forgetLoop(ParentLoop);
>>>>>>>>>> 
>>>>>>>>>> BasicBlock *Latch = L->getLoopLatch();
>>>>>>>>>> -  // It helps to split the original preheader twice, one for the end of the
>>>>>>>>>> -  // prolog code and one for a new loop preheader.
>>>>>>>>>> -  BasicBlock *PEnd = SplitEdge(PH, Header, DT, LI);
>>>>>>>>>> -  BasicBlock *NewPH = SplitBlock(PEnd, PEnd->getTerminator(), DT, LI);
>>>>>>>>>> -  PreHeaderBR = cast<BranchInst>(PH->getTerminator());
>>>>>>>>>> 
>>>>>>>>>> +  // Loop structure is the following:
>>>>>>>>>> +  //
>>>>>>>>>> +  // PreHeader
>>>>>>>>>> +  //   Header
>>>>>>>>>> +  //   ...
>>>>>>>>>> +  //   Latch
>>>>>>>>>> +  // Exit
>>>>>>>>>> +
>>>>>>>>>> +  BasicBlock *NewPreHeader;
>>>>>>>>>> +  BasicBlock *NewExit = nullptr;
>>>>>>>>>> +  BasicBlock *PrologExit = nullptr;
>>>>>>>>>> +  BasicBlock *EpilogPreHeader = nullptr;
>>>>>>>>>> +  BasicBlock *PrologPreHeader = nullptr;
>>>>>>>>>> +
>>>>>>>>>> +  if (UseEpilogRemainder) {
>>>>>>>>>> +    // If epilog remainder
>>>>>>>>>> +    // Split PreHeader to insert a branch around loop for unrolling.
>>>>>>>>>> +    NewPreHeader = SplitBlock(PreHeader, PreHeader->getTerminator(), DT,
>>>>>>>>>> LI);
>>>>>>>>>> +    NewPreHeader->setName(PreHeader->getName() + ".new");
>>>>>>>>>> +    // Split Exit to create phi nodes from branch above.
>>>>>>>>>> +    SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
>>>>>>>>>> +    NewExit = SplitBlockPredecessors(Exit, Preds, ".unr-lcssa",
>>>>>>>>>> +                                     DT, LI, PreserveLCSSA);
>>>>>>>>>> +    // Split NewExit to insert epilog remainder loop.
>>>>>>>>>> +    EpilogPreHeader = SplitBlock(NewExit, NewExit->getTerminator(), DT,
>>>>>>>>>> LI);
>>>>>>>>>> +    EpilogPreHeader->setName(Header->getName() + ".epil.preheader");
>>>>>>>>>> +  } else {
>>>>>>>>>> +    // If prolog remainder
>>>>>>>>>> +    // Split the original preheader twice to insert prolog remainder loop
>>>>>>>>>> +    PrologPreHeader = SplitEdge(PreHeader, Header, DT, LI);
>>>>>>>>>> +    PrologPreHeader->setName(Header->getName() + ".prol.preheader");
>>>>>>>>>> +    PrologExit = SplitBlock(PrologPreHeader,
>>>>>>>>>> PrologPreHeader->getTerminator(),
>>>>>>>>>> +                            DT, LI);
>>>>>>>>>> +    PrologExit->setName(Header->getName() + ".prol.loopexit");
>>>>>>>>>> +    // Split PrologExit to get NewPreHeader.
>>>>>>>>>> +    NewPreHeader = SplitBlock(PrologExit, PrologExit->getTerminator(), DT,
>>>>>>>>>> LI);
>>>>>>>>>> +    NewPreHeader->setName(PreHeader->getName() + ".new");
>>>>>>>>>> +  }
>>>>>>>>>> +  // Loop structure should be the following:
>>>>>>>>>> +  //  Epilog             Prolog
>>>>>>>>>> +  //
>>>>>>>>>> +  // PreHeader         PreHeader
>>>>>>>>>> +  // *NewPreHeader     *PrologPreHeader
>>>>>>>>>> +  //   Header          *PrologExit
>>>>>>>>>> +  //   ...             *NewPreHeader
>>>>>>>>>> +  //   Latch             Header
>>>>>>>>>> +  // *NewExit            ...
>>>>>>>>>> +  // *EpilogPreHeader    Latch
>>>>>>>>>> +  // Exit              Exit
>>>>>>>>>> +
>>>>>>>>>> +  // Calculate conditions for branch around loop for unrolling
>>>>>>>>>> +  // in epilog case and around prolog remainder loop in prolog case.
>>>>>>>>>> // Compute the number of extra iterations required, which is:
>>>>>>>>>> -  //  extra iterations = run-time trip count % (loop unroll factor + 1)
>>>>>>>>>> +  //  extra iterations = run-time trip count % loop unroll factor
>>>>>>>>>> +  PreHeaderBR = cast<BranchInst>(PreHeader->getTerminator());
>>>>>>>>>> Value *TripCount = Expander.expandCodeFor(TripCountSC,
>>>>>>>>>> TripCountSC->getType(),
>>>>>>>>>>                                     PreHeaderBR);
>>>>>>>>>> Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
>>>>>>>>>>                                   PreHeaderBR);
>>>>>>>>>> -
>>>>>>>>>> IRBuilder<> B(PreHeaderBR);
>>>>>>>>>> Value *ModVal;
>>>>>>>>>> // Calculate ModVal = (BECount + 1) % Count.
>>>>>>>>>> // Note that TripCount is BECount + 1.
>>>>>>>>>> if (isPowerOf2_32(Count)) {
>>>>>>>>>> +    // When Count is power of 2 we don't BECount for epilog case, however
>>>>>>>>>> we'll
>>>>>>>>>> +    // need it for a branch around unrolling loop for prolog case.
>>>>>>>>>> ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
>>>>>>>>>> -    //  1. There are no iterations to be run in the prologue loop.
>>>>>>>>>> +    //  1. There are no iterations to be run in the prolog/epilog loop.
>>>>>>>>>> // OR
>>>>>>>>>> //  2. The addition computing TripCount overflowed.
>>>>>>>>>> //
>>>>>>>>>> @@ -371,18 +575,18 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>>>>>>                   ConstantInt::get(BECount->getType(), Count),
>>>>>>>>>>                   "xtraiter");
>>>>>>>>>> }
>>>>>>>>>> -  Value *BranchVal = B.CreateIsNotNull(ModVal, "lcmp.mod");
>>>>>>>>>> -
>>>>>>>>>> -  // Branch to either the extra iterations or the cloned/unrolled loop.
>>>>>>>>>> -  // We will fix up the true branch label when adding loop body copies.
>>>>>>>>>> -  B.CreateCondBr(BranchVal, PEnd, PEnd);
>>>>>>>>>> -  assert(PreHeaderBR->isUnconditional() &&
>>>>>>>>>> -         PreHeaderBR->getSuccessor(0) == PEnd &&
>>>>>>>>>> -         "CFG edges in Preheader are not correct");
>>>>>>>>>> +  Value *CmpOperand =
>>>>>>>>>> +      UseEpilogRemainder ? TripCount :
>>>>>>>>>> +                           ConstantInt::get(TripCount->getType(), 0);
>>>>>>>>>> +  Value *BranchVal = B.CreateICmpNE(ModVal, CmpOperand, "lcmp.mod");
>>>>>>>>>> +  BasicBlock *FirstLoop = UseEpilogRemainder ? NewPreHeader :
>>>>>>>>>> PrologPreHeader;
>>>>>>>>>> +  BasicBlock *SecondLoop = UseEpilogRemainder ? NewExit : PrologExit;
>>>>>>>>>> +  // Branch to either remainder (extra iterations) loop or unrolling loop.
>>>>>>>>>> +  B.CreateCondBr(BranchVal, FirstLoop, SecondLoop);
>>>>>>>>>> PreHeaderBR->eraseFromParent();
>>>>>>>>>> Function *F = Header->getParent();
>>>>>>>>>> // Get an ordered list of blocks in the loop to help with the ordering of
>>>>>>>>>> the
>>>>>>>>>> -  // cloned blocks in the prolog code.
>>>>>>>>>> +  // cloned blocks in the prolog/epilog code
>>>>>>>>>> LoopBlocksDFS LoopBlocks(L);
>>>>>>>>>> LoopBlocks.perform(LI);
>>>>>>>>>> 
>>>>>>>>>> @@ -394,17 +598,38 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>>>>>> std::vector<BasicBlock *> NewBlocks;
>>>>>>>>>> ValueToValueMapTy VMap;
>>>>>>>>>> 
>>>>>>>>>> -  bool UnrollPrologue = Count == 2;
>>>>>>>>>> +  // For unroll factor 2 remainder loop will have 1 iterations.
>>>>>>>>>> +  // Do not create 1 iteration loop.
>>>>>>>>>> +  bool CreateRemainderLoop = (Count != 2);
>>>>>>>>>> 
>>>>>>>>>> // Clone all the basic blocks in the loop. If Count is 2, we don't clone
>>>>>>>>>> // the loop, otherwise we create a cloned loop to execute the extra
>>>>>>>>>> // iterations. This function adds the appropriate CFG connections.
>>>>>>>>>> -  CloneLoopBlocks(L, ModVal, UnrollPrologue, PH, PEnd, NewBlocks,
>>>>>>>>>> LoopBlocks,
>>>>>>>>>> -                  VMap, LI);
>>>>>>>>>> +  BasicBlock *InsertBot = UseEpilogRemainder ? Exit : PrologExit;
>>>>>>>>>> +  BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader :
>>>>>>>>>> PrologPreHeader;
>>>>>>>>>> +  CloneLoopBlocks(L, ModVal, CreateRemainderLoop, UseEpilogRemainder,
>>>>>>>>>> InsertTop,
>>>>>>>>>> +                  InsertBot, NewPreHeader, NewBlocks, LoopBlocks, VMap,
>>>>>>>>>> LI);
>>>>>>>>>> +
>>>>>>>>>> +  // Insert the cloned blocks into the function.
>>>>>>>>>> +  F->getBasicBlockList().splice(InsertBot->getIterator(),
>>>>>>>>>> +                                F->getBasicBlockList(),
>>>>>>>>>> +                                NewBlocks[0]->getIterator(),
>>>>>>>>>> +                                F->end());
>>>>>>>>>> 
>>>>>>>>>> -  // Insert the cloned blocks into the function just before the original
>>>>>>>>>> loop.
>>>>>>>>>> -  F->getBasicBlockList().splice(PEnd->getIterator(),
>>>>>>>>>> F->getBasicBlockList(),
>>>>>>>>>> -                                NewBlocks[0]->getIterator(), F->end());
>>>>>>>>>> +  // Loop structure should be the following:
>>>>>>>>>> +  //  Epilog             Prolog
>>>>>>>>>> +  //
>>>>>>>>>> +  // PreHeader         PreHeader
>>>>>>>>>> +  // NewPreHeader      PrologPreHeader
>>>>>>>>>> +  //   Header            PrologHeader
>>>>>>>>>> +  //   ...               ...
>>>>>>>>>> +  //   Latch             PrologLatch
>>>>>>>>>> +  // NewExit           PrologExit
>>>>>>>>>> +  // EpilogPreHeader   NewPreHeader
>>>>>>>>>> +  //   EpilogHeader      Header
>>>>>>>>>> +  //   ...               ...
>>>>>>>>>> +  //   EpilogLatch       Latch
>>>>>>>>>> +  // Exit              Exit
>>>>>>>>>> 
>>>>>>>>>> // Rewrite the cloned instruction operands to use the values created when
>>>>>>>>>> the
>>>>>>>>>> // clone is created.
>>>>>>>>>> @@ -415,11 +640,38 @@ bool llvm::UnrollRuntimeLoopProlog(Loop
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> -  // Connect the prolog code to the original loop and update the
>>>>>>>>>> -  // PHI functions.
>>>>>>>>>> -  BasicBlock *LastLoopBB = cast<BasicBlock>(VMap[Latch]);
>>>>>>>>>> -  ConnectProlog(L, BECount, Count, LastLoopBB, PEnd, PH, NewPH, VMap, DT,
>>>>>>>>>> LI,
>>>>>>>>>> -                PreserveLCSSA);
>>>>>>>>>> +  if (UseEpilogRemainder) {
>>>>>>>>>> +    // Connect the epilog code to the original loop and update the
>>>>>>>>>> +    // PHI functions.
>>>>>>>>>> +    ConnectEpilog(L, ModVal, NewExit, Exit, PreHeader,
>>>>>>>>>> +                  EpilogPreHeader, NewPreHeader, VMap, DT, LI,
>>>>>>>>>> +                  PreserveLCSSA);
>>>>>>>>>> +
>>>>>>>>>> +    // Update counter in loop for unrolling.
>>>>>>>>>> +    // I should be multiply of Count.
>>>>>>>>>> +    IRBuilder<> B2(NewPreHeader->getTerminator());
>>>>>>>>>> +    Value *TestVal = B2.CreateSub(TripCount, ModVal, "unroll_iter");
>>>>>>>>>> +    BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
>>>>>>>>>> +    B2.SetInsertPoint(LatchBR);
>>>>>>>>>> +    PHINode *NewIdx = PHINode::Create(TestVal->getType(), 2, "niter",
>>>>>>>>>> +                                      Header->getFirstNonPHI());
>>>>>>>>>> +    Value *IdxSub =
>>>>>>>>>> +        B2.CreateSub(NewIdx, ConstantInt::get(NewIdx->getType(), 1),
>>>>>>>>>> +                     NewIdx->getName() + ".nsub");
>>>>>>>>>> +    Value *IdxCmp;
>>>>>>>>>> +    if (LatchBR->getSuccessor(0) == Header)
>>>>>>>>>> +      IdxCmp = B2.CreateIsNotNull(IdxSub, NewIdx->getName() + ".ncmp");
>>>>>>>>>> +    else
>>>>>>>>>> +      IdxCmp = B2.CreateIsNull(IdxSub, NewIdx->getName() + ".ncmp");
>>>>>>>>>> +    NewIdx->addIncoming(TestVal, NewPreHeader);
>>>>>>>>>> +    NewIdx->addIncoming(IdxSub, Latch);
>>>>>>>>>> +    LatchBR->setCondition(IdxCmp);
>>>>>>>>>> +  } else {
>>>>>>>>>> +    // Connect the prolog code to the original loop and update the
>>>>>>>>>> +    // PHI functions.
>>>>>>>>>> +    ConnectProlog(L, BECount, Count, PrologExit, PreHeader, NewPreHeader,
>>>>>>>>>> +                  VMap, DT, LI, PreserveLCSSA);
>>>>>>>>>> +  }
>>>>>>>>>> NumRuntimeUnrolled++;
>>>>>>>>>> return true;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,13 +1,21 @@
>>>>>>>>>> -; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 |
>>>>>>>>>> FileCheck %s
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 |
>>>>>>>>>> FileCheck %s -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57
>>>>>>>>>> -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
>>>>>>>>>> 
>>>>>>>>>> ; Tests for unrolling loops with run-time trip counts
>>>>>>>>>> 
>>>>>>>>>> -; CHECK:  %xtraiter = and i32 %n
>>>>>>>>>> -; CHECK:  %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>>>>>>> -; CHECK:  br i1 %lcmp.mod, label %for.body.prol, label
>>>>>>>>>> %for.body.preheader.split
>>>>>>>>>> +; EPILOG:  %xtraiter = and i32 %n
>>>>>>>>>> +; EPILOG:  %lcmp.mod = icmp ne i32 %xtraiter, %n
>>>>>>>>>> +; EPILOG:  br i1 %lcmp.mod, label %for.body.preheader.new, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa
>>>>>>>>>> 
>>>>>>>>>> -; CHECK:  for.body.prol:
>>>>>>>>>> -; CHECK:  for.body:
>>>>>>>>>> +; PROLOG:  %xtraiter = and i32 %n
>>>>>>>>>> +; PROLOG:  %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>>>>>>> +; PROLOG:  br i1 %lcmp.mod, label %for.body.prol.preheader, label
>>>>>>>>>> %for.body.prol.loopexit
>>>>>>>>>> +
>>>>>>>>>> +; EPILOG:  for.body:
>>>>>>>>>> +; EPILOG:  for.body.epil:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG:  for.body.prol:
>>>>>>>>>> +; PROLOG:  for.body:
>>>>>>>>>> 
>>>>>>>>>> define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
>>>>>>>>>> entry:
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,4 +1,5 @@
>>>>>>>>>> -; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2
>>>>>>>>>> -loop-unroll | FileCheck %s
>>>>>>>>>> +; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2
>>>>>>>>>> -loop-unroll | FileCheck %s -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2
>>>>>>>>>> -loop-unroll -unroll-runtime-epilog=false | FileCheck %s
>>>>>>>>>> -check-prefix=PROLOG
>>>>>>>>>> define void @unroll_opt_for_size() nounwind optsize {
>>>>>>>>>> entry:
>>>>>>>>>> br label %loop
>>>>>>>>>> @@ -13,11 +14,17 @@ exit:
>>>>>>>>>> ret void
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> -; CHECK-LABEL: @unroll_opt_for_size
>>>>>>>>>> -; CHECK:      add
>>>>>>>>>> -; CHECK-NEXT: add
>>>>>>>>>> -; CHECK-NEXT: add
>>>>>>>>>> -; CHECK: icmp
>>>>>>>>>> +; EPILOG-LABEL: @unroll_opt_for_size
>>>>>>>>>> +; EPILOG:      add
>>>>>>>>>> +; EPILOG-NEXT: add
>>>>>>>>>> +; EPILOG-NEXT: add
>>>>>>>>>> +; EPILOG: icmp
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG-LABEL: @unroll_opt_for_size
>>>>>>>>>> +; PROLOG:      add
>>>>>>>>>> +; PROLOG-NEXT: add
>>>>>>>>>> +; PROLOG-NEXT: add
>>>>>>>>>> +; PROLOG: icmp
>>>>>>>>>> 
>>>>>>>>>> define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -40,8 +47,13 @@ for.end:
>>>>>>>>>> ret i32 %sum.0.lcssa
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> -; CHECK-LABEL: @test
>>>>>>>>>> -; CHECK: for.body.prol{{.*}}:
>>>>>>>>>> -; CHECK: for.body:
>>>>>>>>>> -; CHECK: br i1 %exitcond.7, label %for.end.loopexit{{.*}}, label %for.body
>>>>>>>>>> +; EPILOG-LABEL: @test
>>>>>>>>>> +; EPILOG: for.body:
>>>>>>>>>> +; EPILOG: br i1 %niter.ncmp.7, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> +; EPILOG: for.body.epil{{.*}}:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG-LABEL: @test
>>>>>>>>>> +; PROLOG: for.body.prol{{.*}}:
>>>>>>>>>> +; PROLOG: for.body:
>>>>>>>>>> +; PROLOG: br i1 %exitcond.7, label %for.end.loopexit{{.*}}, label %for.body
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/X86/mmx.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/X86/mmx.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/X86/mmx.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/X86/mmx.ll Tue Apr  5 07:19:35
>>>>>>>>>> 2016
>>>>>>>>>> @@ -14,9 +14,9 @@ for.body:
>>>>>>>>>> 
>>>>>>>>>> exit:                                             ; preds = %for.body
>>>>>>>>>> %ret = phi x86_mmx [ undef, %for.body ]
>>>>>>>>>> -  ; CHECK: %[[ret_unr:.*]] = phi x86_mmx [ undef,
>>>>>>>>>> -  ; CHECK: %[[ret_ph:.*]]  = phi x86_mmx [ undef,
>>>>>>>>>> -  ; CHECK: %[[ret:.*]] = phi x86_mmx [ %[[ret_unr]], {{.*}} ], [
>>>>>>>>>> %[[ret_ph]]
>>>>>>>>>> +  ; CHECK: %[[ret_ph:.*]] = phi x86_mmx [ undef, %entry
>>>>>>>>>> +  ; CHECK: %[[ret_ph1:.*]]  = phi x86_mmx [ undef,
>>>>>>>>>> +  ; CHECK: %[[ret:.*]] = phi x86_mmx [ %[[ret_ph]], {{.*}} ], [
>>>>>>>>>> %[[ret_ph1]],
>>>>>>>>>> ; CHECK: ret x86_mmx %[[ret]]
>>>>>>>>>> ret x86_mmx %ret
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> Modified:
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> ---
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
>>>>>>>>>> (original)
>>>>>>>>>> +++
>>>>>>>>>> llvm/trunk/test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
>>>>>>>>>> Tue Apr  5 07:19:35 2016
>>>>>>>>>> @@ -34,7 +34,7 @@ define i32 @test2(i64* %loc, i64 %conv7)
>>>>>>>>>> ; CHECK: udiv
>>>>>>>>>> ; CHECK: udiv
>>>>>>>>>> ; CHECK-NOT: udiv
>>>>>>>>>> -; CHECK-LABEL: for.body.prol
>>>>>>>>>> +; CHECK-LABEL: for.body
>>>>>>>>>> entry:
>>>>>>>>>> %rem0 = load i64, i64* %loc, align 8
>>>>>>>>>> %ExpensiveComputation = udiv i64 %rem0, 42 ; <<< Extra computations are
>>>>>>>>>> added to the trip-count expression
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,18 +1,30 @@
>>>>>>>>>> -; RUN: opt < %s -S -loop-unroll -unroll-runtime=true | FileCheck %s
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-runtime=true | FileCheck %s
>>>>>>>>>> -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-runtime=true
>>>>>>>>>> -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
>>>>>>>>>> 
>>>>>>>>>> target datalayout =
>>>>>>>>>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>>>>>>>>> 
>>>>>>>>>> ; Tests for unrolling loops with run-time trip counts
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: %xtraiter = and i32 %n
>>>>>>>>>> -; CHECK:  %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>>>>>>> -; CHECK:  br i1 %lcmp.mod, label %for.body.prol, label
>>>>>>>>>> %for.body.preheader.split
>>>>>>>>>> -
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol
>>>>>>>>>> ], [ 0, %for.body.preheader ]
>>>>>>>>>> -; CHECK:  %prol.iter.sub = sub i32 %prol.iter, 1
>>>>>>>>>> -; CHECK:  %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
>>>>>>>>>> -; CHECK:  br i1 %prol.iter.cmp, label %for.body.prol, label
>>>>>>>>>> %for.body.preheader.split, !llvm.loop !0
>>>>>>>>>> +; EPILOG: %xtraiter = and i32 %n
>>>>>>>>>> +; EPILOG:  %lcmp.mod = icmp ne i32 %xtraiter, %n
>>>>>>>>>> +; EPILOG:  br i1 %lcmp.mod, label %for.body.preheader.new, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: %xtraiter = and i32 %n
>>>>>>>>>> +; PROLOG:  %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>>>>>>> +; PROLOG:  br i1 %lcmp.mod, label %for.body.prol.preheader, label
>>>>>>>>>> %for.body.prol.loopexit
>>>>>>>>>> +
>>>>>>>>>> +; EPILOG: for.body.epil:
>>>>>>>>>> +; EPILOG: %indvars.iv.epil = phi i64 [ %indvars.iv.next.epil,
>>>>>>>>>> %for.body.epil ],  [ %indvars.iv.unr, %for.body.epil.preheader ]
>>>>>>>>>> +; EPILOG:  %epil.iter.sub = sub i32 %epil.iter, 1
>>>>>>>>>> +; EPILOG:  %epil.iter.cmp = icmp ne i32 %epil.iter.sub, 0
>>>>>>>>>> +; EPILOG:  br i1 %epil.iter.cmp, label %for.body.epil, label
>>>>>>>>>> %for.end.loopexit.epilog-lcssa, !llvm.loop !0
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body.prol:
>>>>>>>>>> +; PROLOG: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol,
>>>>>>>>>> %for.body.prol ], [ 0, %for.body.prol.preheader ]
>>>>>>>>>> +; PROLOG:  %prol.iter.sub = sub i32 %prol.iter, 1
>>>>>>>>>> +; PROLOG:  %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0
>>>>>>>>>> +; PROLOG:  br i1 %prol.iter.cmp, label %for.body.prol, label
>>>>>>>>>> %for.body.prol.loopexit, !llvm.loop !0
>>>>>>>>>> +
>>>>>>>>>> 
>>>>>>>>>> define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -39,8 +51,11 @@ for.end:
>>>>>>>>>> ; Still try to completely unroll loops with compile-time trip counts
>>>>>>>>>> ; even if the -unroll-runtime is specified
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: for.body:
>>>>>>>>>> -; CHECK-NOT: for.body.prol:
>>>>>>>>>> +; EPILOG: for.body:
>>>>>>>>>> +; EPILOG-NOT: for.body.epil:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body:
>>>>>>>>>> +; PROLOG-NOT: for.body.prol:
>>>>>>>>>> 
>>>>>>>>>> define i32 @test1(i32* nocapture %a) nounwind uwtable readonly {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -64,7 +79,8 @@ for.end:
>>>>>>>>>> ; This is test 2007-05-09-UnknownTripCount.ll which can be unrolled now
>>>>>>>>>> ; if the -unroll-runtime option is turned on
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: bb72.2:
>>>>>>>>>> +; EPILOG: bb72.2:
>>>>>>>>>> +; PROLOG: bb72.2:
>>>>>>>>>> 
>>>>>>>>>> define void @foo(i32 %trips) {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -86,8 +102,11 @@ cond_true138:
>>>>>>>>>> 
>>>>>>>>>> ; Test run-time unrolling for a loop that counts down by -2.
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: br i1 %prol.iter.cmp, label %for.body.prol, label
>>>>>>>>>> %for.body.preheader.split
>>>>>>>>>> +; EPILOG: for.body.epil:
>>>>>>>>>> +; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label
>>>>>>>>>> %for.cond.for.end_crit_edge.epilog-lcssa
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body.prol:
>>>>>>>>>> +; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label
>>>>>>>>>> %for.body.prol.loopexit
>>>>>>>>>> 
>>>>>>>>>> define zeroext i16 @down(i16* nocapture %p, i32 %len) nounwind uwtable
>>>>>>>>>> readonly {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -116,8 +135,11 @@ for.end:
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> ; Test run-time unrolling disable metadata.
>>>>>>>>>> -; CHECK: for.body:
>>>>>>>>>> -; CHECK-NOT: for.body.prol:
>>>>>>>>>> +; EPILOG: for.body:
>>>>>>>>>> +; EPILOG-NOT: for.body.epil:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body:
>>>>>>>>>> +; PROLOG-NOT: for.body.prol:
>>>>>>>>>> 
>>>>>>>>>> define zeroext i16 @test2(i16* nocapture %p, i32 %len) nounwind uwtable
>>>>>>>>>> readonly {
>>>>>>>>>> entry:
>>>>>>>>>> @@ -148,6 +170,8 @@ for.end:
>>>>>>>>>> !0 = distinct !{!0, !1}
>>>>>>>>>> !1 = !{!"llvm.loop.unroll.runtime.disable"}
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: !0 = distinct !{!0, !1}
>>>>>>>>>> -; CHECK: !1 = !{!"llvm.loop.unroll.disable"}
>>>>>>>>>> +; EPILOG: !0 = distinct !{!0, !1}
>>>>>>>>>> +; EPILOG: !1 = !{!"llvm.loop.unroll.disable"}
>>>>>>>>>> 
>>>>>>>>>> +; PROLOG: !0 = distinct !{!0, !1}
>>>>>>>>>> +; PROLOG: !1 = !{!"llvm.loop.unroll.disable"}
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop1.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,19 +1,35 @@
>>>>>>>>>> -; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2 | FileCheck
>>>>>>>>>> %s
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2 | FileCheck
>>>>>>>>>> %s -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-count=2
>>>>>>>>>> -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
>>>>>>>>>> 
>>>>>>>>>> ; This tests that setting the unroll count works
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: for.body.preheader:
>>>>>>>>>> -; CHECK:   br {{.*}} label %for.body.prol, label %for.body.preheader.split,
>>>>>>>>>> !dbg [[PH_LOC:![0-9]+]]
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK:   br label %for.body.preheader.split, !dbg [[BODY_LOC:![0-9]+]]
>>>>>>>>>> -; CHECK: for.body.preheader.split:
>>>>>>>>>> -; CHECK:   br {{.*}} label %for.end.loopexit, label
>>>>>>>>>> %for.body.preheader.split.split, !dbg [[PH_LOC]]
>>>>>>>>>> -; CHECK: for.body:
>>>>>>>>>> -; CHECK:   br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label
>>>>>>>>>> %for.body, !dbg [[BODY_LOC]]
>>>>>>>>>> -; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> 
>>>>>>>>>> -; CHECK-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope: !{{.*}})
>>>>>>>>>> -; CHECK-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> +; EPILOG: for.body.preheader:
>>>>>>>>>> +; EPILOG:   br i1 %lcmp.mod, label %for.body.preheader.new, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa, !dbg [[PH_LOC:![0-9]+]]
>>>>>>>>>> +; EPILOG: for.body:
>>>>>>>>>> +; EPILOG:   br i1 %niter.ncmp.1, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !dbg
>>>>>>>>>> [[BODY_LOC:![0-9]+]]
>>>>>>>>>> +; EPILOG-NOT: br i1 %niter.ncmp.2, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> +; EPILOG: for.body.epil.preheader:
>>>>>>>>>> +; EPILOG:   br label %for.body.epil, !dbg [[EXIT_LOC:![0-9]+]]
>>>>>>>>>> +; EPILOG: for.body.epil:
>>>>>>>>>> +; EPILOG:   br label %for.end.loopexit.epilog-lcssa, !dbg
>>>>>>>>>> [[BODY_LOC:![0-9]+]]
>>>>>>>>>> +
>>>>>>>>>> +; EPILOG-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> +; EPILOG-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> +; EPILOG-DAG: [[EXIT_LOC]] = !DILocation(line: 103, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body.preheader:
>>>>>>>>>> +; PROLOG:   br {{.*}} label %for.body.prol.preheader, label
>>>>>>>>>> %for.body.prol.loopexit, !dbg [[PH_LOC:![0-9]+]]
>>>>>>>>>> +; PROLOG: for.body.prol:
>>>>>>>>>> +; PROLOG:   br label %for.body.prol.loopexit, !dbg [[BODY_LOC:![0-9]+]]
>>>>>>>>>> +; PROLOG: for.body.prol.loopexit:
>>>>>>>>>> +; PROLOG:   br {{.*}} label %for.end.loopexit, label
>>>>>>>>>> %for.body.preheader.new, !dbg [[PH_LOC]]
>>>>>>>>>> +; PROLOG: for.body:
>>>>>>>>>> +; PROLOG:   br i1 %exitcond.1, label %for.end.loopexit.unr-lcssa, label
>>>>>>>>>> %for.body, !dbg [[BODY_LOC]]
>>>>>>>>>> +; PROLOG-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG-DAG: [[PH_LOC]] = !DILocation(line: 101, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> +; PROLOG-DAG: [[BODY_LOC]] = !DILocation(line: 102, column: 1, scope:
>>>>>>>>>> !{{.*}})
>>>>>>>>>> 
>>>>>>>>>> define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly !dbg
>>>>>>>>>> !6 {
>>>>>>>>>> entry:
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop2.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop2.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop2.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop2.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,12 +1,18 @@
>>>>>>>>>> -; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime
>>>>>>>>>> -unroll-count=8 | FileCheck %s
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime
>>>>>>>>>> -unroll-count=8 | FileCheck %s  -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -loop-unroll -unroll-threshold=25 -unroll-runtime
>>>>>>>>>> -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
>>>>>>>>>> 
>>>>>>>>>> ; Choose a smaller, power-of-two, unroll count if the loop is too large.
>>>>>>>>>> ; This test makes sure we're not unrolling 'odd' counts
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: for.body:
>>>>>>>>>> -; CHECK: br i1 %exitcond.3, label %for.end.loopexit{{.*}}, label %for.body
>>>>>>>>>> -; CHECK-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> +; EPILOG: for.body:
>>>>>>>>>> +; EPILOG: br i1 %niter.ncmp.3, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa.loopexit{{.*}}, label %for.body
>>>>>>>>>> +; EPILOG-NOT: br i1 %niter.ncmp.4, label
>>>>>>>>>> %for.end.loopexit.unr-lcssa.loopexit{{.*}}, label %for.body
>>>>>>>>>> +; EPILOG: for.body.epil:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: for.body.prol:
>>>>>>>>>> +; PROLOG: for.body:
>>>>>>>>>> +; PROLOG: br i1 %exitcond.3, label %for.end.loopexit{{.*}}, label %for.body
>>>>>>>>>> +; PROLOG-NOT: br i1 %exitcond.4, label %for.end.loopexit{{.*}}, label
>>>>>>>>>> %for.body
>>>>>>>>>> 
>>>>>>>>>> define i32 @test(i32* nocapture %a, i32 %n) nounwind uwtable readonly {
>>>>>>>>>> entry:
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop4.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop4.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop4.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop4.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -1,13 +1,21 @@
>>>>>>>>>> -; RUN: opt < %s -S -O2 -unroll-runtime=true | FileCheck %s
>>>>>>>>>> +; RUN: opt < %s -S -O2 -unroll-runtime=true | FileCheck %s
>>>>>>>>>> -check-prefix=EPILOG
>>>>>>>>>> +; RUN: opt < %s -S -O2 -unroll-runtime=true -unroll-runtime-epilog=false |
>>>>>>>>>> FileCheck %s -check-prefix=PROLOG
>>>>>>>>>> 
>>>>>>>>>> ; Check runtime unrolling prologue can be promoted by LICM pass.
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: entry:
>>>>>>>>>> -; CHECK: %xtraiter
>>>>>>>>>> -; CHECK: %lcmp.mod
>>>>>>>>>> -; CHECK: loop1:
>>>>>>>>>> -; CHECK: br i1 %lcmp.mod
>>>>>>>>>> -; CHECK: loop2.prol:
>>>>>>>>>> +; EPILOG: entry:
>>>>>>>>>> +; EPILOG: %xtraiter
>>>>>>>>>> +; EPILOG: %lcmp.mod
>>>>>>>>>> +; EPILOG: loop1:
>>>>>>>>>> +; EPILOG: br i1 %lcmp.mod
>>>>>>>>>> +; EPILOG: loop2.epil:
>>>>>>>>>> +
>>>>>>>>>> +; PROLOG: entry:
>>>>>>>>>> +; PROLOG: %xtraiter
>>>>>>>>>> +; PROLOG: %lcmp.mod
>>>>>>>>>> +; PROLOG: loop1:
>>>>>>>>>> +; PROLOG: br i1 %lcmp.mod
>>>>>>>>>> +; PROLOG: loop2.prol:
>>>>>>>>>> 
>>>>>>>>>> define void @unroll(i32 %iter, i32* %addr1, i32* %addr2) nounwind {
>>>>>>>>>> entry:
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/runtime-loop5.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -11,9 +11,6 @@ entry:
>>>>>>>>>> %cmp1 = icmp eq i3 %n, 0
>>>>>>>>>> br i1 %cmp1, label %for.end, label %for.body
>>>>>>>>>> 
>>>>>>>>>> -; UNROLL-16-NOT: for.body.prol:
>>>>>>>>>> -; UNROLL-4: for.body.prol:
>>>>>>>>>> -
>>>>>>>>>> for.body:                                         ; preds = %for.body,
>>>>>>>>>> %entry
>>>>>>>>>> ; UNROLL-16-LABEL: for.body:
>>>>>>>>>> ; UNROLL-4-LABEL: for.body:
>>>>>>>>>> @@ -39,6 +36,10 @@ for.body:
>>>>>>>>>> 
>>>>>>>>>> ; UNROLL-16-LABEL: for.end
>>>>>>>>>> ; UNROLL-4-LABEL: for.end
>>>>>>>>>> +
>>>>>>>>>> +; UNROLL-16-NOT: for.body.epil:
>>>>>>>>>> +; UNROLL-4: for.body.epil:
>>>>>>>>>> +
>>>>>>>>>> for.end:                                          ; preds = %for.body,
>>>>>>>>>> %entry
>>>>>>>>>> %sum.0.lcssa = phi i3 [ 0, %entry ], [ %add, %for.body ]
>>>>>>>>>> ret i3 %sum.0.lcssa
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/tripcount-overflow.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -13,13 +13,13 @@ target datalayout = "e-m:o-i64:64-f80:12
>>>>>>>>>> ; CHECK: entry:
>>>>>>>>>> ; CHECK-NEXT: %0 = add i32 %N, 1
>>>>>>>>>> ; CHECK-NEXT: %xtraiter = and i32 %0, 1
>>>>>>>>>> -; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, 0
>>>>>>>>>> -; CHECK-NEXT: br i1 %lcmp.mod, label %while.body.prol, label %entry.split
>>>>>>>>>> +; CHECK-NEXT: %lcmp.mod = icmp ne i32 %xtraiter, %0
>>>>>>>>>> +; CHECK-NEXT: br i1 %lcmp.mod, label %entry.new, label %while.end.unr-lcssa
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: while.body.prol:
>>>>>>>>>> -; CHECK: br label %entry.split
>>>>>>>>>> +; CHECK: while.body.epil:
>>>>>>>>>> +; CHECK: br label %while.end.epilog-lcssa
>>>>>>>>>> 
>>>>>>>>>> -; CHECK: entry.split:
>>>>>>>>>> +; CHECK: while.end.epilog-lcssa:
>>>>>>>>>> 
>>>>>>>>>> ; Function Attrs: nounwind readnone ssp uwtable
>>>>>>>>>> define i32 @foo(i32 %N) {
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/unroll-cleanup.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/unroll-cleanup.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/unroll-cleanup.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/unroll-cleanup.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -4,14 +4,14 @@
>>>>>>>>>> ; RUN: opt < %s -O2 -S | FileCheck %s
>>>>>>>>>> 
>>>>>>>>>> ; After loop unroll:
>>>>>>>>>> -;       %dec18 = add nsw i32 %dec18.in, -1
>>>>>>>>>> +;       %niter.nsub = add nsw i32 %niter, -1
>>>>>>>>>> ;       ...
>>>>>>>>>> -;       %dec18.1 = add nsw i32 %dec18, -1
>>>>>>>>>> +;       %niter.nsub.1 = add nsw i32 %niter.nsub, -1
>>>>>>>>>> ; should be merged to:
>>>>>>>>>> -;       %dec18.1 = add nsw i32 %dec18.in, -2
>>>>>>>>>> +;       %dec18.1 = add nsw i32 %niter, -2
>>>>>>>>>> ;
>>>>>>>>>> ; CHECK-LABEL: @_Z3fn1v(
>>>>>>>>>> -; CHECK: %dec18.1 = add nsw i32 %dec18.in, -2
>>>>>>>>>> +; CHECK: %niter.nsub.1 = add i32 %niter, -2
>>>>>>>>>> 
>>>>>>>>>> ; ModuleID = '<stdin>'
>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>>> 
>>>>>>>>>> Modified: llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll
>>>>>>>>>> URL:
>>>>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll?rev=265388&r1=265387&r2=265388&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll (original)
>>>>>>>>>> +++ llvm/trunk/test/Transforms/LoopUnroll/unroll-pragmas.ll Tue Apr  5
>>>>>>>>>> 07:19:35 2016
>>>>>>>>>> @@ -171,10 +171,6 @@ for.end:
>>>>>>>>>> ; should be duplicated (original and 4x unrolled).
>>>>>>>>>> ;
>>>>>>>>>> ; CHECK-LABEL: @runtime_loop_with_count4(
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: store
>>>>>>>>>> -; CHECK-NOT: store
>>>>>>>>>> -; CHECK: br i1
>>>>>>>>>> ; CHECK: for.body
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> @@ -182,6 +178,10 @@ for.end:
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> ; CHECK-NOT: store
>>>>>>>>>> ; CHECK: br i1
>>>>>>>>>> +; CHECK: for.body.epil:
>>>>>>>>>> +; CHECK: store
>>>>>>>>>> +; CHECK-NOT: store
>>>>>>>>>> +; CHECK: br i1
>>>>>>>>>> define void @runtime_loop_with_count4(i32* nocapture %a, i32 %b) {
>>>>>>>>>> entry:
>>>>>>>>>> %cmp3 = icmp sgt i32 %b, 0
>>>>>>>>>> @@ -287,10 +287,6 @@ for.end:
>>>>>>>>>> ; (original and 8x).
>>>>>>>>>> ;
>>>>>>>>>> ; CHECK-LABEL: @runtime_loop_with_enable(
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: store
>>>>>>>>>> -; CHECK-NOT: store
>>>>>>>>>> -; CHECK: br i1
>>>>>>>>>> ; CHECK: for.body:
>>>>>>>>>> ; CHECK: store i32
>>>>>>>>>> ; CHECK: store i32
>>>>>>>>>> @@ -302,6 +298,10 @@ for.end:
>>>>>>>>>> ; CHECK: store i32
>>>>>>>>>> ; CHECK-NOT: store i32
>>>>>>>>>> ; CHECK: br i1
>>>>>>>>>> +; CHECK: for.body.epil:
>>>>>>>>>> +; CHECK: store
>>>>>>>>>> +; CHECK-NOT: store
>>>>>>>>>> +; CHECK: br i1
>>>>>>>>>> define void @runtime_loop_with_enable(i32* nocapture %a, i32 %b) {
>>>>>>>>>> entry:
>>>>>>>>>> %cmp3 = icmp sgt i32 %b, 0
>>>>>>>>>> @@ -328,16 +328,16 @@ for.end:
>>>>>>>>>> ; should be duplicated (original and 3x unrolled).
>>>>>>>>>> ;
>>>>>>>>>> ; CHECK-LABEL: @runtime_loop_with_count3(
>>>>>>>>>> -; CHECK: for.body.prol:
>>>>>>>>>> -; CHECK: store
>>>>>>>>>> -; CHECK-NOT: store
>>>>>>>>>> -; CHECK: br i1
>>>>>>>>>> ; CHECK: for.body
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> ; CHECK: store
>>>>>>>>>> ; CHECK-NOT: store
>>>>>>>>>> ; CHECK: br i1
>>>>>>>>>> +; CHECK: for.body.epil:
>>>>>>>>>> +; CHECK: store
>>>>>>>>>> +; CHECK-NOT: store
>>>>>>>>>> +; CHECK: br i1
>>>>>>>>>> define void @runtime_loop_with_count3(i32* nocapture %a, i32 %b) {
>>>>>>>>>> entry:
>>>>>>>>>> %cmp3 = icmp sgt i32 %b, 0
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits