[llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.

Ahmed Bougacha via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 20 13:54:16 PDT 2016


On Thu, Jun 16, 2016 at 10:13 PM, Mikael Holmén
<llvm-commits at lists.llvm.org> wrote:
> Hi,
>
> On 06/16/2016 11:48 PM, Haicheng Wu wrote:
>>
>> Hi Mikael,
>>
>> Any luck of fixing this?
>>
>> I am thinking if SomeVeryCleverCondition = !canFallThrough() is good
>> enouth.  Or more explicitly and safely, SomeVeryCleverCondition =
>> (succ_size() == 1 && !isLayoutSuccessor(TBB))
>
>
> We're thinking along those lines as well. Currently we do
>
>   if (!FallthroughBB && (succ_size() == 1)) {
>
> and I haven't seen any problems with it so far but I'm not sure at all if
> it's safe and solid.

I agree; that probably works, but relying on updateTerminator seems
pretty fragile.  I'm not very familiar with MBP, but a couple ideas:

- Isn't canFallthrough() an approximation of AnalyzeBranch() returning
(!Cond.empty() && TBB && !FBB) ?
  If so, can that be used directly in updateTerminator()?  Seems like
that fixes this problem.

- Alternatively, what do you think of:
  - record AnalyzeBranch() (and canFallthrough()?) results before
splice()ing the block
  - add a variant of updateTerminator that also takes the results ("make it so")
  That would fix the MBP FIXME regarding updateTerminator/AnalyzeBranch.

-Ahmed

> Regards,
>
> Mikael
>
>>
>> Best,
>>
>> Haicheng
>>
>> -----Original Message-----
>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>> Sent: Wednesday, June 15, 2016 5:28 AM
>> To: Haicheng Wu <haicheng at codeaurora.org>; Chandler Carruth
>> <chandlerc at gmail.com>; grosbach at apple.com
>> Cc: 'Karl-Johan Karlsson' <karl-johan.karlsson at ericsson.com>;
>> 'llvm-commits' <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>> merging in MBP.
>>
>> Hi Chandler, Jim, Haicheng,
>>
>> (Adding Chandler Carruth and Jim Grosbach who seems to have fixed a
>> somewhat similar case in updateTerminator years ago.)
>>
>> On 06/14/2016 11:24 PM, Haicheng Wu wrote:
>>>
>>> >From your analysis, it seems updateTerminator() misses a case.  But I am
>>> surprised that it does not happen in other backends.  It would be ideal if
>>> you could reproduce it in other backend that is in tree.
>>
>>
>> Yes it surely would. Reproducing bugs far into the backend on other
>> targets is quite tricky though and we haven't mananged yet with this one :/
>>
>> Chandler, Jim, do you have any opinion about this?
>>
>> The problem we're seeing is:
>>
>> MachineBasicBlock::updateTerminator fails to fix a case where
>> MachineBlockPlacement has reordered blocks so we have
>>
>> BB#2: derived from LLVM BB %.split.us.split.us
>>       Live Ins: %a0_32 %af1 %CCReg %af1
>>       Predecessors according to CFG: BB#1
>>           brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>       Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)
>>
>> BB#6: derived from LLVM BB %.split
>>       Live Ins: %CCReg %af1
>>       Predecessors according to CFG: BB#0
>>           brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>       Successors according to CFG: BB#7(0x40000000 / 0x80000000 = 50.00%)
>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>
>> and then
>>
>> BB#3: derived from LLVM BB %bb1.us.us.us
>>       Live Ins: %a0_32 %af1
>>       Predecessors according to CFG: BB#2 BB#10
>>       %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, pred:0,
>> %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>       brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>       Successors according to CFG: BB#5(0x30000000 / 0x80000000 = 37.50%)
>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>
>> Before moving BBs around, BB#2 fallthrough'd to BB#3, and
>> MachineBlockPlacement expects updateTerminator to insert a branch from
>> BB#2 to BB#3 for the old fallthrough edge, but it doesn't.
>>
>> Note that there is also a conditional branch from BB#2 to BB#3,
>> complicating the situation since the fallthrough bb and the conditional jump
>> target are the same.
>>
>> AnalyzeBranch sets TBB to BB#3, FBB to nullptr and Cond to the condition
>> used in the brr_cond in BB#2, so that seems reasonable to me.
>>
>> In updateTerminator, when searching for FallthroughBB it doesn't find
>> anything since the fallthrough bb is the same as TBB. And then
>> canFallThrough() returns false, because BB#6 is not in the successor list
>> of BB#2.
>>
>> Then TBB is not a layout successor and we test
>> !isLayoutSuccessor(FallthroughBB) which happens to return true when
>> FallthroughBB is nullptr, so we enter the last else if which removes our
>> conditional branch and inserts an identical one.
>>
>> To fix BB#2 I think we would like to do
>>
>>     if (!FallthroughBB && SomeVeryCleverCondition) {
>>       // We did fallthrough to the same basic block as the conditional
>> jump
>>       // targets but since canFallThrough() said false, our previous
>>       // fallthrough successor BB has been moved away.
>>       // Remove the conditional jump, leaving unconditional fallthrough.
>>       TII->RemoveBranch(*this);
>>
>>       // Finally update the unconditional successor to be reached via a
>> branch if
>>       // it would not be reached by fallthrough.
>>       if (!isLayoutSuccessor(TBB)) {
>>         ArrayRef<MachineOperand> EmptyCond;
>>         TII->InsertBranch(*this, TBB, nullptr, EmptyCond, DL);
>>       }
>>       return;
>>     }
>>
>> but the question is what SomeVeryCleverCondition should look like to catch
>> this and only this case and not mess up something else.
>>
>> Any thoughts appreciated.
>>
>> Thanks,
>> Mikael
>>
>>>
>>> Best,
>>>
>>> Haicheng
>>>
>>> -----Original Message-----
>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>> Sent: Tuesday, June 14, 2016 12:56 PM
>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>> Cc: Karl-Johan Karlsson <karl-johan.karlsson at ericsson.com>;
>>> llvm-commits <llvm-commits at lists.llvm.org>
>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>> merging in MBP.
>>>
>>> Hi,
>>>
>>> Karl-Johan got it right.
>>>
>>> TBB is BB#3, FBB is nullptr, and Cond is the condition set on the
>>> brr_cond instruction ending BB#2.
>>>
>>> And this makes updateTerminator pass everything until
>>>
>>>      // The block has a fallthrough conditional branch.
>>>      if (isLayoutSuccessor(TBB)) {
>>>        if (TII->ReverseBranchCondition(Cond)) {
>>>          // We can't reverse the condition, add an unconditional branch.
>>>          Cond.clear();
>>>          TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>>          return;
>>>        }
>>>        TII->RemoveBranch(*this);
>>>        TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>>      } else if (!isLayoutSuccessor(FallthroughBB)) {
>>>        TII->RemoveBranch(*this);
>>>        TII->InsertBranch(*this, TBB, FallthroughBB, Cond, DL);
>>>      }
>>>
>>> where we enter the else branch (FallthroughBB is nullptr), remove the
>>> conditiopnal branch, but then insert a new (identical) conditional branch.
>>>
>>> Regards,
>>> Mikael
>>>
>>> On 06/14/2016 06:08 PM, Karl-Johan Karlsson wrote:
>>>>
>>>> Hi!
>>>>
>>>> Mikael went home from work before your mail arrived. He will be back
>>>> tomorrow. I hope I have recreated the problem as Mikael described it
>>>> below. From my debugger session I printed the following information
>>>> about TBB, FBB and Cond before updateTerminator() is called on BB#2.
>>>>
>>>> (gdb) p PrevBB->dump()
>>>> $39 = void
>>>> (gdb)
>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>        Live Ins: %a0_32 %af1 %CCReg %af1
>>>>        Predecessors according to CFG: BB#1
>>>>        brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>        Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>> 100.00%)
>>>> (gdb) P TBB
>>>> $40 = (llvm::MachineBasicBlock *) 0x28e02d8
>>>> (gdb) P TBB->dump()
>>>> $41 = void
>>>> (gdb)
>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>>        Live Ins: %a0_32 %af1
>>>>        Predecessors according to CFG: BB#2 BB#10
>>>>        %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>>        brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>>        Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>> 37.50%)
>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>> (gdb) P FBB
>>>> $42 = (llvm::MachineBasicBlock *) 0x0
>>>> (gdb)
>>>> (gdb) P Cond.empty()
>>>> $44 = false
>>>> (gdb)
>>>>
>>>> If I have got any information wrong Mikael will correct me with a
>>>> mail tomorrow.
>>>>
>>>> Regards
>>>> / Karl-Johan Karlsson
>>>>
>>>>
>>>> On 2016-06-14 17:07, Haicheng Wu wrote:
>>>>>
>>>>> Hi Mikael,
>>>>>
>>>>>    From your dumped information, I think the problem is either in
>>>>> AnalyzeBranch() or updateTerminator().  What is the TBB, FBB, and
>>>>> Cond returned by AnalyzeBranch() when it is called on BB#2 before
>>>>> calling updateTerminator()?
>>>>>
>>>>> Best,
>>>>>
>>>>> Haicheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>>>> Sent: Tuesday, June 14, 2016 7:54 AM
>>>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>; Karl-Johan Karlsson
>>>>> <karl-johan.karlsson at ericsson.com>
>>>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>>>> merging in MBP.
>>>>>
>>>>> Hi,
>>>>>
>>>>> We're seeing a problem on our out-of-tree backend with this patch
>>>>> applied. I think the problem is either in
>>>>> MachineBasicBlock::updateTerminator() or how it's used but I'm not
>>>>> sure who's responsibility it is to fix the situation.
>>>>>
>>>>> I've tried to reproduce it on in-tree backends but I've failed on
>>>>> X86 and Arm :/
>>>>>
>>>>> Anyway, the fault we see is that after MachineBlockPlacement we get
>>>>> two BBs like this:
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>         Predecessors according to CFG: BB#1
>>>>>             brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#6: derived from LLVM BB %.split
>>>>>         Live Ins: %CCReg %af1
>>>>>         Predecessors according to CFG: BB#0
>>>>>             brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>> 50.00%)
>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>
>>>>> So, looking at the instructions of BB#2 it looks like it does
>>>>> fallthrough to BB#6 (only one conditional branch to BB#3) but the
>>>>> CFG says only BB#3 is successor.
>>>>>
>>>>> And the verifier complains about this:
>>>>>
>>>>> *** Bad machine code: MBB exits via conditional branch/fall-through
>>>>> but only has one CFG successor! ***
>>>>> - function:    f2
>>>>> - basic block: BB#2 .split.us.split.us (0x492c1f8) LLVM ERROR: Found
>>>>> 1 machine code errors.
>>>>>
>>>>> At some point during MachineBlockPlacement we have
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>         Predecessors according to CFG: BB#1
>>>>>       brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>>>         Live Ins: %a0_32 %af1
>>>>>         Predecessors according to CFG: BB#2 BB#10
>>>>>       %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>>>       brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>>> 37.50%)
>>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>>>
>>>>> Note that BB#2 both does a conditional jump AND falltrhoughs to BB#3.
>>>>>
>>>>> Then it's decided by MachineBlockPlacement::buildChain that BB#6 is
>>>>> a better layout successor to BB#2 than BB#3, and we get:
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>         Predecessors according to CFG: BB#1
>>>>>       brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#6: derived from LLVM BB %.split
>>>>>         Live Ins: %CCReg %af1
>>>>>         Predecessors according to CFG: BB#0
>>>>>       brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>>         Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>> 50.00%)
>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>
>>>>> Then in MachineBlockPlacement::buildCFGChains we have
>>>>>
>>>>>         if (!TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond)) {
>>>>>           PrevBB->updateTerminator();
>>>>>
>>>>> which is called on BB#2, and from what I understand the point is
>>>>> that updateTerminator should insert an unconditional branch from
>>>>> BB#2 to
>>>>> BB#3 now that there is no fallthrough anymore, but it doesn't and
>>>>> the code is left broken.
>>>>>
>>>>> What's your thought's on this Haicheng, who should fix BB#2 in this
>>>>> case?
>>>>>
>>>>> Thanks,
>>>>> Mikael
>>>>>
>>>>>
>>>>> On 06/06/2016 08:36 PM, Haicheng Wu via llvm-commits wrote:
>>>>>>
>>>>>> Author: haicheng
>>>>>> Date: Mon Jun  6 13:36:07 2016
>>>>>> New Revision: 271925
>>>>>>
>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=271925&view=rev
>>>>>> Log:
>>>>>> [MBP] Reduce code size by running tail merging in MBP.
>>>>>>
>>>>>> The code layout that TailMerging (inside BranchFolding) works on is
>>>>>> not the final layout optimized based on the branch probability.
>>>>>> Generally, after BlockPlacement, many new merging opportunities
>>>>>> emerge.
>>>>>>
>>>>>> This patch calls Tail Merging after MBP and calls MBP again if Tail
>>>>>> Merging merges anything.
>>>>>>
>>>>>> Differential Revision: http://reviews.llvm.org/D20276
>>>>>>
>>>>>> Added:
>>>>>>         llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>> Modified:
>>>>>>         llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>>         llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>>         llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>>         llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>> l di ng.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.cpp (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.cpp Mon Jun  6 13:36:07
>>>>>> +++ 2016
>>>>>> @@ -27,6 +27,7 @@
>>>>>>      #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>>>      #include "llvm/CodeGen/MachineJumpTableInfo.h"
>>>>>>      #include "llvm/CodeGen/MachineMemOperand.h"
>>>>>> +#include "llvm/CodeGen/MachineLoopInfo.h"
>>>>>>      #include "llvm/CodeGen/MachineModuleInfo.h"
>>>>>>      #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>>>      #include "llvm/CodeGen/Passes.h"
>>>>>> @@ -137,6 +138,8 @@ void BranchFolder::RemoveDeadBlock(Machi
>>>>>>        // Remove the block.
>>>>>>        MF->erase(MBB);
>>>>>>        FuncletMembership.erase(MBB);
>>>>>> +  if (MLI)
>>>>>> +    MLI->removeBlock(MBB);
>>>>>>      }
>>>>>>
>>>>>>      /// OptimizeImpDefsBlock - If a basic block is just a bunch of
>>>>>> implicit_def @@ -193,18 +196,22 @@ bool
>>>>>> BranchFolder::OptimizeImpDefsBlock(
>>>>>>      }
>>>>>>
>>>>>>      /// OptimizeFunction - Perhaps branch folding, tail merging and
>>>>>> other -/// CFG optimizations on the given function.
>>>>>> +/// CFG optimizations on the given function.  Block placement
>>>>>> +changes the layout /// and may create new tail merging opportunities.
>>>>>>      bool BranchFolder::OptimizeFunction(MachineFunction &MF,
>>>>>>                                          const TargetInstrInfo *tii,
>>>>>>                                          const TargetRegisterInfo
>>>>>> *tri,
>>>>>> -                                    MachineModuleInfo *mmi) {
>>>>>> +                                    MachineModuleInfo *mmi,
>>>>>> +                                    MachineLoopInfo *mli, bool
>>>>>> + AfterPlacement) {
>>>>>>        if (!tii) return false;
>>>>>>
>>>>>>        TriedMerging.clear();
>>>>>>
>>>>>> +  AfterBlockPlacement = AfterPlacement;
>>>>>>        TII = tii;
>>>>>>        TRI = tri;
>>>>>>        MMI = mmi;
>>>>>> +  MLI = mli;
>>>>>>        RS = nullptr;
>>>>>>
>>>>>>        // Use a RegScavenger to help update liveness when required.
>>>>>> @@ -230,7 +237,10 @@ bool BranchFolder::OptimizeFunction(Mach
>>>>>>        bool MadeChangeThisIteration = true;
>>>>>>        while (MadeChangeThisIteration) {
>>>>>>          MadeChangeThisIteration    = TailMergeBlocks(MF);
>>>>>> -    MadeChangeThisIteration   |= OptimizeBranches(MF);
>>>>>> +    // No need to clean up if tail merging does not change
>>>>>> + anything
>>>>>> after the
>>>>>> +    // block placement.
>>>>>> +    if (!AfterBlockPlacement || MadeChangeThisIteration)
>>>>>> +      MadeChangeThisIteration |= OptimizeBranches(MF);
>>>>>>          if (EnableHoistCommonCode)
>>>>>>            MadeChangeThisIteration |= HoistCommonCode(MF);
>>>>>>          MadeChange |= MadeChangeThisIteration; @@ -447,6 +457,11 @@
>>>>>> MachineBasicBlock *BranchFolder::SplitMB
>>>>>>        // Splice the code over.
>>>>>>        NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());
>>>>>>
>>>>>> +  // NewMBB belongs to the same loop as CurMBB.
>>>>>> +  if (MLI)
>>>>>> +    if (MachineLoop *ML = MLI->getLoopFor(&CurMBB))
>>>>>> +      ML->addBasicBlockToLoop(NewMBB, MLI->getBase());
>>>>>> +
>>>>>>        // NewMBB inherits CurMBB's block frequency.
>>>>>>        MBBFreqInfo.setBlockFreq(NewMBB,
>>>>>> MBBFreqInfo.getBlockFreq(&CurMBB));
>>>>>>
>>>>>> @@ -934,23 +949,27 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>>        if (!EnableTailMerge) return MadeChange;
>>>>>>
>>>>>>        // First find blocks with no successors.
>>>>>> -  MergePotentials.clear();
>>>>>> -  for (MachineBasicBlock &MBB : MF) {
>>>>>> +  // Block placement does not create new tail merging
>>>>>> + opportunities for these  // blocks.
>>>>>> +  if (!AfterBlockPlacement) {
>>>>>> +    MergePotentials.clear();
>>>>>> +    for (MachineBasicBlock &MBB : MF) {
>>>>>> +      if (MergePotentials.size() == TailMergeThreshold)
>>>>>> +        break;
>>>>>> +      if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>> +
>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>> &MBB));
>>>>>> +    }
>>>>>> +
>>>>>> +    // If this is a large problem, avoid visiting the same basic
>>>>>> blocks
>>>>>> +    // multiple times.
>>>>>>          if (MergePotentials.size() == TailMergeThreshold)
>>>>>> -      break;
>>>>>> -    if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>> -
>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>> &MBB));
>>>>>> -  }
>>>>>> +      for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>> +        TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>>
>>>>>> -  // If this is a large problem, avoid visiting the same basic
>>>>>> blocks
>>>>>> -  // multiple times.
>>>>>> -  if (MergePotentials.size() == TailMergeThreshold)
>>>>>> -    for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>> -      TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>> -
>>>>>> -  // See if we can do any tail merging on those.
>>>>>> -  if (MergePotentials.size() >= 2)
>>>>>> -    MadeChange |= TryTailMergeBlocks(nullptr, nullptr);
>>>>>> +    // See if we can do any tail merging on those.
>>>>>> +    if (MergePotentials.size() >= 2)
>>>>>> +      MadeChange |= TryTailMergeBlocks(nullptr, nullptr);  }
>>>>>>
>>>>>>        // Look at blocks (IBB) with multiple predecessors (PBB).
>>>>>>        // We change each predecessor to a canonical form, by @@
>>>>>> -997,6
>>>>>> +1016,17 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>>            if (PBB->hasEHPadSuccessor())
>>>>>>              continue;
>>>>>>
>>>>>> +      // Bail out if the loop header (IBB) is not the top of the
>>>>>> loop chain
>>>>>> +      // after the block placement.  Otherwise, the common tail of
>>>>>> IBB's
>>>>>> +      // predecessors may become the loop top if block placement
>>>>>> + is
>>>>>> called again
>>>>>> +      // and the predecessors may branch to this common tail.
>>>>>> +      // FIXME: Relaxed this check if the algorithm of finding
>>>>>> + loop
>>>>>> top is
>>>>>> +      // changed in MBP.
>>>>>> +      if (AfterBlockPlacement && MLI)
>>>>>> +        if (MachineLoop *ML = MLI->getLoopFor(IBB))
>>>>>> +          if (IBB == ML->getHeader() && ML == MLI->getLoopFor(PBB))
>>>>>> +            continue;
>>>>>> +
>>>>>>            MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
>>>>>>            SmallVector<MachineOperand, 4> Cond;
>>>>>>            if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>> l di ng.h?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.h (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.h Mon Jun  6 13:36:07 2016
>>>>>> @@ -20,6 +20,7 @@ namespace llvm {
>>>>>>        class MachineBranchProbabilityInfo;
>>>>>>        class MachineFunction;
>>>>>>        class MachineModuleInfo;
>>>>>> +  class MachineLoopInfo;
>>>>>>        class RegScavenger;
>>>>>>        class TargetInstrInfo;
>>>>>>        class TargetRegisterInfo;
>>>>>> @@ -32,10 +33,11 @@ namespace llvm {
>>>>>>                                MBFIWrapper &MBFI,
>>>>>>                                const MachineBranchProbabilityInfo
>>>>>> &MBPI);
>>>>>>
>>>>>> -    bool OptimizeFunction(MachineFunction &MF,
>>>>>> -                          const TargetInstrInfo *tii,
>>>>>> -                          const TargetRegisterInfo *tri,
>>>>>> -                          MachineModuleInfo *mmi);
>>>>>> +    bool OptimizeFunction(MachineFunction &MF, const
>>>>>> + TargetInstrInfo
>>>>>> *tii,
>>>>>> +                          const TargetRegisterInfo *tri,
>>>>>> MachineModuleInfo *mmi,
>>>>>> +                          MachineLoopInfo *mli = nullptr,
>>>>>> +                          bool AfterPlacement = false);
>>>>>> +
>>>>>>        private:
>>>>>>          class MergePotentialsElt {
>>>>>>            unsigned Hash;
>>>>>> @@ -93,11 +95,13 @@ namespace llvm {
>>>>>>          };
>>>>>>          std::vector<SameTailElt> SameTails;
>>>>>>
>>>>>> +    bool AfterBlockPlacement;
>>>>>>          bool EnableTailMerge;
>>>>>>          bool EnableHoistCommonCode;
>>>>>>          const TargetInstrInfo *TII;
>>>>>>          const TargetRegisterInfo *TRI;
>>>>>>          MachineModuleInfo *MMI;
>>>>>> +    MachineLoopInfo *MLI;
>>>>>>          RegScavenger *RS;
>>>>>>
>>>>>>        public:
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineB
>>>>>> l oc kPlacement.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Mon Jun  6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -26,6 +26,8 @@
>>>>>>
>>>>>> //===--------------------------------------------------------------
>>>>>> -
>>>>>> --
>>>>>> -----===//
>>>>>>
>>>>>>      #include "llvm/CodeGen/Passes.h"
>>>>>> +#include "llvm/CodeGen/TargetPassConfig.h"
>>>>>> +#include "BranchFolding.h"
>>>>>>      #include "llvm/ADT/DenseMap.h"
>>>>>>      #include "llvm/ADT/SmallPtrSet.h"
>>>>>>      #include "llvm/ADT/SmallVector.h"
>>>>>> @@ -116,6 +118,12 @@ static cl::opt<unsigned> JumpInstCost("j
>>>>>>                                            cl::desc("Cost of jump
>>>>>> instructions."),
>>>>>>                                            cl::init(1), cl::Hidden);
>>>>>>
>>>>>> +static cl::opt<bool>
>>>>>> +BranchFoldPlacement("branch-fold-placement",
>>>>>> +              cl::desc("Perform branch folding during placement. "
>>>>>> +                       "Reduces code size."),
>>>>>> +              cl::init(true), cl::Hidden);
>>>>>> +
>>>>>>      extern cl::opt<unsigned> StaticLikelyProb;
>>>>>>
>>>>>>      namespace {
>>>>>> @@ -232,10 +240,10 @@ class MachineBlockPlacement : public Mac
>>>>>>        const MachineBranchProbabilityInfo *MBPI;
>>>>>>
>>>>>>        /// \brief A handle to the function-wide block frequency pass.
>>>>>> -  const MachineBlockFrequencyInfo *MBFI;
>>>>>> +  std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;
>>>>>>
>>>>>>        /// \brief A handle to the loop info.
>>>>>> -  const MachineLoopInfo *MLI;
>>>>>> +  MachineLoopInfo *MLI;
>>>>>>
>>>>>>        /// \brief A handle to the target's instruction info.
>>>>>>        const TargetInstrInfo *TII;
>>>>>> @@ -323,6 +331,7 @@ public:
>>>>>>          AU.addRequired<MachineBlockFrequencyInfo>();
>>>>>>          AU.addRequired<MachineDominatorTree>();
>>>>>>          AU.addRequired<MachineLoopInfo>();
>>>>>> +    AU.addRequired<TargetPassConfig>();
>>>>>>          MachineFunctionPass::getAnalysisUsage(AU);
>>>>>>        }
>>>>>>      };
>>>>>> @@ -1462,7 +1471,8 @@ bool MachineBlockPlacement::runOnMachine
>>>>>>          return false;
>>>>>>
>>>>>>        MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
>>>>>> -  MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
>>>>>> +  MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
>>>>>> +      getAnalysis<MachineBlockFrequencyInfo>());
>>>>>>        MLI = &getAnalysis<MachineLoopInfo>();
>>>>>>        TII = F.getSubtarget().getInstrInfo();
>>>>>>        TLI = F.getSubtarget().getTargetLowering();
>>>>>> @@ -1470,6 +1480,29 @@ bool MachineBlockPlacement::runOnMachine
>>>>>>        assert(BlockToChain.empty());
>>>>>>
>>>>>>        buildCFGChains(F);
>>>>>> +
>>>>>> +  // Changing the layout can create new tail merging opportunities.
>>>>>> +  TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
>>>>>> + // TailMerge can create jump into if branches that make CFG
>>>>>> + irreducible for  // HW that requires structurized CFG.
>>>>>> +  bool EnableTailMerge = !F.getTarget().requiresStructuredCFG() &&
>>>>>> +                         PassConfig->getEnableTailMerge() &&
>>>>>> +                         BranchFoldPlacement;  // No tail merging
>>>>>> + opportunities if the block number is less than four.
>>>>>> +  if (F.size() > 3 && EnableTailMerge) {
>>>>>> +    BranchFolder BF(/*EnableTailMerge=*/true,
>>>>>> + /*CommonHoist=*/false,
>>>>>> *MBFI,
>>>>>> +                    *MBPI);
>>>>>> +
>>>>>> +    if (BF.OptimizeFunction(F, TII,
>>>>>> + F.getSubtarget().getRegisterInfo(),
>>>>>> +
>>>>>> getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
>>>>>> +                            /*AfterBlockPlacement=*/true)) {
>>>>>> +      // Redo the layout if tail merging creates/removes/moves
>>>>>> blocks.
>>>>>> +      BlockToChain.clear();
>>>>>> +      ChainAllocator.DestroyAll();
>>>>>> +      buildCFGChains(F);
>>>>>> +    }
>>>>>> +  }
>>>>>> +
>>>>>>        optimizeBranches(F);
>>>>>>        alignBlocks(F);
>>>>>>
>>>>>>
>>>>>> Added: llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64
>>>>>> / ta ilmerging_in_mbp.ll?rev=271925&view=auto
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll (added)
>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll Mon Jun
>>>>>> +++ 6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -0,0 +1,63 @@
>>>>>> +; RUN: llc <%s -march=aarch64 | FileCheck %s
>>>>>> +
>>>>>> +; CHECK-LABEL: test:
>>>>>> +; CHECK:       .LBB0_7
>>>>>> +; CHECK:         b.hi    .LBB0_2
>>>>>> +; CHECK-NEXT:    b    .LBB0_9
>>>>>> +; CHECK-NEXT:  .LBB0_8
>>>>>> +; CHECK-NEXT:    mov     x8, x9
>>>>>> +; CHECK-NEXT:  .LBB0_9
>>>>>> +define i64 @test(i64 %n, i64* %a, i64* %b, i64* %c, i64* %d, i64*
>>>>>> +%e,
>>>>>> +i64* %f) {
>>>>>> +entry:
>>>>>> +  %cmp28 = icmp sgt i64 %n, 1
>>>>>> +  br i1 %cmp28, label %for.body, label %for.end
>>>>>> +
>>>>>> +for.body:                                         ; preds =
>>>>>> %for.body.lr.ph, %if.end
>>>>>> +  %j = phi i64 [ %n, %entry ], [ %div, %if.end ]  %div = lshr i64
>>>>>> + %j, 1  %a.arrayidx = getelementptr inbounds i64, i64* %a, i64
>>>>>> + %div %a.j = load i64, i64* %a.arrayidx  %b.arrayidx =
>>>>>> + getelementptr inbounds i64, i64* %b, i64 %div  %b.j = load i64,
>>>>>> + i64* %b.arrayidx %cmp.i = icmp slt i64 %a.j, %b.j  br i1 %cmp.i,
>>>>>> + label %for.end.loopexit, label %cond.false.i
>>>>>> +
>>>>>> +cond.false.i:                                     ; preds = %for.body
>>>>>> +  %cmp4.i = icmp sgt i64 %a.j, %b.j
>>>>>> +  br i1 %cmp4.i, label %if.end, label %cond.false6.i
>>>>>> +
>>>>>> +cond.false6.i:                                    ; preds =
>>>>>> %cond.false.i
>>>>>> +  %c.arrayidx = getelementptr inbounds i64, i64* %c, i64 %div
>>>>>> + %c.j = load i64, i64* %c.arrayidx  %d.arrayidx = getelementptr
>>>>>> + inbounds i64, i64* %d, i64 %div  %d.j = load i64, i64*
>>>>>> + %d.arrayidx  %cmp9.i = icmp slt i64 %c.j, %d.j  br i1 %cmp9.i,
>>>>>> + label %for.end.loopexit, label %cond.false11.i
>>>>>> +
>>>>>> +cond.false11.i:                                   ; preds =
>>>>>> %cond.false6.i
>>>>>> +  %cmp14.i = icmp sgt i64 %c.j, %d.j  br i1 %cmp14.i, label
>>>>>> + %if.end, label %cond.false12.i
>>>>>> +
>>>>>> +cond.false12.i:                           ; preds = %cond.false11.i
>>>>>> +  %e.arrayidx = getelementptr inbounds i64, i64* %e, i64 %div
>>>>>> +  %e.j = load i64, i64* %e.arrayidx
>>>>>> +  %f.arrayidx = getelementptr inbounds i64, i64* %f, i64 %div
>>>>>> +  %f.j = load i64, i64* %f.arrayidx
>>>>>> +  %cmp19.i = icmp sgt i64 %e.j, %f.j
>>>>>> +  br i1 %cmp19.i, label %if.end, label %for.end.loopexit
>>>>>> +
>>>>>> +if.end:                                           ; preds =
>>>>>> %cond.false12.i, %cond.false11.i, %cond.false.i
>>>>>> +  %cmp = icmp ugt i64 %j, 3
>>>>>> +  br i1 %cmp, label %for.body, label %for.end.loopexit
>>>>>> +
>>>>>> +for.end.loopexit:                                 ; preds =
>>>>>> %cond.false12.i, %cond.false6.i, %for.body, %if.end
>>>>>> +  %j.0.lcssa.ph = phi i64 [ %j, %cond.false12.i ], [ %j,
>>>>>> +%cond.false6.i ], [ %j, %for.body ], [ %div, %if.end ]
>>>>>> +  br label %for.end
>>>>>> +
>>>>>> +for.end:                                          ; preds =
>>>>>> %for.end.loopexit, %entry
>>>>>> +  %j.0.lcssa = phi i64 [ %n, %entry ], [ %j.0.lcssa.ph,
>>>>>> +%for.end.loopexit ]
>>>>>> +  %j.2 = add i64 %j.0.lcssa, %n
>>>>>> +  %j.3 = mul i64 %j.2, %n
>>>>>> +  %j.4 = add i64 %j.3, 10
>>>>>> +  ret i64 %j.4
>>>>>> +}
>>>>>>
>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/arm
>>>>>> - an d-tst-peephole.ll?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll (original)
>>>>>> +++ llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll Mon Jun  6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -49,7 +49,7 @@ tailrecurse.switch:
>>>>>>      ; V8-NEXT: beq
>>>>>>      ; V8-NEXT: %tailrecurse.switch
>>>>>>      ; V8: cmp
>>>>>> -; V8-NEXT: bne
>>>>>> +; V8-NEXT: beq
>>>>>>      ; V8-NEXT: b
>>>>>>      ; The trailing space in the last line checks that the branch is
>>>>>> unconditional
>>>>>>        switch i32 %and, label %sw.epilog [
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>
>>>
>>>
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list