[llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.

Ahmed Bougacha via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 20 14:04:47 PDT 2016


On Mon, Jun 20, 2016 at 1:54 PM, Ahmed Bougacha
<ahmed.bougacha at gmail.com> wrote:
> On Thu, Jun 16, 2016 at 10:13 PM, Mikael Holmén
> <llvm-commits at lists.llvm.org> wrote:
>> Hi,
>>
>> On 06/16/2016 11:48 PM, Haicheng Wu wrote:
>>>
>>> Hi Mikael,
>>>
>>> Any luck of fixing this?
>>>
>>> I am thinking if SomeVeryCleverCondition = !canFallThrough() is good
>>> enouth.  Or more explicitly and safely, SomeVeryCleverCondition =
>>> (succ_size() == 1 && !isLayoutSuccessor(TBB))
>>
>>
>> We're thinking along those lines as well. Currently we do
>>
>>   if (!FallthroughBB && (succ_size() == 1)) {
>>
>> and I haven't seen any problems with it so far but I'm not sure at all if
>> it's safe and solid.
>
> I agree; that probably works, but relying on updateTerminator seems
> pretty fragile.  I'm not very familiar with MBP, but a couple ideas:
>
> - Isn't canFallthrough() an approximation of AnalyzeBranch() returning
> (!Cond.empty() && TBB && !FBB) ?
>   If so, can that be used directly in updateTerminator()?  Seems like
> that fixes this problem.

Or perhaps add some sort of "TrustCFG" flag to canFallThrough,
bypassing isSuccessor(), and use that in updateTerminator() ?  We're
really asking if the block alone can fallthrough, not whether it does
fall through in its current parent function. Seems like a sensible
distinction?

-Ahmed

> - Alternatively, what do you think of:
>   - record AnalyzeBranch() (and canFallthrough()?) results before
> splice()ing the block
>   - add a variant of updateTerminator that also takes the results ("make it so")
>   That would fix the MBP FIXME regarding updateTerminator/AnalyzeBranch.
>
> -Ahmed
>
>> Regards,
>>
>> Mikael
>>
>>>
>>> Best,
>>>
>>> Haicheng
>>>
>>> -----Original Message-----
>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>> Sent: Wednesday, June 15, 2016 5:28 AM
>>> To: Haicheng Wu <haicheng at codeaurora.org>; Chandler Carruth
>>> <chandlerc at gmail.com>; grosbach at apple.com
>>> Cc: 'Karl-Johan Karlsson' <karl-johan.karlsson at ericsson.com>;
>>> 'llvm-commits' <llvm-commits at lists.llvm.org>
>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>> merging in MBP.
>>>
>>> Hi Chandler, Jim, Haicheng,
>>>
>>> (Adding Chandler Carruth and Jim Grosbach who seems to have fixed a
>>> somewhat similar case in updateTerminator years ago.)
>>>
>>> On 06/14/2016 11:24 PM, Haicheng Wu wrote:
>>>>
>>>> >From your analysis, it seems updateTerminator() misses a case.  But I am
>>>> surprised that it does not happen in other backends.  It would be ideal if
>>>> you could reproduce it in other backend that is in tree.
>>>
>>>
>>> Yes it surely would. Reproducing bugs far into the backend on other
>>> targets is quite tricky though and we haven't mananged yet with this one :/
>>>
>>> Chandler, Jim, do you have any opinion about this?
>>>
>>> The problem we're seeing is:
>>>
>>> MachineBasicBlock::updateTerminator fails to fix a case where
>>> MachineBlockPlacement has reordered blocks so we have
>>>
>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>       Live Ins: %a0_32 %af1 %CCReg %af1
>>>       Predecessors according to CFG: BB#1
>>>           brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>       Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)
>>>
>>> BB#6: derived from LLVM BB %.split
>>>       Live Ins: %CCReg %af1
>>>       Predecessors according to CFG: BB#0
>>>           brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>       Successors according to CFG: BB#7(0x40000000 / 0x80000000 = 50.00%)
>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>
>>> and then
>>>
>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>       Live Ins: %a0_32 %af1
>>>       Predecessors according to CFG: BB#2 BB#10
>>>       %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, pred:0,
>>> %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>       brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>       Successors according to CFG: BB#5(0x30000000 / 0x80000000 = 37.50%)
>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>
>>> Before moving BBs around, BB#2 fallthrough'd to BB#3, and
>>> MachineBlockPlacement expects updateTerminator to insert a branch from
>>> BB#2 to BB#3 for the old fallthrough edge, but it doesn't.
>>>
>>> Note that there is also a conditional branch from BB#2 to BB#3,
>>> complicating the situation since the fallthrough bb and the conditional jump
>>> target are the same.
>>>
>>> AnalyzeBranch sets TBB to BB#3, FBB to nullptr and Cond to the condition
>>> used in the brr_cond in BB#2, so that seems reasonable to me.
>>>
>>> In updateTerminator, when searching for FallthroughBB it doesn't find
>>> anything since the fallthrough bb is the same as TBB. And then
>>> canFallThrough() returns false, because BB#6 is not in the successor list
>>> of BB#2.
>>>
>>> Then TBB is not a layout successor and we test
>>> !isLayoutSuccessor(FallthroughBB) which happens to return true when
>>> FallthroughBB is nullptr, so we enter the last else if which removes our
>>> conditional branch and inserts an identical one.
>>>
>>> To fix BB#2 I think we would like to do
>>>
>>>     if (!FallthroughBB && SomeVeryCleverCondition) {
>>>       // We did fallthrough to the same basic block as the conditional
>>> jump
>>>       // targets but since canFallThrough() said false, our previous
>>>       // fallthrough successor BB has been moved away.
>>>       // Remove the conditional jump, leaving unconditional fallthrough.
>>>       TII->RemoveBranch(*this);
>>>
>>>       // Finally update the unconditional successor to be reached via a
>>> branch if
>>>       // it would not be reached by fallthrough.
>>>       if (!isLayoutSuccessor(TBB)) {
>>>         ArrayRef<MachineOperand> EmptyCond;
>>>         TII->InsertBranch(*this, TBB, nullptr, EmptyCond, DL);
>>>       }
>>>       return;
>>>     }
>>>
>>> but the question is what SomeVeryCleverCondition should look like to catch
>>> this and only this case and not mess up something else.
>>>
>>> Any thoughts appreciated.
>>>
>>> Thanks,
>>> Mikael
>>>
>>>>
>>>> Best,
>>>>
>>>> Haicheng
>>>>
>>>> -----Original Message-----
>>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>>> Sent: Tuesday, June 14, 2016 12:56 PM
>>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>>> Cc: Karl-Johan Karlsson <karl-johan.karlsson at ericsson.com>;
>>>> llvm-commits <llvm-commits at lists.llvm.org>
>>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>>> merging in MBP.
>>>>
>>>> Hi,
>>>>
>>>> Karl-Johan got it right.
>>>>
>>>> TBB is BB#3, FBB is nullptr, and Cond is the condition set on the
>>>> brr_cond instruction ending BB#2.
>>>>
>>>> And this makes updateTerminator pass everything until
>>>>
>>>>      // The block has a fallthrough conditional branch.
>>>>      if (isLayoutSuccessor(TBB)) {
>>>>        if (TII->ReverseBranchCondition(Cond)) {
>>>>          // We can't reverse the condition, add an unconditional branch.
>>>>          Cond.clear();
>>>>          TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>>>          return;
>>>>        }
>>>>        TII->RemoveBranch(*this);
>>>>        TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>>>      } else if (!isLayoutSuccessor(FallthroughBB)) {
>>>>        TII->RemoveBranch(*this);
>>>>        TII->InsertBranch(*this, TBB, FallthroughBB, Cond, DL);
>>>>      }
>>>>
>>>> where we enter the else branch (FallthroughBB is nullptr), remove the
>>>> conditiopnal branch, but then insert a new (identical) conditional branch.
>>>>
>>>> Regards,
>>>> Mikael
>>>>
>>>> On 06/14/2016 06:08 PM, Karl-Johan Karlsson wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Mikael went home from work before your mail arrived. He will be back
>>>>> tomorrow. I hope I have recreated the problem as Mikael described it
>>>>> below. From my debugger session I printed the following information
>>>>> about TBB, FBB and Cond before updateTerminator() is called on BB#2.
>>>>>
>>>>> (gdb) p PrevBB->dump()
>>>>> $39 = void
>>>>> (gdb)
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>        Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>        Predecessors according to CFG: BB#1
>>>>>        brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>        Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>> (gdb) P TBB
>>>>> $40 = (llvm::MachineBasicBlock *) 0x28e02d8
>>>>> (gdb) P TBB->dump()
>>>>> $41 = void
>>>>> (gdb)
>>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>>>        Live Ins: %a0_32 %af1
>>>>>        Predecessors according to CFG: BB#2 BB#10
>>>>>        %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>>>        brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>>>        Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>>> 37.50%)
>>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>>> (gdb) P FBB
>>>>> $42 = (llvm::MachineBasicBlock *) 0x0
>>>>> (gdb)
>>>>> (gdb) P Cond.empty()
>>>>> $44 = false
>>>>> (gdb)
>>>>>
>>>>> If I have got any information wrong Mikael will correct me with a
>>>>> mail tomorrow.
>>>>>
>>>>> Regards
>>>>> / Karl-Johan Karlsson
>>>>>
>>>>>
>>>>> On 2016-06-14 17:07, Haicheng Wu wrote:
>>>>>>
>>>>>> Hi Mikael,
>>>>>>
>>>>>>    From your dumped information, I think the problem is either in
>>>>>> AnalyzeBranch() or updateTerminator().  What is the TBB, FBB, and
>>>>>> Cond returned by AnalyzeBranch() when it is called on BB#2 before
>>>>>> calling updateTerminator()?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Haicheng
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>>>>> Sent: Tuesday, June 14, 2016 7:54 AM
>>>>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>>>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>; Karl-Johan Karlsson
>>>>>> <karl-johan.karlsson at ericsson.com>
>>>>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>>>>> merging in MBP.
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We're seeing a problem on our out-of-tree backend with this patch
>>>>>> applied. I think the problem is either in
>>>>>> MachineBasicBlock::updateTerminator() or how it's used but I'm not
>>>>>> sure who's responsibility it is to fix the situation.
>>>>>>
>>>>>> I've tried to reproduce it on in-tree backends but I've failed on
>>>>>> X86 and Arm :/
>>>>>>
>>>>>> Anyway, the fault we see is that after MachineBlockPlacement we get
>>>>>> two BBs like this:
>>>>>>
>>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>>         Predecessors according to CFG: BB#1
>>>>>>             brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>>> 100.00%)
>>>>>>
>>>>>> BB#6: derived from LLVM BB %.split
>>>>>>         Live Ins: %CCReg %af1
>>>>>>         Predecessors according to CFG: BB#0
>>>>>>             brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>>> 50.00%)
>>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>>
>>>>>> So, looking at the instructions of BB#2 it looks like it does
>>>>>> fallthrough to BB#6 (only one conditional branch to BB#3) but the
>>>>>> CFG says only BB#3 is successor.
>>>>>>
>>>>>> And the verifier complains about this:
>>>>>>
>>>>>> *** Bad machine code: MBB exits via conditional branch/fall-through
>>>>>> but only has one CFG successor! ***
>>>>>> - function:    f2
>>>>>> - basic block: BB#2 .split.us.split.us (0x492c1f8) LLVM ERROR: Found
>>>>>> 1 machine code errors.
>>>>>>
>>>>>> At some point during MachineBlockPlacement we have
>>>>>>
>>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>>         Predecessors according to CFG: BB#1
>>>>>>       brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>>> 100.00%)
>>>>>>
>>>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>>>>         Live Ins: %a0_32 %af1
>>>>>>         Predecessors according to CFG: BB#2 BB#10
>>>>>>       %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>>>>       brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>>>> 37.50%)
>>>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>>>>
>>>>>> Note that BB#2 both does a conditional jump AND falltrhoughs to BB#3.
>>>>>>
>>>>>> Then it's decided by MachineBlockPlacement::buildChain that BB#6 is
>>>>>> a better layout successor to BB#2 than BB#3, and we get:
>>>>>>
>>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>>>         Live Ins: %a0_32 %af1 %CCReg %af1
>>>>>>         Predecessors according to CFG: BB#1
>>>>>>       brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>>> 100.00%)
>>>>>>
>>>>>> BB#6: derived from LLVM BB %.split
>>>>>>         Live Ins: %CCReg %af1
>>>>>>         Predecessors according to CFG: BB#0
>>>>>>       brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>>>         Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>>> 50.00%)
>>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>>
>>>>>> Then in MachineBlockPlacement::buildCFGChains we have
>>>>>>
>>>>>>         if (!TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond)) {
>>>>>>           PrevBB->updateTerminator();
>>>>>>
>>>>>> which is called on BB#2, and from what I understand the point is
>>>>>> that updateTerminator should insert an unconditional branch from
>>>>>> BB#2 to
>>>>>> BB#3 now that there is no fallthrough anymore, but it doesn't and
>>>>>> the code is left broken.
>>>>>>
>>>>>> What's your thought's on this Haicheng, who should fix BB#2 in this
>>>>>> case?
>>>>>>
>>>>>> Thanks,
>>>>>> Mikael
>>>>>>
>>>>>>
>>>>>> On 06/06/2016 08:36 PM, Haicheng Wu via llvm-commits wrote:
>>>>>>>
>>>>>>> Author: haicheng
>>>>>>> Date: Mon Jun  6 13:36:07 2016
>>>>>>> New Revision: 271925
>>>>>>>
>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=271925&view=rev
>>>>>>> Log:
>>>>>>> [MBP] Reduce code size by running tail merging in MBP.
>>>>>>>
>>>>>>> The code layout that TailMerging (inside BranchFolding) works on is
>>>>>>> not the final layout optimized based on the branch probability.
>>>>>>> Generally, after BlockPlacement, many new merging opportunities
>>>>>>> emerge.
>>>>>>>
>>>>>>> This patch calls Tail Merging after MBP and calls MBP again if Tail
>>>>>>> Merging merges anything.
>>>>>>>
>>>>>>> Differential Revision: http://reviews.llvm.org/D20276
>>>>>>>
>>>>>>> Added:
>>>>>>>         llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>>> Modified:
>>>>>>>         llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>>>         llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>>>         llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>>>         llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>>>
>>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>>> l di ng.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>>> ===================================================================
>>>>>>> =
>>>>>>> ==
>>>>>>> ========
>>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.cpp (original)
>>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.cpp Mon Jun  6 13:36:07
>>>>>>> +++ 2016
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>      #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>>>>      #include "llvm/CodeGen/MachineJumpTableInfo.h"
>>>>>>>      #include "llvm/CodeGen/MachineMemOperand.h"
>>>>>>> +#include "llvm/CodeGen/MachineLoopInfo.h"
>>>>>>>      #include "llvm/CodeGen/MachineModuleInfo.h"
>>>>>>>      #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>>>>      #include "llvm/CodeGen/Passes.h"
>>>>>>> @@ -137,6 +138,8 @@ void BranchFolder::RemoveDeadBlock(Machi
>>>>>>>        // Remove the block.
>>>>>>>        MF->erase(MBB);
>>>>>>>        FuncletMembership.erase(MBB);
>>>>>>> +  if (MLI)
>>>>>>> +    MLI->removeBlock(MBB);
>>>>>>>      }
>>>>>>>
>>>>>>>      /// OptimizeImpDefsBlock - If a basic block is just a bunch of
>>>>>>> implicit_def @@ -193,18 +196,22 @@ bool
>>>>>>> BranchFolder::OptimizeImpDefsBlock(
>>>>>>>      }
>>>>>>>
>>>>>>>      /// OptimizeFunction - Perhaps branch folding, tail merging and
>>>>>>> other -/// CFG optimizations on the given function.
>>>>>>> +/// CFG optimizations on the given function.  Block placement
>>>>>>> +changes the layout /// and may create new tail merging opportunities.
>>>>>>>      bool BranchFolder::OptimizeFunction(MachineFunction &MF,
>>>>>>>                                          const TargetInstrInfo *tii,
>>>>>>>                                          const TargetRegisterInfo
>>>>>>> *tri,
>>>>>>> -                                    MachineModuleInfo *mmi) {
>>>>>>> +                                    MachineModuleInfo *mmi,
>>>>>>> +                                    MachineLoopInfo *mli, bool
>>>>>>> + AfterPlacement) {
>>>>>>>        if (!tii) return false;
>>>>>>>
>>>>>>>        TriedMerging.clear();
>>>>>>>
>>>>>>> +  AfterBlockPlacement = AfterPlacement;
>>>>>>>        TII = tii;
>>>>>>>        TRI = tri;
>>>>>>>        MMI = mmi;
>>>>>>> +  MLI = mli;
>>>>>>>        RS = nullptr;
>>>>>>>
>>>>>>>        // Use a RegScavenger to help update liveness when required.
>>>>>>> @@ -230,7 +237,10 @@ bool BranchFolder::OptimizeFunction(Mach
>>>>>>>        bool MadeChangeThisIteration = true;
>>>>>>>        while (MadeChangeThisIteration) {
>>>>>>>          MadeChangeThisIteration    = TailMergeBlocks(MF);
>>>>>>> -    MadeChangeThisIteration   |= OptimizeBranches(MF);
>>>>>>> +    // No need to clean up if tail merging does not change
>>>>>>> + anything
>>>>>>> after the
>>>>>>> +    // block placement.
>>>>>>> +    if (!AfterBlockPlacement || MadeChangeThisIteration)
>>>>>>> +      MadeChangeThisIteration |= OptimizeBranches(MF);
>>>>>>>          if (EnableHoistCommonCode)
>>>>>>>            MadeChangeThisIteration |= HoistCommonCode(MF);
>>>>>>>          MadeChange |= MadeChangeThisIteration; @@ -447,6 +457,11 @@
>>>>>>> MachineBasicBlock *BranchFolder::SplitMB
>>>>>>>        // Splice the code over.
>>>>>>>        NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());
>>>>>>>
>>>>>>> +  // NewMBB belongs to the same loop as CurMBB.
>>>>>>> +  if (MLI)
>>>>>>> +    if (MachineLoop *ML = MLI->getLoopFor(&CurMBB))
>>>>>>> +      ML->addBasicBlockToLoop(NewMBB, MLI->getBase());
>>>>>>> +
>>>>>>>        // NewMBB inherits CurMBB's block frequency.
>>>>>>>        MBBFreqInfo.setBlockFreq(NewMBB,
>>>>>>> MBBFreqInfo.getBlockFreq(&CurMBB));
>>>>>>>
>>>>>>> @@ -934,23 +949,27 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>>>        if (!EnableTailMerge) return MadeChange;
>>>>>>>
>>>>>>>        // First find blocks with no successors.
>>>>>>> -  MergePotentials.clear();
>>>>>>> -  for (MachineBasicBlock &MBB : MF) {
>>>>>>> +  // Block placement does not create new tail merging
>>>>>>> + opportunities for these  // blocks.
>>>>>>> +  if (!AfterBlockPlacement) {
>>>>>>> +    MergePotentials.clear();
>>>>>>> +    for (MachineBasicBlock &MBB : MF) {
>>>>>>> +      if (MergePotentials.size() == TailMergeThreshold)
>>>>>>> +        break;
>>>>>>> +      if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>>> +
>>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>>> &MBB));
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    // If this is a large problem, avoid visiting the same basic
>>>>>>> blocks
>>>>>>> +    // multiple times.
>>>>>>>          if (MergePotentials.size() == TailMergeThreshold)
>>>>>>> -      break;
>>>>>>> -    if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>>> -
>>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>>> &MBB));
>>>>>>> -  }
>>>>>>> +      for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>>> +        TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>>>
>>>>>>> -  // If this is a large problem, avoid visiting the same basic
>>>>>>> blocks
>>>>>>> -  // multiple times.
>>>>>>> -  if (MergePotentials.size() == TailMergeThreshold)
>>>>>>> -    for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>>> -      TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>>> -
>>>>>>> -  // See if we can do any tail merging on those.
>>>>>>> -  if (MergePotentials.size() >= 2)
>>>>>>> -    MadeChange |= TryTailMergeBlocks(nullptr, nullptr);
>>>>>>> +    // See if we can do any tail merging on those.
>>>>>>> +    if (MergePotentials.size() >= 2)
>>>>>>> +      MadeChange |= TryTailMergeBlocks(nullptr, nullptr);  }
>>>>>>>
>>>>>>>        // Look at blocks (IBB) with multiple predecessors (PBB).
>>>>>>>        // We change each predecessor to a canonical form, by @@
>>>>>>> -997,6
>>>>>>> +1016,17 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>>>            if (PBB->hasEHPadSuccessor())
>>>>>>>              continue;
>>>>>>>
>>>>>>> +      // Bail out if the loop header (IBB) is not the top of the
>>>>>>> loop chain
>>>>>>> +      // after the block placement.  Otherwise, the common tail of
>>>>>>> IBB's
>>>>>>> +      // predecessors may become the loop top if block placement
>>>>>>> + is
>>>>>>> called again
>>>>>>> +      // and the predecessors may branch to this common tail.
>>>>>>> +      // FIXME: Relaxed this check if the algorithm of finding
>>>>>>> + loop
>>>>>>> top is
>>>>>>> +      // changed in MBP.
>>>>>>> +      if (AfterBlockPlacement && MLI)
>>>>>>> +        if (MachineLoop *ML = MLI->getLoopFor(IBB))
>>>>>>> +          if (IBB == ML->getHeader() && ML == MLI->getLoopFor(PBB))
>>>>>>> +            continue;
>>>>>>> +
>>>>>>>            MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
>>>>>>>            SmallVector<MachineOperand, 4> Cond;
>>>>>>>            if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {
>>>>>>>
>>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>>> l di ng.h?rev=271925&r1=271924&r2=271925&view=diff
>>>>>>> ===================================================================
>>>>>>> =
>>>>>>> ==
>>>>>>> ========
>>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.h (original)
>>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.h Mon Jun  6 13:36:07 2016
>>>>>>> @@ -20,6 +20,7 @@ namespace llvm {
>>>>>>>        class MachineBranchProbabilityInfo;
>>>>>>>        class MachineFunction;
>>>>>>>        class MachineModuleInfo;
>>>>>>> +  class MachineLoopInfo;
>>>>>>>        class RegScavenger;
>>>>>>>        class TargetInstrInfo;
>>>>>>>        class TargetRegisterInfo;
>>>>>>> @@ -32,10 +33,11 @@ namespace llvm {
>>>>>>>                                MBFIWrapper &MBFI,
>>>>>>>                                const MachineBranchProbabilityInfo
>>>>>>> &MBPI);
>>>>>>>
>>>>>>> -    bool OptimizeFunction(MachineFunction &MF,
>>>>>>> -                          const TargetInstrInfo *tii,
>>>>>>> -                          const TargetRegisterInfo *tri,
>>>>>>> -                          MachineModuleInfo *mmi);
>>>>>>> +    bool OptimizeFunction(MachineFunction &MF, const
>>>>>>> + TargetInstrInfo
>>>>>>> *tii,
>>>>>>> +                          const TargetRegisterInfo *tri,
>>>>>>> MachineModuleInfo *mmi,
>>>>>>> +                          MachineLoopInfo *mli = nullptr,
>>>>>>> +                          bool AfterPlacement = false);
>>>>>>> +
>>>>>>>        private:
>>>>>>>          class MergePotentialsElt {
>>>>>>>            unsigned Hash;
>>>>>>> @@ -93,11 +95,13 @@ namespace llvm {
>>>>>>>          };
>>>>>>>          std::vector<SameTailElt> SameTails;
>>>>>>>
>>>>>>> +    bool AfterBlockPlacement;
>>>>>>>          bool EnableTailMerge;
>>>>>>>          bool EnableHoistCommonCode;
>>>>>>>          const TargetInstrInfo *TII;
>>>>>>>          const TargetRegisterInfo *TRI;
>>>>>>>          MachineModuleInfo *MMI;
>>>>>>> +    MachineLoopInfo *MLI;
>>>>>>>          RegScavenger *RS;
>>>>>>>
>>>>>>>        public:
>>>>>>>
>>>>>>> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineB
>>>>>>> l oc kPlacement.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>>> ===================================================================
>>>>>>> =
>>>>>>> ==
>>>>>>> ========
>>>>>>> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
>>>>>>> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Mon Jun  6
>>>>>>> +++ 13:36:07 2016
>>>>>>> @@ -26,6 +26,8 @@
>>>>>>>
>>>>>>> //===--------------------------------------------------------------
>>>>>>> -
>>>>>>> --
>>>>>>> -----===//
>>>>>>>
>>>>>>>      #include "llvm/CodeGen/Passes.h"
>>>>>>> +#include "llvm/CodeGen/TargetPassConfig.h"
>>>>>>> +#include "BranchFolding.h"
>>>>>>>      #include "llvm/ADT/DenseMap.h"
>>>>>>>      #include "llvm/ADT/SmallPtrSet.h"
>>>>>>>      #include "llvm/ADT/SmallVector.h"
>>>>>>> @@ -116,6 +118,12 @@ static cl::opt<unsigned> JumpInstCost("j
>>>>>>>                                            cl::desc("Cost of jump
>>>>>>> instructions."),
>>>>>>>                                            cl::init(1), cl::Hidden);
>>>>>>>
>>>>>>> +static cl::opt<bool>
>>>>>>> +BranchFoldPlacement("branch-fold-placement",
>>>>>>> +              cl::desc("Perform branch folding during placement. "
>>>>>>> +                       "Reduces code size."),
>>>>>>> +              cl::init(true), cl::Hidden);
>>>>>>> +
>>>>>>>      extern cl::opt<unsigned> StaticLikelyProb;
>>>>>>>
>>>>>>>      namespace {
>>>>>>> @@ -232,10 +240,10 @@ class MachineBlockPlacement : public Mac
>>>>>>>        const MachineBranchProbabilityInfo *MBPI;
>>>>>>>
>>>>>>>        /// \brief A handle to the function-wide block frequency pass.
>>>>>>> -  const MachineBlockFrequencyInfo *MBFI;
>>>>>>> +  std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;
>>>>>>>
>>>>>>>        /// \brief A handle to the loop info.
>>>>>>> -  const MachineLoopInfo *MLI;
>>>>>>> +  MachineLoopInfo *MLI;
>>>>>>>
>>>>>>>        /// \brief A handle to the target's instruction info.
>>>>>>>        const TargetInstrInfo *TII;
>>>>>>> @@ -323,6 +331,7 @@ public:
>>>>>>>          AU.addRequired<MachineBlockFrequencyInfo>();
>>>>>>>          AU.addRequired<MachineDominatorTree>();
>>>>>>>          AU.addRequired<MachineLoopInfo>();
>>>>>>> +    AU.addRequired<TargetPassConfig>();
>>>>>>>          MachineFunctionPass::getAnalysisUsage(AU);
>>>>>>>        }
>>>>>>>      };
>>>>>>> @@ -1462,7 +1471,8 @@ bool MachineBlockPlacement::runOnMachine
>>>>>>>          return false;
>>>>>>>
>>>>>>>        MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
>>>>>>> -  MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
>>>>>>> +  MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
>>>>>>> +      getAnalysis<MachineBlockFrequencyInfo>());
>>>>>>>        MLI = &getAnalysis<MachineLoopInfo>();
>>>>>>>        TII = F.getSubtarget().getInstrInfo();
>>>>>>>        TLI = F.getSubtarget().getTargetLowering();
>>>>>>> @@ -1470,6 +1480,29 @@ bool MachineBlockPlacement::runOnMachine
>>>>>>>        assert(BlockToChain.empty());
>>>>>>>
>>>>>>>        buildCFGChains(F);
>>>>>>> +
>>>>>>> +  // Changing the layout can create new tail merging opportunities.
>>>>>>> +  TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
>>>>>>> + // TailMerge can create jump into if branches that make CFG
>>>>>>> + irreducible for  // HW that requires structurized CFG.
>>>>>>> +  bool EnableTailMerge = !F.getTarget().requiresStructuredCFG() &&
>>>>>>> +                         PassConfig->getEnableTailMerge() &&
>>>>>>> +                         BranchFoldPlacement;  // No tail merging
>>>>>>> + opportunities if the block number is less than four.
>>>>>>> +  if (F.size() > 3 && EnableTailMerge) {
>>>>>>> +    BranchFolder BF(/*EnableTailMerge=*/true,
>>>>>>> + /*CommonHoist=*/false,
>>>>>>> *MBFI,
>>>>>>> +                    *MBPI);
>>>>>>> +
>>>>>>> +    if (BF.OptimizeFunction(F, TII,
>>>>>>> + F.getSubtarget().getRegisterInfo(),
>>>>>>> +
>>>>>>> getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
>>>>>>> +                            /*AfterBlockPlacement=*/true)) {
>>>>>>> +      // Redo the layout if tail merging creates/removes/moves
>>>>>>> blocks.
>>>>>>> +      BlockToChain.clear();
>>>>>>> +      ChainAllocator.DestroyAll();
>>>>>>> +      buildCFGChains(F);
>>>>>>> +    }
>>>>>>> +  }
>>>>>>> +
>>>>>>>        optimizeBranches(F);
>>>>>>>        alignBlocks(F);
>>>>>>>
>>>>>>>
>>>>>>> Added: llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64
>>>>>>> / ta ilmerging_in_mbp.ll?rev=271925&view=auto
>>>>>>> ===================================================================
>>>>>>> =
>>>>>>> ==
>>>>>>> ========
>>>>>>> --- llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll (added)
>>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll Mon Jun
>>>>>>> +++ 6
>>>>>>> +++ 13:36:07 2016
>>>>>>> @@ -0,0 +1,63 @@
>>>>>>> +; RUN: llc <%s -march=aarch64 | FileCheck %s
>>>>>>> +
>>>>>>> +; CHECK-LABEL: test:
>>>>>>> +; CHECK:       .LBB0_7
>>>>>>> +; CHECK:         b.hi    .LBB0_2
>>>>>>> +; CHECK-NEXT:    b    .LBB0_9
>>>>>>> +; CHECK-NEXT:  .LBB0_8
>>>>>>> +; CHECK-NEXT:    mov     x8, x9
>>>>>>> +; CHECK-NEXT:  .LBB0_9
>>>>>>> +define i64 @test(i64 %n, i64* %a, i64* %b, i64* %c, i64* %d, i64*
>>>>>>> +%e,
>>>>>>> +i64* %f) {
>>>>>>> +entry:
>>>>>>> +  %cmp28 = icmp sgt i64 %n, 1
>>>>>>> +  br i1 %cmp28, label %for.body, label %for.end
>>>>>>> +
>>>>>>> +for.body:                                         ; preds =
>>>>>>> %for.body.lr.ph, %if.end
>>>>>>> +  %j = phi i64 [ %n, %entry ], [ %div, %if.end ]  %div = lshr i64
>>>>>>> + %j, 1  %a.arrayidx = getelementptr inbounds i64, i64* %a, i64
>>>>>>> + %div %a.j = load i64, i64* %a.arrayidx  %b.arrayidx =
>>>>>>> + getelementptr inbounds i64, i64* %b, i64 %div  %b.j = load i64,
>>>>>>> + i64* %b.arrayidx %cmp.i = icmp slt i64 %a.j, %b.j  br i1 %cmp.i,
>>>>>>> + label %for.end.loopexit, label %cond.false.i
>>>>>>> +
>>>>>>> +cond.false.i:                                     ; preds = %for.body
>>>>>>> +  %cmp4.i = icmp sgt i64 %a.j, %b.j
>>>>>>> +  br i1 %cmp4.i, label %if.end, label %cond.false6.i
>>>>>>> +
>>>>>>> +cond.false6.i:                                    ; preds =
>>>>>>> %cond.false.i
>>>>>>> +  %c.arrayidx = getelementptr inbounds i64, i64* %c, i64 %div
>>>>>>> + %c.j = load i64, i64* %c.arrayidx  %d.arrayidx = getelementptr
>>>>>>> + inbounds i64, i64* %d, i64 %div  %d.j = load i64, i64*
>>>>>>> + %d.arrayidx  %cmp9.i = icmp slt i64 %c.j, %d.j  br i1 %cmp9.i,
>>>>>>> + label %for.end.loopexit, label %cond.false11.i
>>>>>>> +
>>>>>>> +cond.false11.i:                                   ; preds =
>>>>>>> %cond.false6.i
>>>>>>> +  %cmp14.i = icmp sgt i64 %c.j, %d.j  br i1 %cmp14.i, label
>>>>>>> + %if.end, label %cond.false12.i
>>>>>>> +
>>>>>>> +cond.false12.i:                           ; preds = %cond.false11.i
>>>>>>> +  %e.arrayidx = getelementptr inbounds i64, i64* %e, i64 %div
>>>>>>> +  %e.j = load i64, i64* %e.arrayidx
>>>>>>> +  %f.arrayidx = getelementptr inbounds i64, i64* %f, i64 %div
>>>>>>> +  %f.j = load i64, i64* %f.arrayidx
>>>>>>> +  %cmp19.i = icmp sgt i64 %e.j, %f.j
>>>>>>> +  br i1 %cmp19.i, label %if.end, label %for.end.loopexit
>>>>>>> +
>>>>>>> +if.end:                                           ; preds =
>>>>>>> %cond.false12.i, %cond.false11.i, %cond.false.i
>>>>>>> +  %cmp = icmp ugt i64 %j, 3
>>>>>>> +  br i1 %cmp, label %for.body, label %for.end.loopexit
>>>>>>> +
>>>>>>> +for.end.loopexit:                                 ; preds =
>>>>>>> %cond.false12.i, %cond.false6.i, %for.body, %if.end
>>>>>>> +  %j.0.lcssa.ph = phi i64 [ %j, %cond.false12.i ], [ %j,
>>>>>>> +%cond.false6.i ], [ %j, %for.body ], [ %div, %if.end ]
>>>>>>> +  br label %for.end
>>>>>>> +
>>>>>>> +for.end:                                          ; preds =
>>>>>>> %for.end.loopexit, %entry
>>>>>>> +  %j.0.lcssa = phi i64 [ %n, %entry ], [ %j.0.lcssa.ph,
>>>>>>> +%for.end.loopexit ]
>>>>>>> +  %j.2 = add i64 %j.0.lcssa, %n
>>>>>>> +  %j.3 = mul i64 %j.2, %n
>>>>>>> +  %j.4 = add i64 %j.3, 10
>>>>>>> +  ret i64 %j.4
>>>>>>> +}
>>>>>>>
>>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/arm
>>>>>>> - an d-tst-peephole.ll?rev=271925&r1=271924&r2=271925&view=diff
>>>>>>> ===================================================================
>>>>>>> =
>>>>>>> ==
>>>>>>> ========
>>>>>>> --- llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll (original)
>>>>>>> +++ llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll Mon Jun  6
>>>>>>> +++ 13:36:07 2016
>>>>>>> @@ -49,7 +49,7 @@ tailrecurse.switch:
>>>>>>>      ; V8-NEXT: beq
>>>>>>>      ; V8-NEXT: %tailrecurse.switch
>>>>>>>      ; V8: cmp
>>>>>>> -; V8-NEXT: bne
>>>>>>> +; V8-NEXT: beq
>>>>>>>      ; V8-NEXT: b
>>>>>>>      ; The trailing space in the last line checks that the branch is
>>>>>>> unconditional
>>>>>>>        switch i32 %and, label %sw.epilog [
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


More information about the llvm-commits mailing list