[llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.

Haicheng Wu via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 16 14:48:17 PDT 2016


Hi Mikael,

Any luck of fixing this?

I am thinking if SomeVeryCleverCondition = !canFallThrough() is good enouth.  Or more explicitly and safely, SomeVeryCleverCondition = (succ_size() == 1 && !isLayoutSuccessor(TBB))

Best,

Haicheng

-----Original Message-----
From: Mikael Holmén [mailto:mikael.holmen at ericsson.com] 
Sent: Wednesday, June 15, 2016 5:28 AM
To: Haicheng Wu <haicheng at codeaurora.org>; Chandler Carruth <chandlerc at gmail.com>; grosbach at apple.com
Cc: 'Karl-Johan Karlsson' <karl-johan.karlsson at ericsson.com>; 'llvm-commits' <llvm-commits at lists.llvm.org>
Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.

Hi Chandler, Jim, Haicheng,

(Adding Chandler Carruth and Jim Grosbach who seems to have fixed a somewhat similar case in updateTerminator years ago.)

On 06/14/2016 11:24 PM, Haicheng Wu wrote:
>>From your analysis, it seems updateTerminator() misses a case.  But I am surprised that it does not happen in other backends.  It would be ideal if you could reproduce it in other backend that is in tree.

Yes it surely would. Reproducing bugs far into the backend on other targets is quite tricky though and we haven't mananged yet with this one :/

Chandler, Jim, do you have any opinion about this?

The problem we're seeing is:

MachineBasicBlock::updateTerminator fails to fix a case where MachineBlockPlacement has reordered blocks so we have

BB#2: derived from LLVM BB %.split.us.split.us
     Live Ins: %a0_32 %af1 %CCReg %af1
     Predecessors according to CFG: BB#1
         brr_cond <BB#3>, pred:2, pred:%af1, pred:0
     Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)

BB#6: derived from LLVM BB %.split
     Live Ins: %CCReg %af1
     Predecessors according to CFG: BB#0
         brr_cond <BB#13>, pred:3, pred:%af1, pred:0
     Successors according to CFG: BB#7(0x40000000 / 0x80000000 = 50.00%)
BB#13(0x40000000 / 0x80000000 = 50.00%)

and then

BB#3: derived from LLVM BB %bb1.us.us.us
     Live Ins: %a0_32 %af1
     Predecessors according to CFG: BB#2 BB#10
     %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
     brr_cond <BB#5>, pred:6, pred:%af1, pred:0
     Successors according to CFG: BB#5(0x30000000 / 0x80000000 = 37.50%)
BB#4(0x50000000 / 0x80000000 = 62.50%)

Before moving BBs around, BB#2 fallthrough'd to BB#3, and MachineBlockPlacement expects updateTerminator to insert a branch from
BB#2 to BB#3 for the old fallthrough edge, but it doesn't.

Note that there is also a conditional branch from BB#2 to BB#3, complicating the situation since the fallthrough bb and the conditional jump target are the same.

AnalyzeBranch sets TBB to BB#3, FBB to nullptr and Cond to the condition used in the brr_cond in BB#2, so that seems reasonable to me.

In updateTerminator, when searching for FallthroughBB it doesn't find anything since the fallthrough bb is the same as TBB. And then
canFallThrough() returns false, because BB#6 is not in the successor list of BB#2.

Then TBB is not a layout successor and we test
!isLayoutSuccessor(FallthroughBB) which happens to return true when FallthroughBB is nullptr, so we enter the last else if which removes our conditional branch and inserts an identical one.

To fix BB#2 I think we would like to do

   if (!FallthroughBB && SomeVeryCleverCondition) {
     // We did fallthrough to the same basic block as the conditional jump
     // targets but since canFallThrough() said false, our previous
     // fallthrough successor BB has been moved away.
     // Remove the conditional jump, leaving unconditional fallthrough.
     TII->RemoveBranch(*this);

     // Finally update the unconditional successor to be reached via a branch if
     // it would not be reached by fallthrough.
     if (!isLayoutSuccessor(TBB)) {
       ArrayRef<MachineOperand> EmptyCond;
       TII->InsertBranch(*this, TBB, nullptr, EmptyCond, DL);
     }
     return;
   }

but the question is what SomeVeryCleverCondition should look like to catch this and only this case and not mess up something else.

Any thoughts appreciated.

Thanks,
Mikael

>
> Best,
>
> Haicheng
>
> -----Original Message-----
> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
> Sent: Tuesday, June 14, 2016 12:56 PM
> To: Haicheng Wu <haicheng at codeaurora.org>
> Cc: Karl-Johan Karlsson <karl-johan.karlsson at ericsson.com>; 
> llvm-commits <llvm-commits at lists.llvm.org>
> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.
>
> Hi,
>
> Karl-Johan got it right.
>
> TBB is BB#3, FBB is nullptr, and Cond is the condition set on the brr_cond instruction ending BB#2.
>
> And this makes updateTerminator pass everything until
>
>     // The block has a fallthrough conditional branch.
>     if (isLayoutSuccessor(TBB)) {
>       if (TII->ReverseBranchCondition(Cond)) {
>         // We can't reverse the condition, add an unconditional branch.
>         Cond.clear();
>         TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>         return;
>       }
>       TII->RemoveBranch(*this);
>       TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>     } else if (!isLayoutSuccessor(FallthroughBB)) {
>       TII->RemoveBranch(*this);
>       TII->InsertBranch(*this, TBB, FallthroughBB, Cond, DL);
>     }
>
> where we enter the else branch (FallthroughBB is nullptr), remove the conditiopnal branch, but then insert a new (identical) conditional branch.
>
> Regards,
> Mikael
>
> On 06/14/2016 06:08 PM, Karl-Johan Karlsson wrote:
>> Hi!
>>
>> Mikael went home from work before your mail arrived. He will be back 
>> tomorrow. I hope I have recreated the problem as Mikael described it 
>> below. From my debugger session I printed the following information 
>> about TBB, FBB and Cond before updateTerminator() is called on BB#2.
>>
>> (gdb) p PrevBB->dump()
>> $39 = void
>> (gdb)
>> BB#2: derived from LLVM BB %.split.us.split.us
>>       Live Ins: %a0_32 %af1 %CCReg %af1
>>       Predecessors according to CFG: BB#1
>>       brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>       Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>> 100.00%)
>> (gdb) P TBB
>> $40 = (llvm::MachineBasicBlock *) 0x28e02d8
>> (gdb) P TBB->dump()
>> $41 = void
>> (gdb)
>> BB#3: derived from LLVM BB %bb1.us.us.us
>>       Live Ins: %a0_32 %af1
>>       Predecessors according to CFG: BB#2 BB#10
>>       %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, 
>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>       brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>       Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>> 37.50%)
>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>> (gdb) P FBB
>> $42 = (llvm::MachineBasicBlock *) 0x0
>> (gdb)
>> (gdb) P Cond.empty()
>> $44 = false
>> (gdb)
>>
>> If I have got any information wrong Mikael will correct me with a 
>> mail tomorrow.
>>
>> Regards
>> / Karl-Johan Karlsson
>>
>>
>> On 2016-06-14 17:07, Haicheng Wu wrote:
>>> Hi Mikael,
>>>
>>>   From your dumped information, I think the problem is either in
>>> AnalyzeBranch() or updateTerminator().  What is the TBB, FBB, and 
>>> Cond returned by AnalyzeBranch() when it is called on BB#2 before 
>>> calling updateTerminator()?
>>>
>>> Best,
>>>
>>> Haicheng
>>>
>>> -----Original Message-----
>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>> Sent: Tuesday, June 14, 2016 7:54 AM
>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>; Karl-Johan Karlsson 
>>> <karl-johan.karlsson at ericsson.com>
>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail 
>>> merging in MBP.
>>>
>>> Hi,
>>>
>>> We're seeing a problem on our out-of-tree backend with this patch 
>>> applied. I think the problem is either in
>>> MachineBasicBlock::updateTerminator() or how it's used but I'm not 
>>> sure who's responsibility it is to fix the situation.
>>>
>>> I've tried to reproduce it on in-tree backends but I've failed on 
>>> X86 and Arm :/
>>>
>>> Anyway, the fault we see is that after MachineBlockPlacement we get 
>>> two BBs like this:
>>>
>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>        Live Ins: %a0_32 %af1 %CCReg %af1
>>>        Predecessors according to CFG: BB#1
>>>            brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>        Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>> 100.00%)
>>>
>>> BB#6: derived from LLVM BB %.split
>>>        Live Ins: %CCReg %af1
>>>        Predecessors according to CFG: BB#0
>>>            brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>        Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>> 50.00%)
>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>
>>> So, looking at the instructions of BB#2 it looks like it does 
>>> fallthrough to BB#6 (only one conditional branch to BB#3) but the 
>>> CFG says only BB#3 is successor.
>>>
>>> And the verifier complains about this:
>>>
>>> *** Bad machine code: MBB exits via conditional branch/fall-through 
>>> but only has one CFG successor! ***
>>> - function:    f2
>>> - basic block: BB#2 .split.us.split.us (0x492c1f8) LLVM ERROR: Found
>>> 1 machine code errors.
>>>
>>> At some point during MachineBlockPlacement we have
>>>
>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>        Live Ins: %a0_32 %af1 %CCReg %af1
>>>        Predecessors according to CFG: BB#1
>>>      brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>        Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>> 100.00%)
>>>
>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>        Live Ins: %a0_32 %af1
>>>        Predecessors according to CFG: BB#2 BB#10
>>>      %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, 
>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>      brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>        Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>> 37.50%)
>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>
>>> Note that BB#2 both does a conditional jump AND falltrhoughs to BB#3.
>>>
>>> Then it's decided by MachineBlockPlacement::buildChain that BB#6 is 
>>> a better layout successor to BB#2 than BB#3, and we get:
>>>
>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>        Live Ins: %a0_32 %af1 %CCReg %af1
>>>        Predecessors according to CFG: BB#1
>>>      brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>        Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>> 100.00%)
>>>
>>> BB#6: derived from LLVM BB %.split
>>>        Live Ins: %CCReg %af1
>>>        Predecessors according to CFG: BB#0
>>>      brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>        Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>> 50.00%)
>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>
>>> Then in MachineBlockPlacement::buildCFGChains we have
>>>
>>>        if (!TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond)) {
>>>          PrevBB->updateTerminator();
>>>
>>> which is called on BB#2, and from what I understand the point is 
>>> that updateTerminator should insert an unconditional branch from 
>>> BB#2 to
>>> BB#3 now that there is no fallthrough anymore, but it doesn't and 
>>> the code is left broken.
>>>
>>> What's your thought's on this Haicheng, who should fix BB#2 in this case?
>>>
>>> Thanks,
>>> Mikael
>>>
>>>
>>> On 06/06/2016 08:36 PM, Haicheng Wu via llvm-commits wrote:
>>>> Author: haicheng
>>>> Date: Mon Jun  6 13:36:07 2016
>>>> New Revision: 271925
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=271925&view=rev
>>>> Log:
>>>> [MBP] Reduce code size by running tail merging in MBP.
>>>>
>>>> The code layout that TailMerging (inside BranchFolding) works on is 
>>>> not the final layout optimized based on the branch probability.
>>>> Generally, after BlockPlacement, many new merging opportunities emerge.
>>>>
>>>> This patch calls Tail Merging after MBP and calls MBP again if Tail 
>>>> Merging merges anything.
>>>>
>>>> Differential Revision: http://reviews.llvm.org/D20276
>>>>
>>>> Added:
>>>>        llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>> Modified:
>>>>        llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>        llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>        llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>        llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>> l di ng.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>> ===================================================================
>>>> =
>>>> ==
>>>> ========
>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.cpp Mon Jun  6 13:36:07
>>>> +++ 2016
>>>> @@ -27,6 +27,7 @@
>>>>     #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>     #include "llvm/CodeGen/MachineJumpTableInfo.h"
>>>>     #include "llvm/CodeGen/MachineMemOperand.h"
>>>> +#include "llvm/CodeGen/MachineLoopInfo.h"
>>>>     #include "llvm/CodeGen/MachineModuleInfo.h"
>>>>     #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>     #include "llvm/CodeGen/Passes.h"
>>>> @@ -137,6 +138,8 @@ void BranchFolder::RemoveDeadBlock(Machi
>>>>       // Remove the block.
>>>>       MF->erase(MBB);
>>>>       FuncletMembership.erase(MBB);
>>>> +  if (MLI)
>>>> +    MLI->removeBlock(MBB);
>>>>     }
>>>>
>>>>     /// OptimizeImpDefsBlock - If a basic block is just a bunch of 
>>>> implicit_def @@ -193,18 +196,22 @@ bool 
>>>> BranchFolder::OptimizeImpDefsBlock(
>>>>     }
>>>>
>>>>     /// OptimizeFunction - Perhaps branch folding, tail merging and 
>>>> other -/// CFG optimizations on the given function.
>>>> +/// CFG optimizations on the given function.  Block placement 
>>>> +changes the layout /// and may create new tail merging opportunities.
>>>>     bool BranchFolder::OptimizeFunction(MachineFunction &MF,
>>>>                                         const TargetInstrInfo *tii,
>>>>                                         const TargetRegisterInfo *tri,
>>>> -                                    MachineModuleInfo *mmi) {
>>>> +                                    MachineModuleInfo *mmi,
>>>> +                                    MachineLoopInfo *mli, bool
>>>> + AfterPlacement) {
>>>>       if (!tii) return false;
>>>>
>>>>       TriedMerging.clear();
>>>>
>>>> +  AfterBlockPlacement = AfterPlacement;
>>>>       TII = tii;
>>>>       TRI = tri;
>>>>       MMI = mmi;
>>>> +  MLI = mli;
>>>>       RS = nullptr;
>>>>
>>>>       // Use a RegScavenger to help update liveness when required.
>>>> @@ -230,7 +237,10 @@ bool BranchFolder::OptimizeFunction(Mach
>>>>       bool MadeChangeThisIteration = true;
>>>>       while (MadeChangeThisIteration) {
>>>>         MadeChangeThisIteration    = TailMergeBlocks(MF);
>>>> -    MadeChangeThisIteration   |= OptimizeBranches(MF);
>>>> +    // No need to clean up if tail merging does not change 
>>>> + anything
>>>> after the
>>>> +    // block placement.
>>>> +    if (!AfterBlockPlacement || MadeChangeThisIteration)
>>>> +      MadeChangeThisIteration |= OptimizeBranches(MF);
>>>>         if (EnableHoistCommonCode)
>>>>           MadeChangeThisIteration |= HoistCommonCode(MF);
>>>>         MadeChange |= MadeChangeThisIteration; @@ -447,6 +457,11 @@ 
>>>> MachineBasicBlock *BranchFolder::SplitMB
>>>>       // Splice the code over.
>>>>       NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());
>>>>
>>>> +  // NewMBB belongs to the same loop as CurMBB.
>>>> +  if (MLI)
>>>> +    if (MachineLoop *ML = MLI->getLoopFor(&CurMBB))
>>>> +      ML->addBasicBlockToLoop(NewMBB, MLI->getBase());
>>>> +
>>>>       // NewMBB inherits CurMBB's block frequency.
>>>>       MBBFreqInfo.setBlockFreq(NewMBB, 
>>>> MBBFreqInfo.getBlockFreq(&CurMBB));
>>>>
>>>> @@ -934,23 +949,27 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>       if (!EnableTailMerge) return MadeChange;
>>>>
>>>>       // First find blocks with no successors.
>>>> -  MergePotentials.clear();
>>>> -  for (MachineBasicBlock &MBB : MF) {
>>>> +  // Block placement does not create new tail merging 
>>>> + opportunities for these  // blocks.
>>>> +  if (!AfterBlockPlacement) {
>>>> +    MergePotentials.clear();
>>>> +    for (MachineBasicBlock &MBB : MF) {
>>>> +      if (MergePotentials.size() == TailMergeThreshold)
>>>> +        break;
>>>> +      if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>> +
>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>> &MBB));
>>>> +    }
>>>> +
>>>> +    // If this is a large problem, avoid visiting the same basic blocks
>>>> +    // multiple times.
>>>>         if (MergePotentials.size() == TailMergeThreshold)
>>>> -      break;
>>>> -    if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>> -
>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>> &MBB));
>>>> -  }
>>>> +      for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>> +        TriedMerging.insert(MergePotentials[i].getBlock());
>>>>
>>>> -  // If this is a large problem, avoid visiting the same basic 
>>>> blocks
>>>> -  // multiple times.
>>>> -  if (MergePotentials.size() == TailMergeThreshold)
>>>> -    for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>> -      TriedMerging.insert(MergePotentials[i].getBlock());
>>>> -
>>>> -  // See if we can do any tail merging on those.
>>>> -  if (MergePotentials.size() >= 2)
>>>> -    MadeChange |= TryTailMergeBlocks(nullptr, nullptr);
>>>> +    // See if we can do any tail merging on those.
>>>> +    if (MergePotentials.size() >= 2)
>>>> +      MadeChange |= TryTailMergeBlocks(nullptr, nullptr);  }
>>>>
>>>>       // Look at blocks (IBB) with multiple predecessors (PBB).
>>>>       // We change each predecessor to a canonical form, by @@ 
>>>> -997,6
>>>> +1016,17 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>           if (PBB->hasEHPadSuccessor())
>>>>             continue;
>>>>
>>>> +      // Bail out if the loop header (IBB) is not the top of the
>>>> loop chain
>>>> +      // after the block placement.  Otherwise, the common tail of
>>>> IBB's
>>>> +      // predecessors may become the loop top if block placement 
>>>> + is
>>>> called again
>>>> +      // and the predecessors may branch to this common tail.
>>>> +      // FIXME: Relaxed this check if the algorithm of finding 
>>>> + loop
>>>> top is
>>>> +      // changed in MBP.
>>>> +      if (AfterBlockPlacement && MLI)
>>>> +        if (MachineLoop *ML = MLI->getLoopFor(IBB))
>>>> +          if (IBB == ML->getHeader() && ML == MLI->getLoopFor(PBB))
>>>> +            continue;
>>>> +
>>>>           MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
>>>>           SmallVector<MachineOperand, 4> Cond;
>>>>           if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.h
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>> l di ng.h?rev=271925&r1=271924&r2=271925&view=diff
>>>> ===================================================================
>>>> =
>>>> ==
>>>> ========
>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.h (original)
>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.h Mon Jun  6 13:36:07 2016
>>>> @@ -20,6 +20,7 @@ namespace llvm {
>>>>       class MachineBranchProbabilityInfo;
>>>>       class MachineFunction;
>>>>       class MachineModuleInfo;
>>>> +  class MachineLoopInfo;
>>>>       class RegScavenger;
>>>>       class TargetInstrInfo;
>>>>       class TargetRegisterInfo;
>>>> @@ -32,10 +33,11 @@ namespace llvm {
>>>>                               MBFIWrapper &MBFI,
>>>>                               const MachineBranchProbabilityInfo 
>>>> &MBPI);
>>>>
>>>> -    bool OptimizeFunction(MachineFunction &MF,
>>>> -                          const TargetInstrInfo *tii,
>>>> -                          const TargetRegisterInfo *tri,
>>>> -                          MachineModuleInfo *mmi);
>>>> +    bool OptimizeFunction(MachineFunction &MF, const 
>>>> + TargetInstrInfo
>>>> *tii,
>>>> +                          const TargetRegisterInfo *tri,
>>>> MachineModuleInfo *mmi,
>>>> +                          MachineLoopInfo *mli = nullptr,
>>>> +                          bool AfterPlacement = false);
>>>> +
>>>>       private:
>>>>         class MergePotentialsElt {
>>>>           unsigned Hash;
>>>> @@ -93,11 +95,13 @@ namespace llvm {
>>>>         };
>>>>         std::vector<SameTailElt> SameTails;
>>>>
>>>> +    bool AfterBlockPlacement;
>>>>         bool EnableTailMerge;
>>>>         bool EnableHoistCommonCode;
>>>>         const TargetInstrInfo *TII;
>>>>         const TargetRegisterInfo *TRI;
>>>>         MachineModuleInfo *MMI;
>>>> +    MachineLoopInfo *MLI;
>>>>         RegScavenger *RS;
>>>>
>>>>       public:
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineB
>>>> l oc kPlacement.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>> ===================================================================
>>>> =
>>>> ==
>>>> ========
>>>> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Mon Jun  6
>>>> +++ 13:36:07 2016
>>>> @@ -26,6 +26,8 @@
>>>>
>>>> //===--------------------------------------------------------------
>>>> -
>>>> --
>>>> -----===//
>>>>
>>>>     #include "llvm/CodeGen/Passes.h"
>>>> +#include "llvm/CodeGen/TargetPassConfig.h"
>>>> +#include "BranchFolding.h"
>>>>     #include "llvm/ADT/DenseMap.h"
>>>>     #include "llvm/ADT/SmallPtrSet.h"
>>>>     #include "llvm/ADT/SmallVector.h"
>>>> @@ -116,6 +118,12 @@ static cl::opt<unsigned> JumpInstCost("j
>>>>                                           cl::desc("Cost of jump 
>>>> instructions."),
>>>>                                           cl::init(1), cl::Hidden);
>>>>
>>>> +static cl::opt<bool>
>>>> +BranchFoldPlacement("branch-fold-placement",
>>>> +              cl::desc("Perform branch folding during placement. "
>>>> +                       "Reduces code size."),
>>>> +              cl::init(true), cl::Hidden);
>>>> +
>>>>     extern cl::opt<unsigned> StaticLikelyProb;
>>>>
>>>>     namespace {
>>>> @@ -232,10 +240,10 @@ class MachineBlockPlacement : public Mac
>>>>       const MachineBranchProbabilityInfo *MBPI;
>>>>
>>>>       /// \brief A handle to the function-wide block frequency pass.
>>>> -  const MachineBlockFrequencyInfo *MBFI;
>>>> +  std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;
>>>>
>>>>       /// \brief A handle to the loop info.
>>>> -  const MachineLoopInfo *MLI;
>>>> +  MachineLoopInfo *MLI;
>>>>
>>>>       /// \brief A handle to the target's instruction info.
>>>>       const TargetInstrInfo *TII;
>>>> @@ -323,6 +331,7 @@ public:
>>>>         AU.addRequired<MachineBlockFrequencyInfo>();
>>>>         AU.addRequired<MachineDominatorTree>();
>>>>         AU.addRequired<MachineLoopInfo>();
>>>> +    AU.addRequired<TargetPassConfig>();
>>>>         MachineFunctionPass::getAnalysisUsage(AU);
>>>>       }
>>>>     };
>>>> @@ -1462,7 +1471,8 @@ bool MachineBlockPlacement::runOnMachine
>>>>         return false;
>>>>
>>>>       MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
>>>> -  MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
>>>> +  MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
>>>> +      getAnalysis<MachineBlockFrequencyInfo>());
>>>>       MLI = &getAnalysis<MachineLoopInfo>();
>>>>       TII = F.getSubtarget().getInstrInfo();
>>>>       TLI = F.getSubtarget().getTargetLowering();
>>>> @@ -1470,6 +1480,29 @@ bool MachineBlockPlacement::runOnMachine
>>>>       assert(BlockToChain.empty());
>>>>
>>>>       buildCFGChains(F);
>>>> +
>>>> +  // Changing the layout can create new tail merging opportunities.
>>>> +  TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>(); 
>>>> + // TailMerge can create jump into if branches that make CFG 
>>>> + irreducible for  // HW that requires structurized CFG.
>>>> +  bool EnableTailMerge = !F.getTarget().requiresStructuredCFG() &&
>>>> +                         PassConfig->getEnableTailMerge() &&
>>>> +                         BranchFoldPlacement;  // No tail merging 
>>>> + opportunities if the block number is less than four.
>>>> +  if (F.size() > 3 && EnableTailMerge) {
>>>> +    BranchFolder BF(/*EnableTailMerge=*/true, 
>>>> + /*CommonHoist=*/false,
>>>> *MBFI,
>>>> +                    *MBPI);
>>>> +
>>>> +    if (BF.OptimizeFunction(F, TII, 
>>>> + F.getSubtarget().getRegisterInfo(),
>>>> +
>>>> getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
>>>> +                            /*AfterBlockPlacement=*/true)) {
>>>> +      // Redo the layout if tail merging creates/removes/moves blocks.
>>>> +      BlockToChain.clear();
>>>> +      ChainAllocator.DestroyAll();
>>>> +      buildCFGChains(F);
>>>> +    }
>>>> +  }
>>>> +
>>>>       optimizeBranches(F);
>>>>       alignBlocks(F);
>>>>
>>>>
>>>> Added: llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64
>>>> / ta ilmerging_in_mbp.ll?rev=271925&view=auto
>>>> ===================================================================
>>>> =
>>>> ==
>>>> ========
>>>> --- llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll (added)
>>>> +++ llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll Mon Jun  
>>>> +++ 6
>>>> +++ 13:36:07 2016
>>>> @@ -0,0 +1,63 @@
>>>> +; RUN: llc <%s -march=aarch64 | FileCheck %s
>>>> +
>>>> +; CHECK-LABEL: test:
>>>> +; CHECK:       .LBB0_7
>>>> +; CHECK:         b.hi    .LBB0_2
>>>> +; CHECK-NEXT:    b    .LBB0_9
>>>> +; CHECK-NEXT:  .LBB0_8
>>>> +; CHECK-NEXT:    mov     x8, x9
>>>> +; CHECK-NEXT:  .LBB0_9
>>>> +define i64 @test(i64 %n, i64* %a, i64* %b, i64* %c, i64* %d, i64* 
>>>> +%e,
>>>> +i64* %f) {
>>>> +entry:
>>>> +  %cmp28 = icmp sgt i64 %n, 1
>>>> +  br i1 %cmp28, label %for.body, label %for.end
>>>> +
>>>> +for.body:                                         ; preds =
>>>> %for.body.lr.ph, %if.end
>>>> +  %j = phi i64 [ %n, %entry ], [ %div, %if.end ]  %div = lshr i64 
>>>> + %j, 1  %a.arrayidx = getelementptr inbounds i64, i64* %a, i64 
>>>> + %div %a.j = load i64, i64* %a.arrayidx  %b.arrayidx = 
>>>> + getelementptr inbounds i64, i64* %b, i64 %div  %b.j = load i64, 
>>>> + i64* %b.arrayidx %cmp.i = icmp slt i64 %a.j, %b.j  br i1 %cmp.i, 
>>>> + label %for.end.loopexit, label %cond.false.i
>>>> +
>>>> +cond.false.i:                                     ; preds = %for.body
>>>> +  %cmp4.i = icmp sgt i64 %a.j, %b.j
>>>> +  br i1 %cmp4.i, label %if.end, label %cond.false6.i
>>>> +
>>>> +cond.false6.i:                                    ; preds =
>>>> %cond.false.i
>>>> +  %c.arrayidx = getelementptr inbounds i64, i64* %c, i64 %div  
>>>> + %c.j = load i64, i64* %c.arrayidx  %d.arrayidx = getelementptr 
>>>> + inbounds i64, i64* %d, i64 %div  %d.j = load i64, i64* 
>>>> + %d.arrayidx  %cmp9.i = icmp slt i64 %c.j, %d.j  br i1 %cmp9.i, 
>>>> + label %for.end.loopexit, label %cond.false11.i
>>>> +
>>>> +cond.false11.i:                                   ; preds =
>>>> %cond.false6.i
>>>> +  %cmp14.i = icmp sgt i64 %c.j, %d.j  br i1 %cmp14.i, label 
>>>> + %if.end, label %cond.false12.i
>>>> +
>>>> +cond.false12.i:                           ; preds = %cond.false11.i
>>>> +  %e.arrayidx = getelementptr inbounds i64, i64* %e, i64 %div
>>>> +  %e.j = load i64, i64* %e.arrayidx
>>>> +  %f.arrayidx = getelementptr inbounds i64, i64* %f, i64 %div
>>>> +  %f.j = load i64, i64* %f.arrayidx
>>>> +  %cmp19.i = icmp sgt i64 %e.j, %f.j
>>>> +  br i1 %cmp19.i, label %if.end, label %for.end.loopexit
>>>> +
>>>> +if.end:                                           ; preds =
>>>> %cond.false12.i, %cond.false11.i, %cond.false.i
>>>> +  %cmp = icmp ugt i64 %j, 3
>>>> +  br i1 %cmp, label %for.body, label %for.end.loopexit
>>>> +
>>>> +for.end.loopexit:                                 ; preds =
>>>> %cond.false12.i, %cond.false6.i, %for.body, %if.end
>>>> +  %j.0.lcssa.ph = phi i64 [ %j, %cond.false12.i ], [ %j, 
>>>> +%cond.false6.i ], [ %j, %for.body ], [ %div, %if.end ]
>>>> +  br label %for.end
>>>> +
>>>> +for.end:                                          ; preds =
>>>> %for.end.loopexit, %entry
>>>> +  %j.0.lcssa = phi i64 [ %n, %entry ], [ %j.0.lcssa.ph, 
>>>> +%for.end.loopexit ]
>>>> +  %j.2 = add i64 %j.0.lcssa, %n
>>>> +  %j.3 = mul i64 %j.2, %n
>>>> +  %j.4 = add i64 %j.3, 10
>>>> +  ret i64 %j.4
>>>> +}
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/arm
>>>> - an d-tst-peephole.ll?rev=271925&r1=271924&r2=271925&view=diff
>>>> ===================================================================
>>>> =
>>>> ==
>>>> ========
>>>> --- llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll Mon Jun  6
>>>> +++ 13:36:07 2016
>>>> @@ -49,7 +49,7 @@ tailrecurse.switch:
>>>>     ; V8-NEXT: beq
>>>>     ; V8-NEXT: %tailrecurse.switch
>>>>     ; V8: cmp
>>>> -; V8-NEXT: bne
>>>> +; V8-NEXT: beq
>>>>     ; V8-NEXT: b
>>>>     ; The trailing space in the last line checks that the branch is 
>>>> unconditional
>>>>       switch i32 %and, label %sw.epilog [
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>
>
>
>




More information about the llvm-commits mailing list