[llvm] r271925 - [MBP] Reduce code size by running tail merging in MBP.
Ahmed Bougacha via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 20 13:54:16 PDT 2016
On Thu, Jun 16, 2016 at 10:13 PM, Mikael Holmén
<llvm-commits at lists.llvm.org> wrote:
> Hi,
>
> On 06/16/2016 11:48 PM, Haicheng Wu wrote:
>>
>> Hi Mikael,
>>
>> Any luck of fixing this?
>>
>> I am thinking if SomeVeryCleverCondition = !canFallThrough() is good
>> enouth. Or more explicitly and safely, SomeVeryCleverCondition =
>> (succ_size() == 1 && !isLayoutSuccessor(TBB))
>
>
> We're thinking along those lines as well. Currently we do
>
> if (!FallthroughBB && (succ_size() == 1)) {
>
> and I haven't seen any problems with it so far but I'm not sure at all if
> it's safe and solid.
I agree; that probably works, but relying on updateTerminator seems
pretty fragile. I'm not very familiar with MBP, but a couple ideas:
- Isn't canFallthrough() an approximation of AnalyzeBranch() returning
(!Cond.empty() && TBB && !FBB) ?
If so, can that be used directly in updateTerminator()? Seems like
that fixes this problem.
- Alternatively, what do you think of:
- record AnalyzeBranch() (and canFallthrough()?) results before
splice()ing the block
- add a variant of updateTerminator that also takes the results ("make it so")
That would fix the MBP FIXME regarding updateTerminator/AnalyzeBranch.
-Ahmed
> Regards,
>
> Mikael
>
>>
>> Best,
>>
>> Haicheng
>>
>> -----Original Message-----
>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>> Sent: Wednesday, June 15, 2016 5:28 AM
>> To: Haicheng Wu <haicheng at codeaurora.org>; Chandler Carruth
>> <chandlerc at gmail.com>; grosbach at apple.com
>> Cc: 'Karl-Johan Karlsson' <karl-johan.karlsson at ericsson.com>;
>> 'llvm-commits' <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>> merging in MBP.
>>
>> Hi Chandler, Jim, Haicheng,
>>
>> (Adding Chandler Carruth and Jim Grosbach who seems to have fixed a
>> somewhat similar case in updateTerminator years ago.)
>>
>> On 06/14/2016 11:24 PM, Haicheng Wu wrote:
>>>
>>> >From your analysis, it seems updateTerminator() misses a case. But I am
>>> surprised that it does not happen in other backends. It would be ideal if
>>> you could reproduce it in other backend that is in tree.
>>
>>
>> Yes it surely would. Reproducing bugs far into the backend on other
>> targets is quite tricky though and we haven't mananged yet with this one :/
>>
>> Chandler, Jim, do you have any opinion about this?
>>
>> The problem we're seeing is:
>>
>> MachineBasicBlock::updateTerminator fails to fix a case where
>> MachineBlockPlacement has reordered blocks so we have
>>
>> BB#2: derived from LLVM BB %.split.us.split.us
>> Live Ins: %a0_32 %af1 %CCReg %af1
>> Predecessors according to CFG: BB#1
>> brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>> Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)
>>
>> BB#6: derived from LLVM BB %.split
>> Live Ins: %CCReg %af1
>> Predecessors according to CFG: BB#0
>> brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>> Successors according to CFG: BB#7(0x40000000 / 0x80000000 = 50.00%)
>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>
>> and then
>>
>> BB#3: derived from LLVM BB %bb1.us.us.us
>> Live Ins: %a0_32 %af1
>> Predecessors according to CFG: BB#2 BB#10
>> %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg, pred:0,
>> %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>> brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>> Successors according to CFG: BB#5(0x30000000 / 0x80000000 = 37.50%)
>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>
>> Before moving BBs around, BB#2 fallthrough'd to BB#3, and
>> MachineBlockPlacement expects updateTerminator to insert a branch from
>> BB#2 to BB#3 for the old fallthrough edge, but it doesn't.
>>
>> Note that there is also a conditional branch from BB#2 to BB#3,
>> complicating the situation since the fallthrough bb and the conditional jump
>> target are the same.
>>
>> AnalyzeBranch sets TBB to BB#3, FBB to nullptr and Cond to the condition
>> used in the brr_cond in BB#2, so that seems reasonable to me.
>>
>> In updateTerminator, when searching for FallthroughBB it doesn't find
>> anything since the fallthrough bb is the same as TBB. And then
>> canFallThrough() returns false, because BB#6 is not in the successor list
>> of BB#2.
>>
>> Then TBB is not a layout successor and we test
>> !isLayoutSuccessor(FallthroughBB) which happens to return true when
>> FallthroughBB is nullptr, so we enter the last else if which removes our
>> conditional branch and inserts an identical one.
>>
>> To fix BB#2 I think we would like to do
>>
>> if (!FallthroughBB && SomeVeryCleverCondition) {
>> // We did fallthrough to the same basic block as the conditional
>> jump
>> // targets but since canFallThrough() said false, our previous
>> // fallthrough successor BB has been moved away.
>> // Remove the conditional jump, leaving unconditional fallthrough.
>> TII->RemoveBranch(*this);
>>
>> // Finally update the unconditional successor to be reached via a
>> branch if
>> // it would not be reached by fallthrough.
>> if (!isLayoutSuccessor(TBB)) {
>> ArrayRef<MachineOperand> EmptyCond;
>> TII->InsertBranch(*this, TBB, nullptr, EmptyCond, DL);
>> }
>> return;
>> }
>>
>> but the question is what SomeVeryCleverCondition should look like to catch
>> this and only this case and not mess up something else.
>>
>> Any thoughts appreciated.
>>
>> Thanks,
>> Mikael
>>
>>>
>>> Best,
>>>
>>> Haicheng
>>>
>>> -----Original Message-----
>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>> Sent: Tuesday, June 14, 2016 12:56 PM
>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>> Cc: Karl-Johan Karlsson <karl-johan.karlsson at ericsson.com>;
>>> llvm-commits <llvm-commits at lists.llvm.org>
>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>> merging in MBP.
>>>
>>> Hi,
>>>
>>> Karl-Johan got it right.
>>>
>>> TBB is BB#3, FBB is nullptr, and Cond is the condition set on the
>>> brr_cond instruction ending BB#2.
>>>
>>> And this makes updateTerminator pass everything until
>>>
>>> // The block has a fallthrough conditional branch.
>>> if (isLayoutSuccessor(TBB)) {
>>> if (TII->ReverseBranchCondition(Cond)) {
>>> // We can't reverse the condition, add an unconditional branch.
>>> Cond.clear();
>>> TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>> return;
>>> }
>>> TII->RemoveBranch(*this);
>>> TII->InsertBranch(*this, FallthroughBB, nullptr, Cond, DL);
>>> } else if (!isLayoutSuccessor(FallthroughBB)) {
>>> TII->RemoveBranch(*this);
>>> TII->InsertBranch(*this, TBB, FallthroughBB, Cond, DL);
>>> }
>>>
>>> where we enter the else branch (FallthroughBB is nullptr), remove the
>>> conditiopnal branch, but then insert a new (identical) conditional branch.
>>>
>>> Regards,
>>> Mikael
>>>
>>> On 06/14/2016 06:08 PM, Karl-Johan Karlsson wrote:
>>>>
>>>> Hi!
>>>>
>>>> Mikael went home from work before your mail arrived. He will be back
>>>> tomorrow. I hope I have recreated the problem as Mikael described it
>>>> below. From my debugger session I printed the following information
>>>> about TBB, FBB and Cond before updateTerminator() is called on BB#2.
>>>>
>>>> (gdb) p PrevBB->dump()
>>>> $39 = void
>>>> (gdb)
>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>> Live Ins: %a0_32 %af1 %CCReg %af1
>>>> Predecessors according to CFG: BB#1
>>>> brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>> Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>> 100.00%)
>>>> (gdb) P TBB
>>>> $40 = (llvm::MachineBasicBlock *) 0x28e02d8
>>>> (gdb) P TBB->dump()
>>>> $41 = void
>>>> (gdb)
>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>> Live Ins: %a0_32 %af1
>>>> Predecessors according to CFG: BB#2 BB#10
>>>> %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>> brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>> Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>> 37.50%)
>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>> (gdb) P FBB
>>>> $42 = (llvm::MachineBasicBlock *) 0x0
>>>> (gdb)
>>>> (gdb) P Cond.empty()
>>>> $44 = false
>>>> (gdb)
>>>>
>>>> If I have got any information wrong Mikael will correct me with a
>>>> mail tomorrow.
>>>>
>>>> Regards
>>>> / Karl-Johan Karlsson
>>>>
>>>>
>>>> On 2016-06-14 17:07, Haicheng Wu wrote:
>>>>>
>>>>> Hi Mikael,
>>>>>
>>>>> From your dumped information, I think the problem is either in
>>>>> AnalyzeBranch() or updateTerminator(). What is the TBB, FBB, and
>>>>> Cond returned by AnalyzeBranch() when it is called on BB#2 before
>>>>> calling updateTerminator()?
>>>>>
>>>>> Best,
>>>>>
>>>>> Haicheng
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mikael Holmén [mailto:mikael.holmen at ericsson.com]
>>>>> Sent: Tuesday, June 14, 2016 7:54 AM
>>>>> To: Haicheng Wu <haicheng at codeaurora.org>
>>>>> Cc: llvm-commits <llvm-commits at lists.llvm.org>; Karl-Johan Karlsson
>>>>> <karl-johan.karlsson at ericsson.com>
>>>>> Subject: Re: [llvm] r271925 - [MBP] Reduce code size by running tail
>>>>> merging in MBP.
>>>>>
>>>>> Hi,
>>>>>
>>>>> We're seeing a problem on our out-of-tree backend with this patch
>>>>> applied. I think the problem is either in
>>>>> MachineBasicBlock::updateTerminator() or how it's used but I'm not
>>>>> sure who's responsibility it is to fix the situation.
>>>>>
>>>>> I've tried to reproduce it on in-tree backends but I've failed on
>>>>> X86 and Arm :/
>>>>>
>>>>> Anyway, the fault we see is that after MachineBlockPlacement we get
>>>>> two BBs like this:
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>> Live Ins: %a0_32 %af1 %CCReg %af1
>>>>> Predecessors according to CFG: BB#1
>>>>> brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#6: derived from LLVM BB %.split
>>>>> Live Ins: %CCReg %af1
>>>>> Predecessors according to CFG: BB#0
>>>>> brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>> 50.00%)
>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>
>>>>> So, looking at the instructions of BB#2 it looks like it does
>>>>> fallthrough to BB#6 (only one conditional branch to BB#3) but the
>>>>> CFG says only BB#3 is successor.
>>>>>
>>>>> And the verifier complains about this:
>>>>>
>>>>> *** Bad machine code: MBB exits via conditional branch/fall-through
>>>>> but only has one CFG successor! ***
>>>>> - function: f2
>>>>> - basic block: BB#2 .split.us.split.us (0x492c1f8) LLVM ERROR: Found
>>>>> 1 machine code errors.
>>>>>
>>>>> At some point during MachineBlockPlacement we have
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>> Live Ins: %a0_32 %af1 %CCReg %af1
>>>>> Predecessors according to CFG: BB#1
>>>>> brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#3: derived from LLVM BB %bb1.us.us.us
>>>>> Live Ins: %a0_32 %af1
>>>>> Predecessors according to CFG: BB#2 BB#10
>>>>> %a1_40<def,dead> = negh_a16_a32 %a0l, pred:0, pred:%noreg,
>>>>> pred:0, %CCReg<imp-def>, %cuc<imp-use>, %af1<imp-def>
>>>>> brr_cond <BB#5>, pred:6, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#5(0x30000000 / 0x80000000 =
>>>>> 37.50%)
>>>>> BB#4(0x50000000 / 0x80000000 = 62.50%)
>>>>>
>>>>> Note that BB#2 both does a conditional jump AND falltrhoughs to BB#3.
>>>>>
>>>>> Then it's decided by MachineBlockPlacement::buildChain that BB#6 is
>>>>> a better layout successor to BB#2 than BB#3, and we get:
>>>>>
>>>>> BB#2: derived from LLVM BB %.split.us.split.us
>>>>> Live Ins: %a0_32 %af1 %CCReg %af1
>>>>> Predecessors according to CFG: BB#1
>>>>> brr_cond <BB#3>, pred:2, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#3(0x80000000 / 0x80000000 =
>>>>> 100.00%)
>>>>>
>>>>> BB#6: derived from LLVM BB %.split
>>>>> Live Ins: %CCReg %af1
>>>>> Predecessors according to CFG: BB#0
>>>>> brr_cond <BB#13>, pred:3, pred:%af1, pred:0
>>>>> Successors according to CFG: BB#7(0x40000000 / 0x80000000 =
>>>>> 50.00%)
>>>>> BB#13(0x40000000 / 0x80000000 = 50.00%)
>>>>>
>>>>> Then in MachineBlockPlacement::buildCFGChains we have
>>>>>
>>>>> if (!TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond)) {
>>>>> PrevBB->updateTerminator();
>>>>>
>>>>> which is called on BB#2, and from what I understand the point is
>>>>> that updateTerminator should insert an unconditional branch from
>>>>> BB#2 to
>>>>> BB#3 now that there is no fallthrough anymore, but it doesn't and
>>>>> the code is left broken.
>>>>>
>>>>> What's your thought's on this Haicheng, who should fix BB#2 in this
>>>>> case?
>>>>>
>>>>> Thanks,
>>>>> Mikael
>>>>>
>>>>>
>>>>> On 06/06/2016 08:36 PM, Haicheng Wu via llvm-commits wrote:
>>>>>>
>>>>>> Author: haicheng
>>>>>> Date: Mon Jun 6 13:36:07 2016
>>>>>> New Revision: 271925
>>>>>>
>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=271925&view=rev
>>>>>> Log:
>>>>>> [MBP] Reduce code size by running tail merging in MBP.
>>>>>>
>>>>>> The code layout that TailMerging (inside BranchFolding) works on is
>>>>>> not the final layout optimized based on the branch probability.
>>>>>> Generally, after BlockPlacement, many new merging opportunities
>>>>>> emerge.
>>>>>>
>>>>>> This patch calls Tail Merging after MBP and calls MBP again if Tail
>>>>>> Merging merges anything.
>>>>>>
>>>>>> Differential Revision: http://reviews.llvm.org/D20276
>>>>>>
>>>>>> Added:
>>>>>> llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>> Modified:
>>>>>> llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>> llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>> llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>> llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.cpp
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>> l di ng.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.cpp (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.cpp Mon Jun 6 13:36:07
>>>>>> +++ 2016
>>>>>> @@ -27,6 +27,7 @@
>>>>>> #include "llvm/CodeGen/MachineFunctionPass.h"
>>>>>> #include "llvm/CodeGen/MachineJumpTableInfo.h"
>>>>>> #include "llvm/CodeGen/MachineMemOperand.h"
>>>>>> +#include "llvm/CodeGen/MachineLoopInfo.h"
>>>>>> #include "llvm/CodeGen/MachineModuleInfo.h"
>>>>>> #include "llvm/CodeGen/MachineRegisterInfo.h"
>>>>>> #include "llvm/CodeGen/Passes.h"
>>>>>> @@ -137,6 +138,8 @@ void BranchFolder::RemoveDeadBlock(Machi
>>>>>> // Remove the block.
>>>>>> MF->erase(MBB);
>>>>>> FuncletMembership.erase(MBB);
>>>>>> + if (MLI)
>>>>>> + MLI->removeBlock(MBB);
>>>>>> }
>>>>>>
>>>>>> /// OptimizeImpDefsBlock - If a basic block is just a bunch of
>>>>>> implicit_def @@ -193,18 +196,22 @@ bool
>>>>>> BranchFolder::OptimizeImpDefsBlock(
>>>>>> }
>>>>>>
>>>>>> /// OptimizeFunction - Perhaps branch folding, tail merging and
>>>>>> other -/// CFG optimizations on the given function.
>>>>>> +/// CFG optimizations on the given function. Block placement
>>>>>> +changes the layout /// and may create new tail merging opportunities.
>>>>>> bool BranchFolder::OptimizeFunction(MachineFunction &MF,
>>>>>> const TargetInstrInfo *tii,
>>>>>> const TargetRegisterInfo
>>>>>> *tri,
>>>>>> - MachineModuleInfo *mmi) {
>>>>>> + MachineModuleInfo *mmi,
>>>>>> + MachineLoopInfo *mli, bool
>>>>>> + AfterPlacement) {
>>>>>> if (!tii) return false;
>>>>>>
>>>>>> TriedMerging.clear();
>>>>>>
>>>>>> + AfterBlockPlacement = AfterPlacement;
>>>>>> TII = tii;
>>>>>> TRI = tri;
>>>>>> MMI = mmi;
>>>>>> + MLI = mli;
>>>>>> RS = nullptr;
>>>>>>
>>>>>> // Use a RegScavenger to help update liveness when required.
>>>>>> @@ -230,7 +237,10 @@ bool BranchFolder::OptimizeFunction(Mach
>>>>>> bool MadeChangeThisIteration = true;
>>>>>> while (MadeChangeThisIteration) {
>>>>>> MadeChangeThisIteration = TailMergeBlocks(MF);
>>>>>> - MadeChangeThisIteration |= OptimizeBranches(MF);
>>>>>> + // No need to clean up if tail merging does not change
>>>>>> + anything
>>>>>> after the
>>>>>> + // block placement.
>>>>>> + if (!AfterBlockPlacement || MadeChangeThisIteration)
>>>>>> + MadeChangeThisIteration |= OptimizeBranches(MF);
>>>>>> if (EnableHoistCommonCode)
>>>>>> MadeChangeThisIteration |= HoistCommonCode(MF);
>>>>>> MadeChange |= MadeChangeThisIteration; @@ -447,6 +457,11 @@
>>>>>> MachineBasicBlock *BranchFolder::SplitMB
>>>>>> // Splice the code over.
>>>>>> NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());
>>>>>>
>>>>>> + // NewMBB belongs to the same loop as CurMBB.
>>>>>> + if (MLI)
>>>>>> + if (MachineLoop *ML = MLI->getLoopFor(&CurMBB))
>>>>>> + ML->addBasicBlockToLoop(NewMBB, MLI->getBase());
>>>>>> +
>>>>>> // NewMBB inherits CurMBB's block frequency.
>>>>>> MBBFreqInfo.setBlockFreq(NewMBB,
>>>>>> MBBFreqInfo.getBlockFreq(&CurMBB));
>>>>>>
>>>>>> @@ -934,23 +949,27 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>> if (!EnableTailMerge) return MadeChange;
>>>>>>
>>>>>> // First find blocks with no successors.
>>>>>> - MergePotentials.clear();
>>>>>> - for (MachineBasicBlock &MBB : MF) {
>>>>>> + // Block placement does not create new tail merging
>>>>>> + opportunities for these // blocks.
>>>>>> + if (!AfterBlockPlacement) {
>>>>>> + MergePotentials.clear();
>>>>>> + for (MachineBasicBlock &MBB : MF) {
>>>>>> + if (MergePotentials.size() == TailMergeThreshold)
>>>>>> + break;
>>>>>> + if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>> +
>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>> &MBB));
>>>>>> + }
>>>>>> +
>>>>>> + // If this is a large problem, avoid visiting the same basic
>>>>>> blocks
>>>>>> + // multiple times.
>>>>>> if (MergePotentials.size() == TailMergeThreshold)
>>>>>> - break;
>>>>>> - if (!TriedMerging.count(&MBB) && MBB.succ_empty())
>>>>>> -
>>>>>> MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB),
>>>>>> &MBB));
>>>>>> - }
>>>>>> + for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>> + TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>>
>>>>>> - // If this is a large problem, avoid visiting the same basic
>>>>>> blocks
>>>>>> - // multiple times.
>>>>>> - if (MergePotentials.size() == TailMergeThreshold)
>>>>>> - for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
>>>>>> - TriedMerging.insert(MergePotentials[i].getBlock());
>>>>>> -
>>>>>> - // See if we can do any tail merging on those.
>>>>>> - if (MergePotentials.size() >= 2)
>>>>>> - MadeChange |= TryTailMergeBlocks(nullptr, nullptr);
>>>>>> + // See if we can do any tail merging on those.
>>>>>> + if (MergePotentials.size() >= 2)
>>>>>> + MadeChange |= TryTailMergeBlocks(nullptr, nullptr); }
>>>>>>
>>>>>> // Look at blocks (IBB) with multiple predecessors (PBB).
>>>>>> // We change each predecessor to a canonical form, by @@
>>>>>> -997,6
>>>>>> +1016,17 @@ bool BranchFolder::TailMergeBlocks(Machi
>>>>>> if (PBB->hasEHPadSuccessor())
>>>>>> continue;
>>>>>>
>>>>>> + // Bail out if the loop header (IBB) is not the top of the
>>>>>> loop chain
>>>>>> + // after the block placement. Otherwise, the common tail of
>>>>>> IBB's
>>>>>> + // predecessors may become the loop top if block placement
>>>>>> + is
>>>>>> called again
>>>>>> + // and the predecessors may branch to this common tail.
>>>>>> + // FIXME: Relaxed this check if the algorithm of finding
>>>>>> + loop
>>>>>> top is
>>>>>> + // changed in MBP.
>>>>>> + if (AfterBlockPlacement && MLI)
>>>>>> + if (MachineLoop *ML = MLI->getLoopFor(IBB))
>>>>>> + if (IBB == ML->getHeader() && ML == MLI->getLoopFor(PBB))
>>>>>> + continue;
>>>>>> +
>>>>>> MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
>>>>>> SmallVector<MachineOperand, 4> Cond;
>>>>>> if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/BranchFolding.h
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/BranchFo
>>>>>> l di ng.h?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/BranchFolding.h (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/BranchFolding.h Mon Jun 6 13:36:07 2016
>>>>>> @@ -20,6 +20,7 @@ namespace llvm {
>>>>>> class MachineBranchProbabilityInfo;
>>>>>> class MachineFunction;
>>>>>> class MachineModuleInfo;
>>>>>> + class MachineLoopInfo;
>>>>>> class RegScavenger;
>>>>>> class TargetInstrInfo;
>>>>>> class TargetRegisterInfo;
>>>>>> @@ -32,10 +33,11 @@ namespace llvm {
>>>>>> MBFIWrapper &MBFI,
>>>>>> const MachineBranchProbabilityInfo
>>>>>> &MBPI);
>>>>>>
>>>>>> - bool OptimizeFunction(MachineFunction &MF,
>>>>>> - const TargetInstrInfo *tii,
>>>>>> - const TargetRegisterInfo *tri,
>>>>>> - MachineModuleInfo *mmi);
>>>>>> + bool OptimizeFunction(MachineFunction &MF, const
>>>>>> + TargetInstrInfo
>>>>>> *tii,
>>>>>> + const TargetRegisterInfo *tri,
>>>>>> MachineModuleInfo *mmi,
>>>>>> + MachineLoopInfo *mli = nullptr,
>>>>>> + bool AfterPlacement = false);
>>>>>> +
>>>>>> private:
>>>>>> class MergePotentialsElt {
>>>>>> unsigned Hash;
>>>>>> @@ -93,11 +95,13 @@ namespace llvm {
>>>>>> };
>>>>>> std::vector<SameTailElt> SameTails;
>>>>>>
>>>>>> + bool AfterBlockPlacement;
>>>>>> bool EnableTailMerge;
>>>>>> bool EnableHoistCommonCode;
>>>>>> const TargetInstrInfo *TII;
>>>>>> const TargetRegisterInfo *TRI;
>>>>>> MachineModuleInfo *MMI;
>>>>>> + MachineLoopInfo *MLI;
>>>>>> RegScavenger *RS;
>>>>>>
>>>>>> public:
>>>>>>
>>>>>> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineB
>>>>>> l oc kPlacement.cpp?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
>>>>>> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Mon Jun 6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -26,6 +26,8 @@
>>>>>>
>>>>>> //===--------------------------------------------------------------
>>>>>> -
>>>>>> --
>>>>>> -----===//
>>>>>>
>>>>>> #include "llvm/CodeGen/Passes.h"
>>>>>> +#include "llvm/CodeGen/TargetPassConfig.h"
>>>>>> +#include "BranchFolding.h"
>>>>>> #include "llvm/ADT/DenseMap.h"
>>>>>> #include "llvm/ADT/SmallPtrSet.h"
>>>>>> #include "llvm/ADT/SmallVector.h"
>>>>>> @@ -116,6 +118,12 @@ static cl::opt<unsigned> JumpInstCost("j
>>>>>> cl::desc("Cost of jump
>>>>>> instructions."),
>>>>>> cl::init(1), cl::Hidden);
>>>>>>
>>>>>> +static cl::opt<bool>
>>>>>> +BranchFoldPlacement("branch-fold-placement",
>>>>>> + cl::desc("Perform branch folding during placement. "
>>>>>> + "Reduces code size."),
>>>>>> + cl::init(true), cl::Hidden);
>>>>>> +
>>>>>> extern cl::opt<unsigned> StaticLikelyProb;
>>>>>>
>>>>>> namespace {
>>>>>> @@ -232,10 +240,10 @@ class MachineBlockPlacement : public Mac
>>>>>> const MachineBranchProbabilityInfo *MBPI;
>>>>>>
>>>>>> /// \brief A handle to the function-wide block frequency pass.
>>>>>> - const MachineBlockFrequencyInfo *MBFI;
>>>>>> + std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;
>>>>>>
>>>>>> /// \brief A handle to the loop info.
>>>>>> - const MachineLoopInfo *MLI;
>>>>>> + MachineLoopInfo *MLI;
>>>>>>
>>>>>> /// \brief A handle to the target's instruction info.
>>>>>> const TargetInstrInfo *TII;
>>>>>> @@ -323,6 +331,7 @@ public:
>>>>>> AU.addRequired<MachineBlockFrequencyInfo>();
>>>>>> AU.addRequired<MachineDominatorTree>();
>>>>>> AU.addRequired<MachineLoopInfo>();
>>>>>> + AU.addRequired<TargetPassConfig>();
>>>>>> MachineFunctionPass::getAnalysisUsage(AU);
>>>>>> }
>>>>>> };
>>>>>> @@ -1462,7 +1471,8 @@ bool MachineBlockPlacement::runOnMachine
>>>>>> return false;
>>>>>>
>>>>>> MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
>>>>>> - MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
>>>>>> + MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
>>>>>> + getAnalysis<MachineBlockFrequencyInfo>());
>>>>>> MLI = &getAnalysis<MachineLoopInfo>();
>>>>>> TII = F.getSubtarget().getInstrInfo();
>>>>>> TLI = F.getSubtarget().getTargetLowering();
>>>>>> @@ -1470,6 +1480,29 @@ bool MachineBlockPlacement::runOnMachine
>>>>>> assert(BlockToChain.empty());
>>>>>>
>>>>>> buildCFGChains(F);
>>>>>> +
>>>>>> + // Changing the layout can create new tail merging opportunities.
>>>>>> + TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
>>>>>> + // TailMerge can create jump into if branches that make CFG
>>>>>> + irreducible for // HW that requires structurized CFG.
>>>>>> + bool EnableTailMerge = !F.getTarget().requiresStructuredCFG() &&
>>>>>> + PassConfig->getEnableTailMerge() &&
>>>>>> + BranchFoldPlacement; // No tail merging
>>>>>> + opportunities if the block number is less than four.
>>>>>> + if (F.size() > 3 && EnableTailMerge) {
>>>>>> + BranchFolder BF(/*EnableTailMerge=*/true,
>>>>>> + /*CommonHoist=*/false,
>>>>>> *MBFI,
>>>>>> + *MBPI);
>>>>>> +
>>>>>> + if (BF.OptimizeFunction(F, TII,
>>>>>> + F.getSubtarget().getRegisterInfo(),
>>>>>> +
>>>>>> getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
>>>>>> + /*AfterBlockPlacement=*/true)) {
>>>>>> + // Redo the layout if tail merging creates/removes/moves
>>>>>> blocks.
>>>>>> + BlockToChain.clear();
>>>>>> + ChainAllocator.DestroyAll();
>>>>>> + buildCFGChains(F);
>>>>>> + }
>>>>>> + }
>>>>>> +
>>>>>> optimizeBranches(F);
>>>>>> alignBlocks(F);
>>>>>>
>>>>>>
>>>>>> Added: llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64
>>>>>> / ta ilmerging_in_mbp.ll?rev=271925&view=auto
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll (added)
>>>>>> +++ llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll Mon Jun
>>>>>> +++ 6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -0,0 +1,63 @@
>>>>>> +; RUN: llc <%s -march=aarch64 | FileCheck %s
>>>>>> +
>>>>>> +; CHECK-LABEL: test:
>>>>>> +; CHECK: .LBB0_7
>>>>>> +; CHECK: b.hi .LBB0_2
>>>>>> +; CHECK-NEXT: b .LBB0_9
>>>>>> +; CHECK-NEXT: .LBB0_8
>>>>>> +; CHECK-NEXT: mov x8, x9
>>>>>> +; CHECK-NEXT: .LBB0_9
>>>>>> +define i64 @test(i64 %n, i64* %a, i64* %b, i64* %c, i64* %d, i64*
>>>>>> +%e,
>>>>>> +i64* %f) {
>>>>>> +entry:
>>>>>> + %cmp28 = icmp sgt i64 %n, 1
>>>>>> + br i1 %cmp28, label %for.body, label %for.end
>>>>>> +
>>>>>> +for.body: ; preds =
>>>>>> %for.body.lr.ph, %if.end
>>>>>> + %j = phi i64 [ %n, %entry ], [ %div, %if.end ] %div = lshr i64
>>>>>> + %j, 1 %a.arrayidx = getelementptr inbounds i64, i64* %a, i64
>>>>>> + %div %a.j = load i64, i64* %a.arrayidx %b.arrayidx =
>>>>>> + getelementptr inbounds i64, i64* %b, i64 %div %b.j = load i64,
>>>>>> + i64* %b.arrayidx %cmp.i = icmp slt i64 %a.j, %b.j br i1 %cmp.i,
>>>>>> + label %for.end.loopexit, label %cond.false.i
>>>>>> +
>>>>>> +cond.false.i: ; preds = %for.body
>>>>>> + %cmp4.i = icmp sgt i64 %a.j, %b.j
>>>>>> + br i1 %cmp4.i, label %if.end, label %cond.false6.i
>>>>>> +
>>>>>> +cond.false6.i: ; preds =
>>>>>> %cond.false.i
>>>>>> + %c.arrayidx = getelementptr inbounds i64, i64* %c, i64 %div
>>>>>> + %c.j = load i64, i64* %c.arrayidx %d.arrayidx = getelementptr
>>>>>> + inbounds i64, i64* %d, i64 %div %d.j = load i64, i64*
>>>>>> + %d.arrayidx %cmp9.i = icmp slt i64 %c.j, %d.j br i1 %cmp9.i,
>>>>>> + label %for.end.loopexit, label %cond.false11.i
>>>>>> +
>>>>>> +cond.false11.i: ; preds =
>>>>>> %cond.false6.i
>>>>>> + %cmp14.i = icmp sgt i64 %c.j, %d.j br i1 %cmp14.i, label
>>>>>> + %if.end, label %cond.false12.i
>>>>>> +
>>>>>> +cond.false12.i: ; preds = %cond.false11.i
>>>>>> + %e.arrayidx = getelementptr inbounds i64, i64* %e, i64 %div
>>>>>> + %e.j = load i64, i64* %e.arrayidx
>>>>>> + %f.arrayidx = getelementptr inbounds i64, i64* %f, i64 %div
>>>>>> + %f.j = load i64, i64* %f.arrayidx
>>>>>> + %cmp19.i = icmp sgt i64 %e.j, %f.j
>>>>>> + br i1 %cmp19.i, label %if.end, label %for.end.loopexit
>>>>>> +
>>>>>> +if.end: ; preds =
>>>>>> %cond.false12.i, %cond.false11.i, %cond.false.i
>>>>>> + %cmp = icmp ugt i64 %j, 3
>>>>>> + br i1 %cmp, label %for.body, label %for.end.loopexit
>>>>>> +
>>>>>> +for.end.loopexit: ; preds =
>>>>>> %cond.false12.i, %cond.false6.i, %for.body, %if.end
>>>>>> + %j.0.lcssa.ph = phi i64 [ %j, %cond.false12.i ], [ %j,
>>>>>> +%cond.false6.i ], [ %j, %for.body ], [ %div, %if.end ]
>>>>>> + br label %for.end
>>>>>> +
>>>>>> +for.end: ; preds =
>>>>>> %for.end.loopexit, %entry
>>>>>> + %j.0.lcssa = phi i64 [ %n, %entry ], [ %j.0.lcssa.ph,
>>>>>> +%for.end.loopexit ]
>>>>>> + %j.2 = add i64 %j.0.lcssa, %n
>>>>>> + %j.3 = mul i64 %j.2, %n
>>>>>> + %j.4 = add i64 %j.3, 10
>>>>>> + ret i64 %j.4
>>>>>> +}
>>>>>>
>>>>>> Modified: llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll
>>>>>> URL:
>>>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/arm
>>>>>> - an d-tst-peephole.ll?rev=271925&r1=271924&r2=271925&view=diff
>>>>>> ===================================================================
>>>>>> =
>>>>>> ==
>>>>>> ========
>>>>>> --- llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll (original)
>>>>>> +++ llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll Mon Jun 6
>>>>>> +++ 13:36:07 2016
>>>>>> @@ -49,7 +49,7 @@ tailrecurse.switch:
>>>>>> ; V8-NEXT: beq
>>>>>> ; V8-NEXT: %tailrecurse.switch
>>>>>> ; V8: cmp
>>>>>> -; V8-NEXT: bne
>>>>>> +; V8-NEXT: beq
>>>>>> ; V8-NEXT: b
>>>>>> ; The trailing space in the last line checks that the branch is
>>>>>> unconditional
>>>>>> switch i32 %and, label %sw.epilog [
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>
>>>
>>>
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list