[PATCH] R600: Clean if/then/else handling code in AMDILCFGStructurizer
Vincent Lejeune
vljn at ovi.com
Tue Nov 19 05:36:42 PST 2013
----- Mail original -----
> De : Tom Stellard <tom at stellard.net>
> À : reviews+D2188+public+3f8d5bc02a1045dd at llvm-reviews.chandlerc.com
> Cc : vljn at ovi.com; llvm-commits at cs.uiuc.edu
> Envoyé le : Lundi 18 novembre 2013 21h09
> Objet : Re: [PATCH] R600: Clean if/then/else handling code in AMDILCFGStructurizer
>
> Hi Vincent,
>
> I've rebased your patch on top of the current master branch (which
> includes my patches to enable the IR Structurizer), and attached it to
> this mail.
>
> Your patch regresses a test case I added in:
>
> r195030
> R600: Fix a crash in the AMDILCFGStrucurizer
structurize1.ll works fine on my computer, however structurize.ll does not (but it's older than r195030)
Structurize.ll actually check that branches-into-if don't crash llvm, but it run with ir structurizer disabled
whereas I assume in my patch that it is enabled. (it will probably require some change in the CHECK: sections too)
>
> I think the problem may be that not all of the if statements nested
> inside the loop are being matched by the ifPatternMatch() function.
>
> The other question I have about this patch is: Why is it safe to remove
> the code for handling branch into IF?
That's why my patch relies on the ir structurizer : the pass replaces branches into if with predicates
and (several) triangles cfg pattern, pretty much like you described in the comment of commit
a4f468f245d6e6869317007c548ee4d33ad97343rev at 192813.
If I understand correctly what StructurizeCFG does, it will convert pattern like this :
// entry
// / |
// diamond_head branch_from
// / \ |
// diamond_false diamond_true
// \ /
// done
//
into a pattern similar to that :
// entry
// | \
// | diamond_head
// | | \
// | | diamond_false
// | | /
// extra block
// | \
// | branch_from
// | /
// extra block
// | \
// | diamond_true
// | /
// done
with an additionnal predicate in diamond_true set to true in diamond_head's true path and at the end of branch_from.
>
> -Tom
>
> On Fri, Nov 15, 2013 at 09:16:13AM -0800, Vincent Lejeune wrote:
>> Further simplify the pass and fix some shadertoy's sample crashes.
>> The pass is no longer able to copy block in the jump into if situation and
> rely on structurizeCFG pass.
>>
>> http://llvm-reviews.chandlerc.com/D2188
>>
>> Files:
>> lib/Target/R600/AMDILCFGStructurizer.cpp
>
>> Index: lib/Target/R600/AMDILCFGStructurizer.cpp
>> ===================================================================
>> --- lib/Target/R600/AMDILCFGStructurizer.cpp
>> +++ lib/Target/R600/AMDILCFGStructurizer.cpp
>> @@ -133,18 +133,15 @@
>>
>> AMDGPUCFGStructurizer(TargetMachine &tm) :
>> MachineFunctionPass(ID), TM(tm),
>> - TII(static_cast<const R600InstrInfo *>(tm.getInstrInfo())),
>> - TRI(&TII->getRegisterInfo()) { }
>> + MLI(0), TII(0), TRI(0), FuncRep(0) { }
>>
>> const char *getPassName() const {
>> return "AMD IL Control Flow Graph structurizer Pass";
>> }
>>
>> void getAnalysisUsage(AnalysisUsage &AU) const {
>> AU.addPreserved<MachineFunctionAnalysis>();
>> AU.addRequired<MachineFunctionAnalysis>();
>> - AU.addRequired<MachineDominatorTree>();
>> - AU.addRequired<MachinePostDominatorTree>();
>> AU.addRequired<MachineLoopInfo>();
>> }
>>
>> @@ -161,21 +158,17 @@
>> OrderedBlks.clear();
>> FuncRep = &MF;
>> MLI = &getAnalysis<MachineLoopInfo>();
>> + TII = static_cast<const R600InstrInfo *>(TM.getInstrInfo());
>> + TRI = &TII->getRegisterInfo();
>> DEBUG(dbgs() << "LoopInfo:\n";
> PrintLoopinfo(*MLI););
>> - MDT = &getAnalysis<MachineDominatorTree>();
>> - DEBUG(MDT->print(dbgs(), (const llvm::Module*)0););
>> - PDT = &getAnalysis<MachinePostDominatorTree>();
>> - DEBUG(PDT->print(dbgs()););
>> prepare();
>> run();
>> DEBUG(MF.dump(););
>> return true;
>> }
>>
>> protected:
>> TargetMachine &TM;
>> - MachineDominatorTree *MDT;
>> - MachinePostDominatorTree *PDT;
>> MachineLoopInfo *MLI;
>> const R600InstrInfo *TII;
>> const AMDGPURegisterInfo *TRI;
>> @@ -208,12 +201,8 @@
>> bool hasBackEdge(MachineBasicBlock *MBB) const;
>> static unsigned getLoopDepth(MachineLoop *LoopRep);
>> bool isRetiredBlock(MachineBasicBlock *MBB) const;
>> - bool isActiveLoophead(MachineBasicBlock *MBB) const;
>> - PathToKind singlePathTo(MachineBasicBlock *SrcMBB, MachineBasicBlock
> *DstMBB,
>> - bool AllowSideEntry = true) const;
>> int countActiveBlock(MBBVector::const_iterator It,
>> MBBVector::const_iterator E) const;
>> - bool needMigrateBlock(MachineBasicBlock *MBB) const;
>>
>> // Utility Functions
>> void reversePredicateSetter(MachineBasicBlock::iterator I);
>> @@ -264,33 +253,16 @@
>>
>>
>> int patternMatch(MachineBasicBlock *MBB);
>> - int patternMatchGroup(MachineBasicBlock *MBB);
>> - int serialPatternMatch(MachineBasicBlock *MBB);
>> - int ifPatternMatch(MachineBasicBlock *MBB);
>> + bool patternMatchGroup(MachineBasicBlock *MBB);
>> + bool serialPatternMatch(MachineBasicBlock *MBB);
>> + bool ifPatternMatch(MachineBasicBlock *MBB);
>> int loopendPatternMatch();
>> int mergeLoop(MachineLoop *LoopRep);
>> int loopcontPatternMatch(MachineLoop *LoopRep, MachineBasicBlock
> *LoopHeader);
>>
>> void handleLoopcontBlock(MachineBasicBlock *ContingMBB,
>> MachineLoop *ContingLoop, MachineBasicBlock *ContMBB,
>> MachineLoop *ContLoop);
>> - /// return true iff src1Blk->succ_size() == 0 && src1Blk and
> src2Blk are in
>> - /// the same loop with LoopLandInfo without explicitly keeping track of
>> - /// loopContBlks and loopBreakBlks, this is a method to get the
> information.
>> - bool isSameloopDetachedContbreak(MachineBasicBlock *Src1MBB,
>> - MachineBasicBlock *Src2MBB);
>> - int handleJumpintoIf(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB);
>> - int handleJumpintoIfImp(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB);
>> - int improveSimpleJumpintoIf(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB,
>> - MachineBasicBlock **LandMBBPtr);
>> - void showImproveSimpleJumpintoIf(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB,
>> - MachineBasicBlock *LandMBB, bool Detail = false);
>> - int cloneOnSideEntryTo(MachineBasicBlock *PreMBB,
>> - MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB);
>> void mergeSerialBlock(MachineBasicBlock *DstMBB,
>> MachineBasicBlock *SrcMBB);
>>
>> @@ -326,18 +298,10 @@
>> void removeSuccessor(MachineBasicBlock *MBB);
>> MachineBasicBlock *cloneBlockForPredecessor(MachineBasicBlock *MBB,
>> MachineBasicBlock *PredMBB);
>> - void migrateInstruction(MachineBasicBlock *SrcMBB,
>> - MachineBasicBlock *DstMBB, MachineBasicBlock::iterator I);
>> void recordSccnum(MachineBasicBlock *MBB, int SCCNum);
>> void retireBlock(MachineBasicBlock *MBB);
>> void setLoopLandBlock(MachineLoop *LoopRep, MachineBasicBlock *MBB =
> NULL);
>>
>> - MachineBasicBlock
> *findNearestCommonPostDom(std::set<MachineBasicBlock *>&);
>> - /// This is work around solution for findNearestCommonDominator not
> avaiable
>> - /// to post dom a proper fix should go to Dominators.h.
>> - MachineBasicBlock *findNearestCommonPostDom(MachineBasicBlock *MBB1,
>> - MachineBasicBlock *MBB2);
>> -
>> private:
>> MBBInfoMap BlockInfoMap;
>> LoopLandInfoMap LLInfoMap;
>> @@ -380,36 +344,6 @@
>> return (*It).second->IsRetired;
>> }
>>
>> -bool AMDGPUCFGStructurizer::isActiveLoophead(MachineBasicBlock *MBB) const
> {
>> - MachineLoop *LoopRep = MLI->getLoopFor(MBB);
>> - while (LoopRep && LoopRep->getHeader() == MBB) {
>> - MachineBasicBlock *LoopLand = getLoopLandInfo(LoopRep);
>> - if(!LoopLand)
>> - return true;
>> - if (!isRetiredBlock(LoopLand))
>> - return true;
>> - LoopRep = LoopRep->getParentLoop();
>> - }
>> - return false;
>> -}
>> -AMDGPUCFGStructurizer::PathToKind AMDGPUCFGStructurizer::singlePathTo(
>> - MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB,
>> - bool AllowSideEntry) const {
>> - assert(DstMBB);
>> - if (SrcMBB == DstMBB)
>> - return SinglePath_InPath;
>> - while (SrcMBB && SrcMBB->succ_size() == 1) {
>> - SrcMBB = *SrcMBB->succ_begin();
>> - if (SrcMBB == DstMBB)
>> - return SinglePath_InPath;
>> - if (!AllowSideEntry && SrcMBB->pred_size() > 1)
>> - return Not_SinglePath;
>> - }
>> - if (SrcMBB && SrcMBB->succ_size()==0)
>> - return SinglePath_NotInPath;
>> - return Not_SinglePath;
>> -}
>> -
>> int AMDGPUCFGStructurizer::countActiveBlock(MBBVector::const_iterator It,
>> MBBVector::const_iterator E) const {
>> int Count = 0;
>> @@ -421,18 +355,6 @@
>> return Count;
>> }
>>
>> -bool AMDGPUCFGStructurizer::needMigrateBlock(MachineBasicBlock *MBB) const
> {
>> - unsigned BlockSizeThreshold = 30;
>> - unsigned CloneInstrThreshold = 100;
>> - bool MultiplePreds = MBB && (MBB->pred_size() > 1);
>> -
>> - if(!MultiplePreds)
>> - return false;
>> - unsigned BlkSize = MBB->size();
>> - return ((BlkSize > BlockSizeThreshold) &&
>> - (BlkSize * (MBB->pred_size() - 1) > CloneInstrThreshold));
>> -}
>> -
>> void AMDGPUCFGStructurizer::reversePredicateSetter(
>> MachineBasicBlock::iterator I) {
>> while (I--) {
>> @@ -800,6 +722,7 @@
>> bool MakeProgress = false;
>> int NumRemainedBlk = countActiveBlock(OrderedBlks.begin(),
>> OrderedBlks.end());
>> + loopendPatternMatch();
>>
>> do {
>> ++NumIter;
>> @@ -972,103 +895,96 @@
>> return NumMatch;
>> }
>>
>> -int AMDGPUCFGStructurizer::patternMatchGroup(MachineBasicBlock *MBB)
> {
>> - int NumMatch = 0;
>> - NumMatch += loopendPatternMatch();
>> - NumMatch += serialPatternMatch(MBB);
>> - NumMatch += ifPatternMatch(MBB);
>> - return NumMatch;
>> +bool AMDGPUCFGStructurizer::patternMatchGroup(MachineBasicBlock *MBB)
> {
>> + bool ChangedInSingleIteration, Changed = false;
>> + do {
>> + ChangedInSingleIteration = false;
>> + DEBUG(dbgs() << "Pattern matching starting from BB#"
> << MBB->getNumber()
>> + << "\n";);
>> + ChangedInSingleIteration |= serialPatternMatch(MBB);
>> + ChangedInSingleIteration |= ifPatternMatch(MBB);
>> + Changed |= ChangedInSingleIteration;
>> + } while (ChangedInSingleIteration);
>> +
>> + return Changed;
>> }
>>
>>
>> -int AMDGPUCFGStructurizer::serialPatternMatch(MachineBasicBlock *MBB)
> {
>> +bool AMDGPUCFGStructurizer::serialPatternMatch(MachineBasicBlock *MBB)
> {
>> if (MBB->succ_size() != 1)
>> - return 0;
>> + return false;
>>
>> MachineBasicBlock *childBlk = *MBB->succ_begin();
>> - if (childBlk->pred_size() != 1 || isActiveLoophead(childBlk))
>> - return 0;
>> + if (childBlk->pred_size() != 1)
>> + return false;
>>
>> mergeSerialBlock(MBB, childBlk);
>> ++numSerialPatternMatch;
>> - return 1;
>> + return true;
>> }
>>
>> -int AMDGPUCFGStructurizer::ifPatternMatch(MachineBasicBlock *MBB) {
>> - //two edges
>> +bool AMDGPUCFGStructurizer::ifPatternMatch(MachineBasicBlock *MBB) {
>> if (MBB->succ_size() != 2)
>> - return 0;
>> + return false;
>> if (hasBackEdge(MBB))
>> - return 0;
>> + return false;
>> MachineInstr *BranchMI = getNormalBlockBranchInstr(MBB);
>> if (!BranchMI)
>> - return 0;
>> + return false;
>>
>> assert(isCondBranch(BranchMI));
>>
>> + bool Changed = false;
>> MachineBasicBlock *TrueMBB = getTrueBranch(BranchMI);
>> - serialPatternMatch(TrueMBB);
>> - ifPatternMatch(TrueMBB);
>> MachineBasicBlock *FalseMBB = getFalseBranch(MBB, BranchMI);
>> - serialPatternMatch(FalseMBB);
>> - ifPatternMatch(FalseMBB);
>> + DEBUG(dbgs() << "if pattern starting at BB#" <<
> MBB->getNumber() <<
>> + ", True branch to BB#" << TrueMBB->getNumber()
> <<
>> + ", False branch to BB#" << FalseMBB->getNumber()
> << "\n");
>> +
>> + Changed |= patternMatchGroup(TrueMBB);
>> + Changed |= patternMatchGroup(FalseMBB);
>> + DEBUG(
>> + dbgs() << "BB#" << TrueMBB->getNumber()
> << "'successors :";
>> + for (MachineBasicBlock::succ_iterator I = TrueMBB->succ_begin(),
>> + E = TrueMBB->succ_end(); I != E; ++I)
>> + dbgs() << "BB#" << (*I)->getNumber()
> <<", ";
>> + dbgs() << "\n";
>> + dbgs() << "BB#" << FalseMBB->getNumber()
> << "'successors :";
>> + for (MachineBasicBlock::succ_iterator I = FalseMBB->succ_begin(),
>> + E = FalseMBB->succ_end(); I != E; ++I)
>> + dbgs() << "BB#" << (*I)->getNumber()
> << ", ";
>> + dbgs() << "\n";
>> +);
>> MachineBasicBlock *LandBlk;
>> - int Cloned = 0;
>>
>> assert (!TrueMBB->succ_empty() || !FalseMBB->succ_empty());
>> - // TODO: Simplify
>> if (TrueMBB->succ_size() == 1 && FalseMBB->succ_size() ==
> 1
>> - && *TrueMBB->succ_begin() == *FalseMBB->succ_begin())
> {
>> + && *TrueMBB->succ_begin() == *FalseMBB->succ_begin()
>> + && TrueMBB->pred_size() == 1 &&
> FalseMBB->pred_size() == 1) {
>> // Diamond pattern
>> LandBlk = *TrueMBB->succ_begin();
>> - } else if (TrueMBB->succ_size() == 1 &&
> *TrueMBB->succ_begin() == FalseMBB) {
>> + } else if (TrueMBB->succ_size() == 1 &&
> *TrueMBB->succ_begin() == FalseMBB &&
>> + TrueMBB->pred_size() == 1) {
>> // Triangle pattern, false is empty
>> LandBlk = FalseMBB;
>> FalseMBB = NULL;
>> - } else if (FalseMBB->succ_size() == 1
>> + } else if (FalseMBB->succ_size() == 1 &&
> FalseMBB->pred_size() == 1
>> && *FalseMBB->succ_begin() == TrueMBB) {
>> // Triangle pattern, true is empty
>> // We reverse the predicate to make a triangle, empty false pattern;
>> std::swap(TrueMBB, FalseMBB);
>> reversePredicateSetter(MBB->end());
>> LandBlk = FalseMBB;
>> FalseMBB = NULL;
>> - } else if (FalseMBB->succ_size() == 1
>> - && isSameloopDetachedContbreak(TrueMBB, FalseMBB))
> {
>> - LandBlk = *FalseMBB->succ_begin();
>> - } else if (TrueMBB->succ_size() == 1
>> - && isSameloopDetachedContbreak(FalseMBB, TrueMBB)) {
>> - LandBlk = *TrueMBB->succ_begin();
>> - } else {
>> - return handleJumpintoIf(MBB, TrueMBB, FalseMBB);
>> - }
>> -
>> - // improveSimpleJumpinfoIf can handle the case where landBlk == NULL but
> the
>> - // new BB created for landBlk==NULL may introduce new challenge to the
>> - // reduction process.
>> - if (LandBlk &&
>> - ((TrueMBB && TrueMBB->pred_size() > 1)
>> - || (FalseMBB && FalseMBB->pred_size() > 1))) {
>> - Cloned += improveSimpleJumpintoIf(MBB, TrueMBB, FalseMBB,
> &LandBlk);
>> - }
>> + } else
>> + return Changed;
>>
>> - if (TrueMBB && TrueMBB->pred_size() > 1) {
>> + if (TrueMBB && TrueMBB->pred_size() > 1)
>> TrueMBB = cloneBlockForPredecessor(TrueMBB, MBB);
>> - ++Cloned;
>> - }
>> -
>> - if (FalseMBB && FalseMBB->pred_size() > 1) {
>> + if (FalseMBB && FalseMBB->pred_size() > 1)
>> FalseMBB = cloneBlockForPredecessor(FalseMBB, MBB);
>> - ++Cloned;
>> - }
>> -
>> mergeIfthenelseBlock(BranchMI, MBB, TrueMBB, FalseMBB, LandBlk);
>> -
>> - ++numIfPatternMatch;
>> -
>> - numClonedBlock += Cloned;
>> -
>> - return 1 + Cloned;
>> + return true;
>> }
>>
>> int AMDGPUCFGStructurizer::loopendPatternMatch() {
>> @@ -1129,11 +1045,7 @@
>> for (unsigned i = 0, e = LatchBlks.size(); i < e; ++i)
>> settleLoopcontBlock(LatchBlks[i], LoopHeader);
>> int Match = 0;
>> - do {
>> - Match = 0;
>> - Match += serialPatternMatch(LoopHeader);
>> - Match += ifPatternMatch(LoopHeader);
>> - } while (Match > 0);
>> + patternMatchGroup(LoopHeader);
>> mergeLooplandBlock(LoopHeader, ExitBlk);
>> MachineLoop *ParentLoop = LoopRep->getParentLoop();
>> if (ParentLoop)
>> @@ -1171,302 +1083,6 @@
>> return NumCont;
>> }
>>
>> -
>> -bool AMDGPUCFGStructurizer::isSameloopDetachedContbreak(
>> - MachineBasicBlock *Src1MBB, MachineBasicBlock *Src2MBB) {
>> - if (Src1MBB->succ_size() == 0) {
>> - MachineLoop *LoopRep = MLI->getLoopFor(Src1MBB);
>> - if (LoopRep&& LoopRep == MLI->getLoopFor(Src2MBB)) {
>> - MachineBasicBlock *&TheEntry = LLInfoMap[LoopRep];
>> - if (TheEntry) {
>> - DEBUG(
>> - dbgs() << "isLoopContBreakBlock yes src1 = BB"
>> - << Src1MBB->getNumber()
>> - << " src2 = BB" <<
> Src2MBB->getNumber() << "\n";
>> - );
>> - return true;
>> - }
>> - }
>> - }
>> - return false;
>> -}
>> -
>> -int AMDGPUCFGStructurizer::handleJumpintoIf(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB) {
>> - int Num = handleJumpintoIfImp(HeadMBB, TrueMBB, FalseMBB);
>> - if (Num == 0) {
>> - DEBUG(
>> - dbgs() << "handleJumpintoIf swap trueBlk and
> FalseBlk" << "\n";
>> - );
>> - Num = handleJumpintoIfImp(HeadMBB, FalseMBB, TrueMBB);
>> - }
>> - return Num;
>> -}
>> -
>> -int AMDGPUCFGStructurizer::handleJumpintoIfImp(MachineBasicBlock *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB) {
>> - int Num = 0;
>> - MachineBasicBlock *DownBlk;
>> -
>> - //trueBlk could be the common post dominator
>> - DownBlk = TrueMBB;
>> -
>> - DEBUG(
>> - dbgs() << "handleJumpintoIfImp head = BB" <<
> HeadMBB->getNumber()
>> - << " true = BB" <<
> TrueMBB->getNumber()
>> - << ", numSucc=" <<
> TrueMBB->succ_size()
>> - << " false = BB" <<
> FalseMBB->getNumber() << "\n";
>> - );
>> -
>> - while (DownBlk) {
>> - DEBUG(
>> - dbgs() << "check down = BB" <<
> DownBlk->getNumber();
>> - );
>> -
>> - if (singlePathTo(FalseMBB, DownBlk) == SinglePath_InPath) {
>> - DEBUG(
>> - dbgs() << " working\n";
>> - );
>> -
>> - Num += cloneOnSideEntryTo(HeadMBB, TrueMBB, DownBlk);
>> - Num += cloneOnSideEntryTo(HeadMBB, FalseMBB, DownBlk);
>> -
>> - numClonedBlock += Num;
>> - Num += serialPatternMatch(*HeadMBB->succ_begin());
>> - Num += serialPatternMatch(*llvm::next(HeadMBB->succ_begin()));
>> - Num += ifPatternMatch(HeadMBB);
>> - assert(Num > 0);
>> -
>> - break;
>> - }
>> - DEBUG(
>> - dbgs() << " not working\n";
>> - );
>> - DownBlk = (DownBlk->succ_size() == 1) ? (*DownBlk->succ_begin())
> : NULL;
>> - } // walk down the postDomTree
>> -
>> - return Num;
>> -}
>> -
>> -void AMDGPUCFGStructurizer::showImproveSimpleJumpintoIf(
>> - MachineBasicBlock *HeadMBB, MachineBasicBlock *TrueMBB,
>> - MachineBasicBlock *FalseMBB, MachineBasicBlock *LandMBB, bool Detail)
> {
>> - dbgs() << "head = BB" << HeadMBB->getNumber()
>> - << " size = " << HeadMBB->size();
>> - if (Detail) {
>> - dbgs() << "\n";
>> - HeadMBB->print(dbgs());
>> - dbgs() << "\n";
>> - }
>> -
>> - if (TrueMBB) {
>> - dbgs() << ", true = BB" <<
> TrueMBB->getNumber() << " size = "
>> - << TrueMBB->size() << " numPred = "
> << TrueMBB->pred_size();
>> - if (Detail) {
>> - dbgs() << "\n";
>> - TrueMBB->print(dbgs());
>> - dbgs() << "\n";
>> - }
>> - }
>> - if (FalseMBB) {
>> - dbgs() << ", false = BB" <<
> FalseMBB->getNumber() << " size = "
>> - << FalseMBB->size() << " numPred = "
> << FalseMBB->pred_size();
>> - if (Detail) {
>> - dbgs() << "\n";
>> - FalseMBB->print(dbgs());
>> - dbgs() << "\n";
>> - }
>> - }
>> - if (LandMBB) {
>> - dbgs() << ", land = BB" <<
> LandMBB->getNumber() << " size = "
>> - << LandMBB->size() << " numPred = "
> << LandMBB->pred_size();
>> - if (Detail) {
>> - dbgs() << "\n";
>> - LandMBB->print(dbgs());
>> - dbgs() << "\n";
>> - }
>> - }
>> -
>> - dbgs() << "\n";
>> -}
>> -
>> -int AMDGPUCFGStructurizer::improveSimpleJumpintoIf(MachineBasicBlock
> *HeadMBB,
>> - MachineBasicBlock *TrueMBB, MachineBasicBlock *FalseMBB,
>> - MachineBasicBlock **LandMBBPtr) {
>> - bool MigrateTrue = false;
>> - bool MigrateFalse = false;
>> -
>> - MachineBasicBlock *LandBlk = *LandMBBPtr;
>> -
>> - assert((!TrueMBB || TrueMBB->succ_size() <= 1)
>> - && (!FalseMBB || FalseMBB->succ_size() <= 1));
>> -
>> - if (TrueMBB == FalseMBB)
>> - return 0;
>> -
>> - MigrateTrue = needMigrateBlock(TrueMBB);
>> - MigrateFalse = needMigrateBlock(FalseMBB);
>> -
>> - if (!MigrateTrue && !MigrateFalse)
>> - return 0;
>> -
>> - // If we need to migrate either trueBlk and falseBlk, migrate the rest
> that
>> - // have more than one predecessors. without doing this, its predecessor
>> - // rather than headBlk will have undefined value in initReg.
>> - if (!MigrateTrue && TrueMBB && TrueMBB->pred_size()
>> 1)
>> - MigrateTrue = true;
>> - if (!MigrateFalse && FalseMBB &&
> FalseMBB->pred_size() > 1)
>> - MigrateFalse = true;
>> -
>> - DEBUG(
>> - dbgs() << "before improveSimpleJumpintoIf: ";
>> - showImproveSimpleJumpintoIf(HeadMBB, TrueMBB, FalseMBB, LandBlk, 0);
>> - );
>> -
>> - // org: headBlk => if () {trueBlk} else {falseBlk} =>
> landBlk
>> - //
>> - // new: headBlk => if () {initReg = 1; org trueBlk branch} else
>> - // {initReg = 0; org falseBlk branch }
>> - // => landBlk => if (initReg) {org trueBlk} else
> {org falseBlk}
>> - // => org landBlk
>> - // if landBlk->pred_size() > 2, put the about if-else inside
>> - // if (initReg !=2) {...}
>> - //
>> - // add initReg = initVal to headBlk
>> -
>> - const TargetRegisterClass * I32RC =
> TRI->getCFGStructurizerRegClass(MVT::i32);
>> - if (!MigrateTrue || !MigrateFalse) {
>> - // XXX: We have an opportunity here to optimize the "branch into
> if" case
>> - // here. Branch into if looks like this:
>> - // entry
>> - // / |
>> - // diamond_head branch_from
>> - // / \ |
>> - // diamond_false diamond_true
>> - // \ /
>> - // done
>> - //
>> - // The diamond_head block begins the "if" and the
> diamond_true block
>> - // is the block being "branched into".
>> - //
>> - // If MigrateTrue is true, then TrueBB is the block being
> "branched into"
>> - // and if MigrateFalse is true, then FalseBB is the block being
>> - // "branched into"
>> - //
>> - // Here is the pseudo code for how I think the optimization should
> work:
>> - // 1. Insert MOV GPR0, 0 before the branch instruction in
> diamond_head.
>> - // 2. Insert MOV GPR0, 1 before the branch instruction in branch_from.
>> - // 3. Move the branch instruction from diamond_head into its own basic
>> - // block (new_block).
>> - // 4. Add an unconditional branch from diamond_head to new_block
>> - // 5. Replace the branch instruction in branch_from with an
> unconditional
>> - // branch to new_block. If branch_from has multiple predecessors,
> then
>> - // we need to replace the True/False block in the branch
>> - // instruction instead of replacing it.
>> - // 6. Change the condition of the branch instruction in new_block from
>> - // COND to (COND || GPR0)
>> - //
>> - // In order insert these MOV instruction, we will need to use the
>> - // RegisterScavenger. Usually liveness stops being tracked during
>> - // the late machine optimization passes, however if we implement
>> - // bool TargetRegisterInfo::requiresRegisterScavenging(
>> - // const
> MachineFunction &MF)
>> - // and have it return true, liveness will be tracked correctly
>> - // by generic optimization passes. We will also need to make sure
> that
>> - // all of our target-specific passes that run after regalloc and
> before
>> - // the CFGStructurizer track liveness and we will need to modify this
> pass
>> - // to correctly track liveness.
>> - //
>> - // After the above changes, the new CFG should look like this:
>> - // entry
>> - // / |
>> - // diamond_head branch_from
>> - // \ /
>> - // new_block
>> - // / |
>> - // diamond_false diamond_true
>> - // \ /
>> - // done
>> - //
>> - // Without this optimization, we are forced to duplicate the
> diamond_true
>> - // block and we will end up with a CFG like this:
>> - //
>> - // entry
>> - // / |
>> - // diamond_head branch_from
>> - // / \ |
>> - // diamond_false diamond_true diamond_true (duplicate)
>> - // \ / |
>> - // done --------------------|
>> - //
>> - // Duplicating diamond_true can be very costly especially if it has a
>> - // lot of instructions.
>> - return 0;
>> - }
>> -
>> - int NumNewBlk = 0;
>> -
>> - bool LandBlkHasOtherPred = (LandBlk->pred_size() > 2);
>> -
>> - //insert AMDGPU::ENDIF to avoid special case "input landBlk ==
> NULL"
>> - MachineBasicBlock::iterator I = insertInstrBefore(LandBlk,
> AMDGPU::ENDIF);
>> -
>> - if (LandBlkHasOtherPred) {
>> - llvm_unreachable("Extra register needed to handle CFG");
>> - unsigned CmpResReg =
>> -
> HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
>> - llvm_unreachable("Extra compare instruction needed to handle
> CFG");
>> - insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET,
>> - CmpResReg, DebugLoc());
>> - }
>> -
>> - // XXX: We are running this after RA, so creating virtual registers will
>> - // cause an assertion failure in the PostRA scheduling pass.
>> - unsigned InitReg =
>> - HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
>> - insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET, InitReg,
>> - DebugLoc());
>> -
>> - if (MigrateTrue) {
>> - migrateInstruction(TrueMBB, LandBlk, I);
>> - // need to uncondionally insert the assignment to ensure a path from
> its
>> - // predecessor rather than headBlk has valid value in initReg if
>> - // (initVal != 1).
>> - llvm_unreachable("Extra register needed to handle CFG");
>> - }
>> - insertInstrBefore(I, AMDGPU::ELSE);
>> -
>> - if (MigrateFalse) {
>> - migrateInstruction(FalseMBB, LandBlk, I);
>> - // need to uncondionally insert the assignment to ensure a path from
> its
>> - // predecessor rather than headBlk has valid value in initReg if
>> - // (initVal != 0)
>> - llvm_unreachable("Extra register needed to handle CFG");
>> - }
>> -
>> - if (LandBlkHasOtherPred) {
>> - // add endif
>> - insertInstrBefore(I, AMDGPU::ENDIF);
>> -
>> - // put initReg = 2 to other predecessors of landBlk
>> - for (MachineBasicBlock::pred_iterator PI = LandBlk->pred_begin(),
>> - PE = LandBlk->pred_end(); PI != PE; ++PI) {
>> - MachineBasicBlock *MBB = *PI;
>> - if (MBB != TrueMBB && MBB != FalseMBB)
>> - llvm_unreachable("Extra register needed to handle CFG");
>> - }
>> - }
>> - DEBUG(
>> - dbgs() << "result from improveSimpleJumpintoIf: ";
>> - showImproveSimpleJumpintoIf(HeadMBB, TrueMBB, FalseMBB, LandBlk, 0);
>> - );
>> -
>> - // update landBlk
>> - *LandMBBPtr = LandBlk;
>> -
>> - return NumNewBlk;
>> -}
>> -
>> void AMDGPUCFGStructurizer::handleLoopcontBlock(MachineBasicBlock
> *ContingMBB,
>> MachineLoop *ContingLoop, MachineBasicBlock *ContMBB,
>> MachineLoop *ContLoop) {
>> @@ -1637,24 +1253,6 @@
>> }
>> }
>>
>> -int AMDGPUCFGStructurizer::cloneOnSideEntryTo(MachineBasicBlock *PreMBB,
>> - MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB) {
>> - int Cloned = 0;
>> - assert(PreMBB->isSuccessor(SrcMBB));
>> - while (SrcMBB && SrcMBB != DstMBB) {
>> - assert(SrcMBB->succ_size() == 1);
>> - if (SrcMBB->pred_size() > 1) {
>> - SrcMBB = cloneBlockForPredecessor(SrcMBB, PreMBB);
>> - ++Cloned;
>> - }
>> -
>> - PreMBB = SrcMBB;
>> - SrcMBB = *SrcMBB->succ_begin();
>> - }
>> -
>> - return Cloned;
>> -}
>> -
>> MachineBasicBlock *
>> AMDGPUCFGStructurizer::cloneBlockForPredecessor(MachineBasicBlock *MBB,
>> MachineBasicBlock *PredMBB) {
>> @@ -1683,37 +1281,6 @@
>> return CloneMBB;
>> }
>>
>> -void AMDGPUCFGStructurizer::migrateInstruction(MachineBasicBlock *SrcMBB,
>> - MachineBasicBlock *DstMBB, MachineBasicBlock::iterator I) {
>> - MachineBasicBlock::iterator SpliceEnd;
>> - //look for the input branchinstr, not the AMDGPU branchinstr
>> - MachineInstr *BranchMI = getNormalBlockBranchInstr(SrcMBB);
>> - if (!BranchMI) {
>> - DEBUG(
>> - dbgs() << "migrateInstruction don't see branch
> instr\n" ;
>> - );
>> - SpliceEnd = SrcMBB->end();
>> - } else {
>> - DEBUG(
>> - dbgs() << "migrateInstruction see branch
> instr\n" ;
>> - BranchMI->dump();
>> - );
>> - SpliceEnd = BranchMI;
>> - }
>> - DEBUG(
>> - dbgs() << "migrateInstruction before splice dstSize =
> " << DstMBB->size()
>> - << "srcSize = " << SrcMBB->size() <<
> "\n";
>> - );
>> -
>> - //splice insert before insertPos
>> - DstMBB->splice(I, SrcMBB, SrcMBB->begin(), SpliceEnd);
>> -
>> - DEBUG(
>> - dbgs() << "migrateInstruction after splice dstSize = "
> << DstMBB->size()
>> - << "srcSize = " << SrcMBB->size() <<
> "\n";
>> - );
>> -}
>> -
>> MachineBasicBlock *
>> AMDGPUCFGStructurizer::normalizeInfiniteLoopExit(MachineLoop* LoopRep)
> {
>> MachineBasicBlock *LoopHeader = LoopRep->getHeader();
>> @@ -1839,60 +1406,6 @@
>> );
>> }
>>
>> -MachineBasicBlock *
>> -AMDGPUCFGStructurizer::findNearestCommonPostDom(MachineBasicBlock *MBB1,
>> - MachineBasicBlock *MBB2) {
>> -
>> - if (PDT->dominates(MBB1, MBB2))
>> - return MBB1;
>> - if (PDT->dominates(MBB2, MBB1))
>> - return MBB2;
>> -
>> - MachineDomTreeNode *Node1 = PDT->getNode(MBB1);
>> - MachineDomTreeNode *Node2 = PDT->getNode(MBB2);
>> -
>> - // Handle newly cloned node.
>> - if (!Node1 && MBB1->succ_size() == 1)
>> - return findNearestCommonPostDom(*MBB1->succ_begin(), MBB2);
>> - if (!Node2 && MBB2->succ_size() == 1)
>> - return findNearestCommonPostDom(MBB1, *MBB2->succ_begin());
>> -
>> - if (!Node1 || !Node2)
>> - return NULL;
>> -
>> - Node1 = Node1->getIDom();
>> - while (Node1) {
>> - if (PDT->dominates(Node1, Node2))
>> - return Node1->getBlock();
>> - Node1 = Node1->getIDom();
>> - }
>> -
>> - return NULL;
>> -}
>> -
>> -MachineBasicBlock *
>> -AMDGPUCFGStructurizer::findNearestCommonPostDom(
>> - std::set<MachineBasicBlock *> &MBBs) {
>> - MachineBasicBlock *CommonDom;
>> - std::set<MachineBasicBlock *>::const_iterator It = MBBs.begin();
>> - std::set<MachineBasicBlock *>::const_iterator E = MBBs.end();
>> - for (CommonDom = *It; It != E && CommonDom; ++It) {
>> - MachineBasicBlock *MBB = *It;
>> - if (MBB != CommonDom)
>> - CommonDom = findNearestCommonPostDom(MBB, CommonDom);
>> - }
>> -
>> - DEBUG(
>> - dbgs() << "Common post dominator for exit blocks is ";
>> - if (CommonDom)
>> - dbgs() << "BB" <<
> CommonDom->getNumber() << "\n";
>> - else
>> - dbgs() << "NULL\n";
>> - );
>> -
>> - return CommonDom;
>> -}
>> -
>> char AMDGPUCFGStructurizer::ID = 0;
>>
>> } // end anonymous namespace
>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
More information about the llvm-commits
mailing list