<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Arnaud,<div class=""><br class=""></div><div class="">Thanks for making a new patch that fast!</div><div class=""><br class=""></div><div class="">LGTM with one question:</div><div class=""><div class="">- GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);</div><div class="">+ GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,</div><div class="">+ AllSuccsCache &AllSuccessors) const;</div><div class=""><br class=""></div><div class="">I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <<a href="mailto:arnaud.degrandmaison@arm.com" class="">arnaud.degrandmaison@arm.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">Hi Quentin,<br class=""><br class="">Here is a patch that I think will address your comments. Let me know if you are ok with it.<br class=""><br class="">By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?<br class=""></div></blockquote><div><br class=""></div><div>There are things that can be improved, for instance, for the spiller we have this: <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__llvm.org_bugs_show-5Fbug.cgi-3Fid-3D17409&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=C29AoSe9gIf-RtFqF8R8ydEOwU4Q4cNRU6pG1LkIDKc&s=erzxB0t98HuUKQXxknPhmj3Av7ArZiOcVPNxpPYQ08M&e=" class="">https://llvm.org/bugs/show_bug.cgi?id=17409</a>.</div><div>If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.</div><div><br class=""></div><div>Thanks,</div><div>Q. </div><br class=""><blockquote type="cite" class=""><div class=""><br class="">Cheers,<br class="">Arnaud<br class=""><br class=""><blockquote type="cite" class="">-----Original Message-----<br class="">From: <a href="mailto:llvm-commits-bounces@cs.uiuc.edu" class="">llvm-commits-bounces@cs.uiuc.edu</a> [mailto:llvm-commits-<br class=""><a href="mailto:bounces@cs.uiuc.edu" class="">bounces@cs.uiuc.edu</a>] On Behalf Of Arnaud A. de Grandmaison<br class="">Sent: 15 June 2015 22:35<br class="">To: 'Quentin Colombet'<br class="">Cc: <a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class="">Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.<br class="">NFC.<br class=""><br class=""><blockquote type="cite" class="">-----Original Message-----<br class="">From: Quentin Colombet [<a href="mailto:qcolombet@apple.com" class="">mailto:qcolombet@apple.com</a>]<br class="">Sent: 15 June 2015 19:45<br class="">To: Arnaud De Grandmaison<br class="">Cc: <a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class="">Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.<br class="">NFC.<br class=""><br class="">Hi Arnaud,<br class=""></blockquote><br class="">Hi Quentin,<br class=""><br class=""><blockquote type="cite" class=""><br class="">The current implementation of the caching mechanism seems confusing to<br class="">me.<br class="">Looking at the comment on the related field seems to imply that the<br class="">cache is global to the pass.<br class="">However, when we look at the implementation, the cache is reset each<br class="">time we process a new block.<br class=""><br class="">I found that confusing for two reasons:<br class="">1. We invalid the cache for apparently no good reason, i.e., why do we<br class="">assume the list of successors changed when processing a new block?<br class=""></blockquote><br class="">The reason is this is in fact a double-entry cache, but we do not need to have<br class="">this double entry cache for the whole pass duration (I guess I'm answering to<br class="">your second question here as well).<br class="">Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic<br class="">blocks immediately dominated by MI. MI->getParent() will stay the same for<br class="">each instruction processed by ProcessBlock, thus the clearing when another<br class="">basic block is processed.<br class=""><br class=""><blockquote type="cite" class="">2. If this cache is only valid for the current process block (and the<br class="">related traversed blocks), why is this a field global to the structure?<br class=""><br class="">I see two ways of fixing that:<br class="">1. Have the map being local to process block function and pass it to<br class="">the functions that use it.<br class=""></blockquote><br class="">I considered having the structure declared locally in ProcessBlock, and<br class="">passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,<br class="">but it seemed a bit clumsy to me. I however agree it would have at least the<br class="">merit of making the cache lifetime explicit.<br class=""><br class=""><blockquote type="cite" class="">2. Have the cache being really global and invalid it only when it makes<br class=""></blockquote>sense.<br class=""><blockquote type="cite" class=""><br class="">Also, side remark, for caching/lazy structures, I expect those to be<br class="">mutable and have the related function being const… My 2c.<br class=""></blockquote><br class="">Given my above comments, I'll modify the patch to make the cache local, as I<br class="">think it makes things clearer.<br class=""><br class=""><blockquote type="cite" class=""><br class="">Cheers,<br class="">-Quentin<br class=""><br class=""><blockquote type="cite" class="">On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison<br class=""></blockquote><<a href="mailto:arnaud.degrandmaison@arm.com" class="">arnaud.degrandmaison@arm.com</a>> wrote:<br class=""><blockquote type="cite" class=""><br class="">Author: aadg<br class="">Date: Mon Jun 15 04:09:06 2015<br class="">New Revision: 239720<br class=""><br class="">URL: <a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_viewvc_llvm-2Dproject-3Frev-3D239720-26view-3Drev&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=C29AoSe9gIf-RtFqF8R8ydEOwU4Q4cNRU6pG1LkIDKc&s=hKmeqBYbh5TacA_VfGFcVUrZdjc9ucG2FjgdtWiASOE&e=" class="">http://llvm.org/viewvc/llvm-project?rev=239720&view=rev</a><br class="">Log:<br class="">[MachineSink] Improve runtime performance. NFC.<br class=""><br class="">This patch fixes a compilation time issue, when MachineSink faces<br class="">PHIs with a huge number of operands. This can happen for example in<br class="">goto table based interpreters, where some basic blocks can have<br class="">several of those PHIs, each one with several hundreds operands.<br class="">MachineSink was spending a significant time re-building and<br class="">re-sorting the list of successors of the current MachineBasicBlock.<br class="">The computing and sorting of the current MachineBasicBlock successors is<br class=""></blockquote></blockquote>now cached.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><br class="">Modified:<br class=""> llvm/trunk/lib/CodeGen/MachineSink.cpp<br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp<br class="">URL:<br class=""><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_viewvc_llvm-2Dproject_llvm_trunk_lib_CodeGen_MachineSi&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=C29AoSe9gIf-RtFqF8R8ydEOwU4Q4cNRU6pG1LkIDKc&s=xwdMSk_g2xPbNJ4lCxx6aqDMOcWzaSmChRM4WnboiIQ&e=" class="">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi</a><br class="">nk .cpp?rev=239720&r1=239719&r2=239720&view=diff<br class=""><br class=""></blockquote><br class=""></blockquote>==========================================================<br class=""><blockquote type="cite" class="">============<br class=""><blockquote type="cite" class="">========<br class="">--- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015<br class="">@@ -73,6 +73,11 @@ namespace {<br class=""><br class=""> SparseBitVector<> RegsToClearKillFlags;<br class=""><br class="">+ // Cache all successors to a MachineBasicBlock, sorted by<br class="">+ frequency info<br class=""></blockquote>and<br class=""><blockquote type="cite" class="">+ // loop depth. AllSuccessors is lazily populated.<br class="">+ std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,<br class=""></blockquote></blockquote>4>><br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">+ AllSuccessors;<br class="">+<br class=""> public:<br class=""> static char ID; // Pass identification<br class=""> MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@<br class="">namespace {<br class=""><br class=""> bool PerformTrivialForwardCoalescing(MachineInstr *MI,<br class=""> MachineBasicBlock *MBB);<br class="">+<br class="">+ SmallVector<MachineBasicBlock *, 4> &<br class="">+ GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock<br class="">+ *MBB);<br class=""> };<br class="">} // end anonymous namespace<br class=""><br class="">@@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio<br class=""> // Process all basic blocks.<br class=""> CEBCandidates.clear();<br class=""> ToSplit.clear();<br class="">- for (MachineFunction::iterator I = MF.begin(), E = MF.end();<br class="">- I != E; ++I)<br class="">- MadeChange |= ProcessBlock(*I);<br class="">+ for (auto &MBB: MF)<br class="">+ MadeChange |= ProcessBlock(MBB);<br class=""><br class=""> // If we have anything we marked as toSplit, split it now.<br class=""> for (auto &Pair : ToSplit) {<br class="">@@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin<br class=""><br class=""> bool MadeChange = false;<br class=""><br class="">+ // MBB changed, reset all cached information.<br class="">+ AllSuccessors.clear();<br class="">+<br class=""> // Walk the basic block bottom-up. Remember if we saw a store.<br class=""> MachineBasicBlock::iterator I = MBB.end();<br class=""> --I;<br class="">@@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT<br class=""> return false;<br class="">}<br class=""><br class="">+/// Get the sorted sequence of successors for this<br class="">+MachineBasicBlock, possibly /// computing it if it was not already<br class=""></blockquote></blockquote>cached.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">+SmallVector<MachineBasicBlock *, 4> &<br class="">+MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,<br class="">+MachineBasicBlock *MBB) {<br class="">+<br class="">+ // Do we have the sorted successors in cache ?<br class="">+ auto Succs = AllSuccessors.find(MBB); if (Succs !=<br class="">+ AllSuccessors.end())<br class="">+ return Succs->second;<br class="">+<br class="">+ SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),<br class="">+ MBB->succ_end());<br class="">+<br class="">+ // Handle cases where sinking can happen but where the sink point<br class="">+ isn't a // successor. For example:<br class="">+ //<br class="">+ // x = computation<br class="">+ // if () {} else {}<br class="">+ // use x<br class="">+ //<br class="">+ const std::vector<MachineDomTreeNode *> &Children =<br class="">+ DT->getNode(MBB)->getChildren(); for (const auto &DTChild :<br class="">+ Children)<br class="">+ // DomTree children of MBB that have MBB as immediate dominator<br class=""></blockquote>are added.<br class=""><blockquote type="cite" class="">+ if (DTChild->getIDom()->getBlock() == MI->getParent() &&<br class="">+ // Skip MBBs already added to the AllSuccs vector above.<br class="">+ !MBB->isSuccessor(DTChild->getBlock()))<br class="">+ AllSuccs.push_back(DTChild->getBlock());<br class="">+<br class="">+ // Sort Successors according to their loop depth or block frequency<br class=""></blockquote></blockquote>info.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">+ std::stable_sort(<br class="">+ AllSuccs.begin(), AllSuccs.end(),<br class="">+ [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {<br class="">+ uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :<br class=""></blockquote></blockquote>0;<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">+ uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :<br class=""></blockquote>0;<br class=""><blockquote type="cite" class="">+ bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;<br class="">+ return HasBlockFreq ? LHSFreq < RHSFreq<br class="">+ : LI->getLoopDepth(L) < LI->getLoopDepth(R);<br class="">+ });<br class="">+<br class="">+ auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));<br class="">+<br class="">+ return it.first->second;<br class="">+}<br class="">+<br class="">/// FindSuccToSinkTo - Find a successor to sink this instruction to.<br class="">MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr<br class=""></blockquote></blockquote>*MI,<br class=""><blockquote type="cite" class=""><blockquote type="cite" class=""> MachineBasicBlock *MBB, @@<br class="">-579,38<br class="">+634,7 @@ MachineBasicBlock *MachineSinking::FindS<br class=""> // we should sink to. If we have reliable block frequency information<br class=""> // (frequency != 0) available, give successors with smaller frequencies<br class=""> // higher priority, otherwise prioritize smaller loop depths.<br class="">- SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),<br class="">- MBB->succ_end());<br class="">-<br class="">- // Handle cases where sinking can happen but where the sink point<br class=""></blockquote>isn't a<br class=""><blockquote type="cite" class="">- // successor. For example:<br class="">- //<br class="">- // x = computation<br class="">- // if () {} else {}<br class="">- // use x<br class="">- //<br class="">- const std::vector<MachineDomTreeNode *> &Children =<br class="">- DT->getNode(MBB)->getChildren();<br class="">- for (const auto &DTChild : Children)<br class="">- // DomTree children of MBB that have MBB as immediate dominator<br class=""></blockquote>are added.<br class=""><blockquote type="cite" class="">- if (DTChild->getIDom()->getBlock() == MI->getParent() &&<br class="">- // Skip MBBs already added to the Succs vector above.<br class="">- !MBB->isSuccessor(DTChild->getBlock()))<br class="">- Succs.push_back(DTChild->getBlock());<br class="">-<br class="">- // Sort Successors according to their loop depth or block frequency<br class=""></blockquote></blockquote>info.<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">- std::stable_sort(<br class="">- Succs.begin(), Succs.end(),<br class="">- [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {<br class="">- uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()<br class=""></blockquote></blockquote>:<br class=""><blockquote type="cite" class="">0;<br class=""><blockquote type="cite" class="">- uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()<br class=""></blockquote></blockquote>:<br class=""><blockquote type="cite" class="">0;<br class=""><blockquote type="cite" class="">- bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;<br class="">- return HasBlockFreq ? LHSFreq < RHSFreq<br class="">- : LI->getLoopDepth(L) < LI->getLoopDepth(R);<br class="">- });<br class="">- for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =<br class=""></blockquote></blockquote>Succs.begin(),<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">- E = Succs.end(); SI != E; ++SI) {<br class="">- MachineBasicBlock *SuccBlock = *SI;<br class="">+ for (MachineBasicBlock *SuccBlock :<br class="">+ GetAllSortedSuccessors(MI,<br class="">+ MBB)) {<br class=""> bool LocalUse = false;<br class=""> if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,<br class=""> BreakPHIEdge, LocalUse)) {<br class=""><br class=""><br class="">_______________________________________________<br class="">llvm-commits mailing list<br class=""><a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits<br class=""></blockquote></blockquote><br class=""><br class=""><br class=""><br class="">_______________________________________________<br class="">llvm-commits mailing list<br class=""><a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits<br class=""></blockquote><span id="cid:091EEED9-DFC8-40C3-AEC0-54D4C3D0FDA3"><0001-MachineSink-Address-post-commit-review-comments.patch></span></div></blockquote></div><br class=""></div></body></html>