[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.
Quentin Colombet
qcolombet at apple.com
Tue Jun 16 09:38:45 PDT 2015
Hi Arnaud,
> On Jun 16, 2015, at 2:07 AM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:
>
> Hi Quentin,
>
> GetAllSortedSuccessors does use some members of the MachineSink pass (MBFI, LI, DT), so I think it makes sense to leave it as a private method of MachineSink.
Thanks for checking!
Cheers,
-Quentin
>
> I committed the fixes @ r239807.
>
> Regarding the Spiller runtime, PR17409 is indeed the problem we are facing.
>
> Cheers,
> Arnaud
>
>
> From: Quentin Colombet [mailto:qcolombet at apple.com]
> Sent: 16 June 2015 00:26
> To: Arnaud De Grandmaison
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance. NFC.
>
> Hi Arnaud,
>
> Thanks for making a new patch that fast!
>
> LGTM with one question:
> - GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);
> + GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,
> + AllSuccsCache &AllSuccessors) const;
>
> I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?
>
>
>
>> On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> wrote:
>>
>> Hi Quentin,
>>
>> Here is a patch that I think will address your comments. Let me know if you are ok with it.
>>
>> By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?
>
> There are things that can be improved, for instance, for the spiller we have this: https://llvm.org/bugs/show_bug.cgi?id=17409 <https://llvm.org/bugs/show_bug.cgi?id=17409>.
> If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.
>
> Thanks,
> Q.
>
>
>
> Cheers,
> Arnaud
>
>
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu <mailto:llvm-commits-bounces at cs.uiuc.edu> [mailto:llvm-commits-
> bounces at cs.uiuc.edu <mailto:bounces at cs.uiuc.edu>] On Behalf Of Arnaud A. de Grandmaison
> Sent: 15 June 2015 22:35
> To: 'Quentin Colombet'
> Cc: llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
>
>
> -----Original Message-----
> From: Quentin Colombet [mailto:qcolombet at apple.com <mailto:qcolombet at apple.com>]
> Sent: 15 June 2015 19:45
> To: Arnaud De Grandmaison
> Cc: llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
>
> Hi Arnaud,
>
> Hi Quentin,
>
>
>
> The current implementation of the caching mechanism seems confusing to
> me.
> Looking at the comment on the related field seems to imply that the
> cache is global to the pass.
> However, when we look at the implementation, the cache is reset each
> time we process a new block.
>
> I found that confusing for two reasons:
> 1. We invalid the cache for apparently no good reason, i.e., why do we
> assume the list of successors changed when processing a new block?
>
> The reason is this is in fact a double-entry cache, but we do not need to have
> this double entry cache for the whole pass duration (I guess I'm answering to
> your second question here as well).
> Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
> blocks immediately dominated by MI. MI->getParent() will stay the same for
> each instruction processed by ProcessBlock, thus the clearing when another
> basic block is processed.
>
>
> 2. If this cache is only valid for the current process block (and the
> related traversed blocks), why is this a field global to the structure?
>
> I see two ways of fixing that:
> 1. Have the map being local to process block function and pass it to
> the functions that use it.
>
> I considered having the structure declared locally in ProcessBlock, and
> passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
> but it seemed a bit clumsy to me. I however agree it would have at least the
> merit of making the cache lifetime explicit.
>
>
> 2. Have the cache being really global and invalid it only when it makes
> sense.
>
>
> Also, side remark, for caching/lazy structures, I expect those to be
> mutable and have the related function being const… My 2c.
>
> Given my above comments, I'll modify the patch to make the cache local, as I
> think it makes things clearer.
>
>
>
> Cheers,
> -Quentin
>
>
> On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
> <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> wrote:
>
>
> Author: aadg
> Date: Mon Jun 15 04:09:06 2015
> New Revision: 239720
>
> URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev <http://llvm.org/viewvc/llvm-project?rev=239720&view=rev>
> Log:
> [MachineSink] Improve runtime performance. NFC.
>
> This patch fixes a compilation time issue, when MachineSink faces
> PHIs with a huge number of operands. This can happen for example in
> goto table based interpreters, where some basic blocks can have
> several of those PHIs, each one with several hundreds operands.
> MachineSink was spending a significant time re-building and
> re-sorting the list of successors of the current MachineBasicBlock.
> The computing and sorting of the current MachineBasicBlock successors is
> now cached.
>
>>
>> Modified:
>> llvm/trunk/lib/CodeGen/MachineSink.cpp
>>
>> Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi>
>> nk .cpp?rev=239720&r1=239719&r2=239720&view=diff
>>
>
> ==========================================================
>
> ============
>
> ========
> --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
> @@ -73,6 +73,11 @@ namespace {
>
> SparseBitVector<> RegsToClearKillFlags;
>
> + // Cache all successors to a MachineBasicBlock, sorted by
> + frequency info
> and
>
> + // loop depth. AllSuccessors is lazily populated.
> + std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,
> 4>>
>
>> + AllSuccessors;
>> +
>> public:
>> static char ID; // Pass identification
>> MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
>> namespace {
>>
>> bool PerformTrivialForwardCoalescing(MachineInstr *MI,
>> MachineBasicBlock *MBB);
>> +
>> + SmallVector<MachineBasicBlock *, 4> &
>> + GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
>> + *MBB);
>> };
>> } // end anonymous namespace
>>
>> @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
>> // Process all basic blocks.
>> CEBCandidates.clear();
>> ToSplit.clear();
>> - for (MachineFunction::iterator I = MF.begin(), E = MF.end();
>> - I != E; ++I)
>> - MadeChange |= ProcessBlock(*I);
>> + for (auto &MBB: MF)
>> + MadeChange |= ProcessBlock(MBB);
>>
>> // If we have anything we marked as toSplit, split it now.
>> for (auto &Pair : ToSplit) {
>> @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
>>
>> bool MadeChange = false;
>>
>> + // MBB changed, reset all cached information.
>> + AllSuccessors.clear();
>> +
>> // Walk the basic block bottom-up. Remember if we saw a store.
>> MachineBasicBlock::iterator I = MBB.end();
>> --I;
>> @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
>> return false;
>> }
>>
>> +/// Get the sorted sequence of successors for this
>> +MachineBasicBlock, possibly /// computing it if it was not already
> cached.
>
>> +SmallVector<MachineBasicBlock *, 4> &
>> +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
>> +MachineBasicBlock *MBB) {
>> +
>> + // Do we have the sorted successors in cache ?
>> + auto Succs = AllSuccessors.find(MBB); if (Succs !=
>> + AllSuccessors.end())
>> + return Succs->second;
>> +
>> + SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
>> + MBB->succ_end());
>> +
>> + // Handle cases where sinking can happen but where the sink point
>> + isn't a // successor. For example:
>> + //
>> + // x = computation
>> + // if () {} else {}
>> + // use x
>> + //
>> + const std::vector<MachineDomTreeNode *> &Children =
>> + DT->getNode(MBB)->getChildren(); for (const auto &DTChild :
>> + Children)
>> + // DomTree children of MBB that have MBB as immediate dominator
> are added.
>
> + if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> + // Skip MBBs already added to the AllSuccs vector above.
> + !MBB->isSuccessor(DTChild->getBlock()))
> + AllSuccs.push_back(DTChild->getBlock());
> +
> + // Sort Successors according to their loop depth or block frequency
> info.
>
>> + std::stable_sort(
>> + AllSuccs.begin(), AllSuccs.end(),
>> + [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>> + uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
> 0;
>
>> + uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
> 0;
>
> + bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> + return HasBlockFreq ? LHSFreq < RHSFreq
> + : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> + });
> +
> + auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
> +
> + return it.first->second;
> +}
> +
> /// FindSuccToSinkTo - Find a successor to sink this instruction to.
> MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr
> *MI,
>
>> MachineBasicBlock *MBB, @@
>> -579,38
>> +634,7 @@ MachineBasicBlock *MachineSinking::FindS
>> // we should sink to. If we have reliable block frequency information
>> // (frequency != 0) available, give successors with smaller frequencies
>> // higher priority, otherwise prioritize smaller loop depths.
>> - SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
>> - MBB->succ_end());
>> -
>> - // Handle cases where sinking can happen but where the sink point
> isn't a
>
> - // successor. For example:
> - //
> - // x = computation
> - // if () {} else {}
> - // use x
> - //
> - const std::vector<MachineDomTreeNode *> &Children =
> - DT->getNode(MBB)->getChildren();
> - for (const auto &DTChild : Children)
> - // DomTree children of MBB that have MBB as immediate dominator
> are added.
>
> - if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> - // Skip MBBs already added to the Succs vector above.
> - !MBB->isSuccessor(DTChild->getBlock()))
> - Succs.push_back(DTChild->getBlock());
> -
> - // Sort Successors according to their loop depth or block frequency
> info.
>
>> - std::stable_sort(
>> - Succs.begin(), Succs.end(),
>> - [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>> - uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()
> :
>
> 0;
>
> - uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()
> :
>
> 0;
>
> - bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> - return HasBlockFreq ? LHSFreq < RHSFreq
> - : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> - });
> - for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =
> Succs.begin(),
>
>> - E = Succs.end(); SI != E; ++SI) {
>> - MachineBasicBlock *SuccBlock = *SI;
>> + for (MachineBasicBlock *SuccBlock :
>> + GetAllSortedSuccessors(MI,
>> + MBB)) {
>> bool LocalUse = false;
>> if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
>> BreakPHIEdge, LocalUse)) {
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> <0001-MachineSink-Address-post-commit-review-comments.patch>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150616/4edf7ced/attachment.html>
More information about the llvm-commits
mailing list