[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.
Quentin Colombet
qcolombet at apple.com
Mon Jun 15 15:25:40 PDT 2015
Hi Arnaud,
Thanks for making a new patch that fast!
LGTM with one question:
- GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);
+ GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,
+ AllSuccsCache &AllSuccessors) const;
I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?
> On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:
>
> Hi Quentin,
>
> Here is a patch that I think will address your comments. Let me know if you are ok with it.
>
> By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?
There are things that can be improved, for instance, for the spiller we have this: https://llvm.org/bugs/show_bug.cgi?id=17409 <https://llvm.org/bugs/show_bug.cgi?id=17409>.
If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.
Thanks,
Q.
>
> Cheers,
> Arnaud
>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of Arnaud A. de Grandmaison
>> Sent: 15 June 2015 22:35
>> To: 'Quentin Colombet'
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
>> NFC.
>>
>>> -----Original Message-----
>>> From: Quentin Colombet [mailto:qcolombet at apple.com]
>>> Sent: 15 June 2015 19:45
>>> To: Arnaud De Grandmaison
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
>>> NFC.
>>>
>>> Hi Arnaud,
>>
>> Hi Quentin,
>>
>>>
>>> The current implementation of the caching mechanism seems confusing to
>>> me.
>>> Looking at the comment on the related field seems to imply that the
>>> cache is global to the pass.
>>> However, when we look at the implementation, the cache is reset each
>>> time we process a new block.
>>>
>>> I found that confusing for two reasons:
>>> 1. We invalid the cache for apparently no good reason, i.e., why do we
>>> assume the list of successors changed when processing a new block?
>>
>> The reason is this is in fact a double-entry cache, but we do not need to have
>> this double entry cache for the whole pass duration (I guess I'm answering to
>> your second question here as well).
>> Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
>> blocks immediately dominated by MI. MI->getParent() will stay the same for
>> each instruction processed by ProcessBlock, thus the clearing when another
>> basic block is processed.
>>
>>> 2. If this cache is only valid for the current process block (and the
>>> related traversed blocks), why is this a field global to the structure?
>>>
>>> I see two ways of fixing that:
>>> 1. Have the map being local to process block function and pass it to
>>> the functions that use it.
>>
>> I considered having the structure declared locally in ProcessBlock, and
>> passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
>> but it seemed a bit clumsy to me. I however agree it would have at least the
>> merit of making the cache lifetime explicit.
>>
>>> 2. Have the cache being really global and invalid it only when it makes
>> sense.
>>>
>>> Also, side remark, for caching/lazy structures, I expect those to be
>>> mutable and have the related function being const… My 2c.
>>
>> Given my above comments, I'll modify the patch to make the cache local, as I
>> think it makes things clearer.
>>
>>>
>>> Cheers,
>>> -Quentin
>>>
>>>> On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
>>> <arnaud.degrandmaison at arm.com> wrote:
>>>>
>>>> Author: aadg
>>>> Date: Mon Jun 15 04:09:06 2015
>>>> New Revision: 239720
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev
>>>> Log:
>>>> [MachineSink] Improve runtime performance. NFC.
>>>>
>>>> This patch fixes a compilation time issue, when MachineSink faces
>>>> PHIs with a huge number of operands. This can happen for example in
>>>> goto table based interpreters, where some basic blocks can have
>>>> several of those PHIs, each one with several hundreds operands.
>>>> MachineSink was spending a significant time re-building and
>>>> re-sorting the list of successors of the current MachineBasicBlock.
>>>> The computing and sorting of the current MachineBasicBlock successors is
>> now cached.
>>>>
>>>> Modified:
>>>> llvm/trunk/lib/CodeGen/MachineSink.cpp
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi
>>>> nk .cpp?rev=239720&r1=239719&r2=239720&view=diff
>>>>
>>>
>> ==========================================================
>>> ============
>>>> ========
>>>> --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
>>>> @@ -73,6 +73,11 @@ namespace {
>>>>
>>>> SparseBitVector<> RegsToClearKillFlags;
>>>>
>>>> + // Cache all successors to a MachineBasicBlock, sorted by
>>>> + frequency info
>>> and
>>>> + // loop depth. AllSuccessors is lazily populated.
>>>> + std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,
>> 4>>
>>>> + AllSuccessors;
>>>> +
>>>> public:
>>>> static char ID; // Pass identification
>>>> MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
>>>> namespace {
>>>>
>>>> bool PerformTrivialForwardCoalescing(MachineInstr *MI,
>>>> MachineBasicBlock *MBB);
>>>> +
>>>> + SmallVector<MachineBasicBlock *, 4> &
>>>> + GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
>>>> + *MBB);
>>>> };
>>>> } // end anonymous namespace
>>>>
>>>> @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
>>>> // Process all basic blocks.
>>>> CEBCandidates.clear();
>>>> ToSplit.clear();
>>>> - for (MachineFunction::iterator I = MF.begin(), E = MF.end();
>>>> - I != E; ++I)
>>>> - MadeChange |= ProcessBlock(*I);
>>>> + for (auto &MBB: MF)
>>>> + MadeChange |= ProcessBlock(MBB);
>>>>
>>>> // If we have anything we marked as toSplit, split it now.
>>>> for (auto &Pair : ToSplit) {
>>>> @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
>>>>
>>>> bool MadeChange = false;
>>>>
>>>> + // MBB changed, reset all cached information.
>>>> + AllSuccessors.clear();
>>>> +
>>>> // Walk the basic block bottom-up. Remember if we saw a store.
>>>> MachineBasicBlock::iterator I = MBB.end();
>>>> --I;
>>>> @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
>>>> return false;
>>>> }
>>>>
>>>> +/// Get the sorted sequence of successors for this
>>>> +MachineBasicBlock, possibly /// computing it if it was not already
>> cached.
>>>> +SmallVector<MachineBasicBlock *, 4> &
>>>> +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
>>>> +MachineBasicBlock *MBB) {
>>>> +
>>>> + // Do we have the sorted successors in cache ?
>>>> + auto Succs = AllSuccessors.find(MBB); if (Succs !=
>>>> + AllSuccessors.end())
>>>> + return Succs->second;
>>>> +
>>>> + SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
>>>> + MBB->succ_end());
>>>> +
>>>> + // Handle cases where sinking can happen but where the sink point
>>>> + isn't a // successor. For example:
>>>> + //
>>>> + // x = computation
>>>> + // if () {} else {}
>>>> + // use x
>>>> + //
>>>> + const std::vector<MachineDomTreeNode *> &Children =
>>>> + DT->getNode(MBB)->getChildren(); for (const auto &DTChild :
>>>> + Children)
>>>> + // DomTree children of MBB that have MBB as immediate dominator
>>> are added.
>>>> + if (DTChild->getIDom()->getBlock() == MI->getParent() &&
>>>> + // Skip MBBs already added to the AllSuccs vector above.
>>>> + !MBB->isSuccessor(DTChild->getBlock()))
>>>> + AllSuccs.push_back(DTChild->getBlock());
>>>> +
>>>> + // Sort Successors according to their loop depth or block frequency
>> info.
>>>> + std::stable_sort(
>>>> + AllSuccs.begin(), AllSuccs.end(),
>>>> + [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>>>> + uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
>> 0;
>>>> + uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
>>> 0;
>>>> + bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
>>>> + return HasBlockFreq ? LHSFreq < RHSFreq
>>>> + : LI->getLoopDepth(L) < LI->getLoopDepth(R);
>>>> + });
>>>> +
>>>> + auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
>>>> +
>>>> + return it.first->second;
>>>> +}
>>>> +
>>>> /// FindSuccToSinkTo - Find a successor to sink this instruction to.
>>>> MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr
>> *MI,
>>>> MachineBasicBlock *MBB, @@
>>>> -579,38
>>>> +634,7 @@ MachineBasicBlock *MachineSinking::FindS
>>>> // we should sink to. If we have reliable block frequency information
>>>> // (frequency != 0) available, give successors with smaller frequencies
>>>> // higher priority, otherwise prioritize smaller loop depths.
>>>> - SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
>>>> - MBB->succ_end());
>>>> -
>>>> - // Handle cases where sinking can happen but where the sink point
>>> isn't a
>>>> - // successor. For example:
>>>> - //
>>>> - // x = computation
>>>> - // if () {} else {}
>>>> - // use x
>>>> - //
>>>> - const std::vector<MachineDomTreeNode *> &Children =
>>>> - DT->getNode(MBB)->getChildren();
>>>> - for (const auto &DTChild : Children)
>>>> - // DomTree children of MBB that have MBB as immediate dominator
>>> are added.
>>>> - if (DTChild->getIDom()->getBlock() == MI->getParent() &&
>>>> - // Skip MBBs already added to the Succs vector above.
>>>> - !MBB->isSuccessor(DTChild->getBlock()))
>>>> - Succs.push_back(DTChild->getBlock());
>>>> -
>>>> - // Sort Successors according to their loop depth or block frequency
>> info.
>>>> - std::stable_sort(
>>>> - Succs.begin(), Succs.end(),
>>>> - [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>>>> - uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()
>> :
>>> 0;
>>>> - uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()
>> :
>>> 0;
>>>> - bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
>>>> - return HasBlockFreq ? LHSFreq < RHSFreq
>>>> - : LI->getLoopDepth(L) < LI->getLoopDepth(R);
>>>> - });
>>>> - for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =
>> Succs.begin(),
>>>> - E = Succs.end(); SI != E; ++SI) {
>>>> - MachineBasicBlock *SuccBlock = *SI;
>>>> + for (MachineBasicBlock *SuccBlock :
>>>> + GetAllSortedSuccessors(MI,
>>>> + MBB)) {
>>>> bool LocalUse = false;
>>>> if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
>>>> BreakPHIEdge, LocalUse)) {
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> <0001-MachineSink-Address-post-commit-review-comments.patch>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150615/16b05ec9/attachment.html>
More information about the llvm-commits
mailing list