[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

Quentin Colombet qcolombet at apple.com
Tue Jun 16 09:38:45 PDT 2015


Hi Arnaud,

> On Jun 16, 2015, at 2:07 AM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:
> 
> Hi Quentin,
>  
> GetAllSortedSuccessors does use some members of the MachineSink pass (MBFI, LI, DT), so I think it makes sense to leave it as a private method of MachineSink.

Thanks for checking!

Cheers,
-Quentin

>  
> I committed the fixes @ r239807.
>  
> Regarding the Spiller runtime, PR17409 is indeed the problem we are facing.
>  
> Cheers,
> Arnaud
>  
>  
> From: Quentin Colombet [mailto:qcolombet at apple.com] 
> Sent: 16 June 2015 00:26
> To: Arnaud De Grandmaison
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance. NFC.
>  
> Hi Arnaud,
>  
> Thanks for making a new patch that fast!
>  
> LGTM with one question:
> -    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);
> +    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,
> +                           AllSuccsCache &AllSuccessors) const;
>  
> I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?
>  
>  
>  
>> On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> wrote:
>>  
>> Hi Quentin,
>> 
>> Here is a patch that I think will address your comments. Let me know if you are ok with it.
>> 
>> By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?
>  
> There are things that can be improved, for instance, for the spiller we have this: https://llvm.org/bugs/show_bug.cgi?id=17409 <https://llvm.org/bugs/show_bug.cgi?id=17409>.
> If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.
>  
> Thanks,
> Q. 
> 
> 
> 
> Cheers,
> Arnaud
> 
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu <mailto:llvm-commits-bounces at cs.uiuc.edu> [mailto:llvm-commits-
> bounces at cs.uiuc.edu <mailto:bounces at cs.uiuc.edu>] On Behalf Of Arnaud A. de Grandmaison
> Sent: 15 June 2015 22:35
> To: 'Quentin Colombet'
> Cc: llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
> 
> 
> -----Original Message-----
> From: Quentin Colombet [mailto:qcolombet at apple.com <mailto:qcolombet at apple.com>]
> Sent: 15 June 2015 19:45
> To: Arnaud De Grandmaison
> Cc: llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
> 
> Hi Arnaud,
> 
> Hi Quentin,
> 
> 
> 
> The current implementation of the caching mechanism seems confusing to
> me.
> Looking at the comment on the related field seems to imply that the
> cache is global to the pass.
> However, when we look at the implementation, the cache is reset each
> time we process a new block.
> 
> I found that confusing for two reasons:
> 1. We invalid the cache for apparently no good reason, i.e., why do we
> assume the list of successors changed when processing a new block?
> 
> The reason is this is in fact a double-entry cache, but we do not need to have
> this double entry cache for the whole pass duration (I guess I'm answering to
> your second question here as well).
> Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
> blocks immediately dominated by MI. MI->getParent() will stay the same for
> each instruction processed by ProcessBlock, thus the clearing when another
> basic block is processed.
> 
> 
> 2. If this cache is only valid for the current process block (and the
> related traversed blocks), why is this a field global to the structure?
> 
> I see two ways of fixing that:
> 1. Have the map being local to process block function and pass it to
> the functions that use it.
> 
> I considered having the structure declared locally in ProcessBlock, and
> passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
> but it seemed a bit clumsy to me. I however agree it would have at least the
> merit of making the cache lifetime explicit.
> 
> 
> 2. Have the cache being really global and invalid it only when it makes
> sense.
> 
> 
> Also, side remark, for caching/lazy structures, I expect those to be
> mutable and have the related function being const… My 2c.
> 
> Given my above comments, I'll modify the patch to make the cache local, as I
> think it makes things clearer.
> 
> 
> 
> Cheers,
> -Quentin
> 
> 
> On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
> <arnaud.degrandmaison at arm.com <mailto:arnaud.degrandmaison at arm.com>> wrote:
> 
> 
> Author: aadg
> Date: Mon Jun 15 04:09:06 2015
> New Revision: 239720
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev <http://llvm.org/viewvc/llvm-project?rev=239720&view=rev>
> Log:
> [MachineSink] Improve runtime performance. NFC.
> 
> This patch fixes a compilation time issue, when MachineSink faces
> PHIs with a huge number of operands. This can happen for example in
> goto table based interpreters, where some basic blocks can have
> several of those PHIs, each one with several hundreds operands.
> MachineSink was spending a significant time re-building and
> re-sorting the list of successors of the current MachineBasicBlock.
> The computing and sorting of the current MachineBasicBlock successors is
> now cached.
> 
>> 
>> Modified:
>>   llvm/trunk/lib/CodeGen/MachineSink.cpp
>> 
>> Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi>
>> nk .cpp?rev=239720&r1=239719&r2=239720&view=diff
>> 
>  
> ==========================================================
> 
> ============
> 
> ========
> --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
> @@ -73,6 +73,11 @@ namespace {
> 
>    SparseBitVector<> RegsToClearKillFlags;
> 
> +    // Cache all successors to a MachineBasicBlock, sorted by
> + frequency info
> and
> 
> +    // loop depth. AllSuccessors is lazily populated.
> +    std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,
> 4>>
> 
>> +        AllSuccessors;
>> +
>>  public:
>>    static char ID; // Pass identification
>>    MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
>> namespace {
>> 
>>    bool PerformTrivialForwardCoalescing(MachineInstr *MI,
>>                                         MachineBasicBlock *MBB);
>> +
>> +    SmallVector<MachineBasicBlock *, 4> &
>> +    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
>> + *MBB);
>>  };
>> } // end anonymous namespace
>> 
>> @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
>>    // Process all basic blocks.
>>    CEBCandidates.clear();
>>    ToSplit.clear();
>> -    for (MachineFunction::iterator I = MF.begin(), E = MF.end();
>> -         I != E; ++I)
>> -      MadeChange |= ProcessBlock(*I);
>> +    for (auto &MBB: MF)
>> +      MadeChange |= ProcessBlock(MBB);
>> 
>>    // If we have anything we marked as toSplit, split it now.
>>    for (auto &Pair : ToSplit) {
>> @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
>> 
>>  bool MadeChange = false;
>> 
>> +  // MBB changed, reset all cached information.
>> +  AllSuccessors.clear();
>> +
>>  // Walk the basic block bottom-up.  Remember if we saw a store.
>>  MachineBasicBlock::iterator I = MBB.end();
>>  --I;
>> @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
>>  return false;
>> }
>> 
>> +/// Get the sorted sequence of successors for this
>> +MachineBasicBlock, possibly /// computing it if it was not already
> cached.
> 
>> +SmallVector<MachineBasicBlock *, 4> &
>> +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
>> +MachineBasicBlock *MBB) {
>> +
>> +  // Do we have the sorted successors in cache ?
>> +  auto Succs = AllSuccessors.find(MBB);  if (Succs !=
>> + AllSuccessors.end())
>> +    return Succs->second;
>> +
>> +  SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
>> +                                               MBB->succ_end());
>> +
>> +  // Handle cases where sinking can happen but where the sink point
>> + isn't a  // successor. For example:
>> +  //
>> +  //   x = computation
>> +  //   if () {} else {}
>> +  //   use x
>> +  //
>> +  const std::vector<MachineDomTreeNode *> &Children =
>> +    DT->getNode(MBB)->getChildren();  for (const auto &DTChild :
>> + Children)
>> +    // DomTree children of MBB that have MBB as immediate dominator
> are added.
> 
> +    if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> +        // Skip MBBs already added to the AllSuccs vector above.
> +        !MBB->isSuccessor(DTChild->getBlock()))
> +      AllSuccs.push_back(DTChild->getBlock());
> +
> +  // Sort Successors according to their loop depth or block frequency
> info.
> 
>> +  std::stable_sort(
>> +      AllSuccs.begin(), AllSuccs.end(),
>> +      [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>> +        uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
> 0;
> 
>> +        uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
> 0;
> 
> +        bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> +        return HasBlockFreq ? LHSFreq < RHSFreq
> +                            : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> +      });
> +
> +  auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
> +
> +  return it.first->second;
> +}
> +
> /// FindSuccToSinkTo - Find a successor to sink this instruction to.
> MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr
> *MI,
> 
>>                                   MachineBasicBlock *MBB, @@
>> -579,38
>> +634,7 @@ MachineBasicBlock *MachineSinking::FindS
>>      // we should sink to. If we have reliable block frequency information
>>      // (frequency != 0) available, give successors with smaller frequencies
>>      // higher priority, otherwise prioritize smaller loop depths.
>> -      SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
>> -                                               MBB->succ_end());
>> -
>> -      // Handle cases where sinking can happen but where the sink point
> isn't a
> 
> -      // successor. For example:
> -      //
> -      //   x = computation
> -      //   if () {} else {}
> -      //   use x
> -      //
> -      const std::vector<MachineDomTreeNode *> &Children =
> -        DT->getNode(MBB)->getChildren();
> -      for (const auto &DTChild : Children)
> -        // DomTree children of MBB that have MBB as immediate dominator
> are added.
> 
> -        if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> -            // Skip MBBs already added to the Succs vector above.
> -            !MBB->isSuccessor(DTChild->getBlock()))
> -          Succs.push_back(DTChild->getBlock());
> -
> -      // Sort Successors according to their loop depth or block frequency
> info.
> 
>> -      std::stable_sort(
>> -          Succs.begin(), Succs.end(),
>> -          [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>> -            uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()
> :
> 
> 0;
> 
> -            uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()
> :
> 
> 0;
> 
> -            bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> -            return HasBlockFreq ? LHSFreq < RHSFreq
> -                                : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> -          });
> -      for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =
> Succs.begin(),
> 
>> -             E = Succs.end(); SI != E; ++SI) {
>> -        MachineBasicBlock *SuccBlock = *SI;
>> +      for (MachineBasicBlock *SuccBlock :
>> + GetAllSortedSuccessors(MI,
>> + MBB)) {
>>        bool LocalUse = false;
>>        if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
>>                                    BreakPHIEdge, LocalUse)) {
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> <0001-MachineSink-Address-post-commit-review-comments.patch>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150616/4edf7ced/attachment.html>


More information about the llvm-commits mailing list