[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

Mon Jun 15 15:25:40 PDT 2015

Hi Arnaud,

Thanks for making a new patch that fast!

LGTM with one question:
-    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);
+    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,
+                           AllSuccsCache &AllSuccessors) const;

I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?

> On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:
> 
> Hi Quentin,
> 
> Here is a patch that I think will address your comments. Let me know if you are ok with it.
> 
> By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?

There are things that can be improved, for instance, for the spiller we have this: https://llvm.org/bugs/show_bug.cgi?id=17409 <https://llvm.org/bugs/show_bug.cgi?id=17409>.
If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.

Thanks,
Q. 

> 
> Cheers,
> Arnaud
> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of Arnaud A. de Grandmaison
>> Sent: 15 June 2015 22:35
>> To: 'Quentin Colombet'
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
>> NFC.
>> 
>>> -----Original Message-----
>>> From: Quentin Colombet [mailto:qcolombet at apple.com]
>>> Sent: 15 June 2015 19:45
>>> To: Arnaud De Grandmaison
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
>>> NFC.
>>> 
>>> Hi Arnaud,
>> 
>> Hi Quentin,
>> 
>>> 
>>> The current implementation of the caching mechanism seems confusing to
>>> me.
>>> Looking at the comment on the related field seems to imply that the
>>> cache is global to the pass.
>>> However, when we look at the implementation, the cache is reset each
>>> time we process a new block.
>>> 
>>> I found that confusing for two reasons:
>>> 1. We invalid the cache for apparently no good reason, i.e., why do we
>>> assume the list of successors changed when processing a new block?
>> 
>> The reason is this is in fact a double-entry cache, but we do not need to have
>> this double entry cache for the whole pass duration (I guess I'm answering to
>> your second question here as well).
>> Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
>> blocks immediately dominated by MI. MI->getParent() will stay the same for
>> each instruction processed by ProcessBlock, thus the clearing when another
>> basic block is processed.
>> 
>>> 2. If this cache is only valid for the current process block (and the
>>> related traversed blocks), why is this a field global to the structure?
>>> 
>>> I see two ways of fixing that:
>>> 1. Have the map being local to process block function and pass it to
>>> the functions that use it.
>> 
>> I considered having the structure declared locally in ProcessBlock, and
>> passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
>> but it seemed a bit clumsy to me. I however agree it would have at least the
>> merit of making the cache lifetime explicit.
>> 
>>> 2. Have the cache being really global and invalid it only when it makes
>> sense.
>>> 
>>> Also, side remark, for caching/lazy structures, I expect those to be
>>> mutable and have the related function being const… My 2c.
>> 
>> Given my above comments, I'll modify the patch to make the cache local, as I
>> think it makes things clearer.
>> 
>>> 
>>> Cheers,
>>> -Quentin
>>> 
>>>> On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
>>> <arnaud.degrandmaison at arm.com> wrote:
>>>> 
>>>> Author: aadg
>>>> Date: Mon Jun 15 04:09:06 2015
>>>> New Revision: 239720
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev
>>>> Log:
>>>> [MachineSink] Improve runtime performance. NFC.
>>>> 
>>>> This patch fixes a compilation time issue, when MachineSink faces
>>>> PHIs with a huge number of operands. This can happen for example in
>>>> goto table based interpreters, where some basic blocks can have
>>>> several of those PHIs, each one with several hundreds operands.
>>>> MachineSink was spending a significant time re-building and
>>>> re-sorting the list of successors of the current MachineBasicBlock.
>>>> The computing and sorting of the current MachineBasicBlock successors is
>> now cached.
>>>> 
>>>> Modified:
>>>>   llvm/trunk/lib/CodeGen/MachineSink.cpp
>>>> 
>>>> Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi
>>>> nk .cpp?rev=239720&r1=239719&r2=239720&view=diff
>>>> 
>>> 
>> ==========================================================
>>> ============
>>>> ========
>>>> --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
>>>> @@ -73,6 +73,11 @@ namespace {
>>>> 
>>>>    SparseBitVector<> RegsToClearKillFlags;
>>>> 
>>>> +    // Cache all successors to a MachineBasicBlock, sorted by
>>>> + frequency info
>>> and
>>>> +    // loop depth. AllSuccessors is lazily populated.
>>>> +    std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,
>> 4>>
>>>> +        AllSuccessors;
>>>> +
>>>>  public:
>>>>    static char ID; // Pass identification
>>>>    MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
>>>> namespace {
>>>> 
>>>>    bool PerformTrivialForwardCoalescing(MachineInstr *MI,
>>>>                                         MachineBasicBlock *MBB);
>>>> +
>>>> +    SmallVector<MachineBasicBlock *, 4> &
>>>> +    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
>>>> + *MBB);
>>>>  };
>>>> } // end anonymous namespace
>>>> 
>>>> @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
>>>>    // Process all basic blocks.
>>>>    CEBCandidates.clear();
>>>>    ToSplit.clear();
>>>> -    for (MachineFunction::iterator I = MF.begin(), E = MF.end();
>>>> -         I != E; ++I)
>>>> -      MadeChange |= ProcessBlock(*I);
>>>> +    for (auto &MBB: MF)
>>>> +      MadeChange |= ProcessBlock(MBB);
>>>> 
>>>>    // If we have anything we marked as toSplit, split it now.
>>>>    for (auto &Pair : ToSplit) {
>>>> @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
>>>> 
>>>>  bool MadeChange = false;
>>>> 
>>>> +  // MBB changed, reset all cached information.
>>>> +  AllSuccessors.clear();
>>>> +
>>>>  // Walk the basic block bottom-up.  Remember if we saw a store.
>>>>  MachineBasicBlock::iterator I = MBB.end();
>>>>  --I;
>>>> @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
>>>>  return false;
>>>> }
>>>> 
>>>> +/// Get the sorted sequence of successors for this
>>>> +MachineBasicBlock, possibly /// computing it if it was not already
>> cached.
>>>> +SmallVector<MachineBasicBlock *, 4> &
>>>> +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
>>>> +MachineBasicBlock *MBB) {
>>>> +
>>>> +  // Do we have the sorted successors in cache ?
>>>> +  auto Succs = AllSuccessors.find(MBB);  if (Succs !=
>>>> + AllSuccessors.end())
>>>> +    return Succs->second;
>>>> +
>>>> +  SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
>>>> +                                               MBB->succ_end());
>>>> +
>>>> +  // Handle cases where sinking can happen but where the sink point
>>>> + isn't a  // successor. For example:
>>>> +  //
>>>> +  //   x = computation
>>>> +  //   if () {} else {}
>>>> +  //   use x
>>>> +  //
>>>> +  const std::vector<MachineDomTreeNode *> &Children =
>>>> +    DT->getNode(MBB)->getChildren();  for (const auto &DTChild :
>>>> + Children)
>>>> +    // DomTree children of MBB that have MBB as immediate dominator
>>> are added.
>>>> +    if (DTChild->getIDom()->getBlock() == MI->getParent() &&
>>>> +        // Skip MBBs already added to the AllSuccs vector above.
>>>> +        !MBB->isSuccessor(DTChild->getBlock()))
>>>> +      AllSuccs.push_back(DTChild->getBlock());
>>>> +
>>>> +  // Sort Successors according to their loop depth or block frequency
>> info.
>>>> +  std::stable_sort(
>>>> +      AllSuccs.begin(), AllSuccs.end(),
>>>> +      [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>>>> +        uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
>> 0;
>>>> +        uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
>>> 0;
>>>> +        bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
>>>> +        return HasBlockFreq ? LHSFreq < RHSFreq
>>>> +                            : LI->getLoopDepth(L) < LI->getLoopDepth(R);
>>>> +      });
>>>> +
>>>> +  auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
>>>> +
>>>> +  return it.first->second;
>>>> +}
>>>> +
>>>> /// FindSuccToSinkTo - Find a successor to sink this instruction to.
>>>> MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr
>> *MI,
>>>>                                   MachineBasicBlock *MBB, @@
>>>> -579,38
>>>> +634,7 @@ MachineBasicBlock *MachineSinking::FindS
>>>>      // we should sink to. If we have reliable block frequency information
>>>>      // (frequency != 0) available, give successors with smaller frequencies
>>>>      // higher priority, otherwise prioritize smaller loop depths.
>>>> -      SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
>>>> -                                               MBB->succ_end());
>>>> -
>>>> -      // Handle cases where sinking can happen but where the sink point
>>> isn't a
>>>> -      // successor. For example:
>>>> -      //
>>>> -      //   x = computation
>>>> -      //   if () {} else {}
>>>> -      //   use x
>>>> -      //
>>>> -      const std::vector<MachineDomTreeNode *> &Children =
>>>> -        DT->getNode(MBB)->getChildren();
>>>> -      for (const auto &DTChild : Children)
>>>> -        // DomTree children of MBB that have MBB as immediate dominator
>>> are added.
>>>> -        if (DTChild->getIDom()->getBlock() == MI->getParent() &&
>>>> -            // Skip MBBs already added to the Succs vector above.
>>>> -            !MBB->isSuccessor(DTChild->getBlock()))
>>>> -          Succs.push_back(DTChild->getBlock());
>>>> -
>>>> -      // Sort Successors according to their loop depth or block frequency
>> info.
>>>> -      std::stable_sort(
>>>> -          Succs.begin(), Succs.end(),
>>>> -          [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
>>>> -            uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()
>> :
>>> 0;
>>>> -            uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()
>> :
>>> 0;
>>>> -            bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
>>>> -            return HasBlockFreq ? LHSFreq < RHSFreq
>>>> -                                : LI->getLoopDepth(L) < LI->getLoopDepth(R);
>>>> -          });
>>>> -      for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =
>> Succs.begin(),
>>>> -             E = Succs.end(); SI != E; ++SI) {
>>>> -        MachineBasicBlock *SuccBlock = *SI;
>>>> +      for (MachineBasicBlock *SuccBlock :
>>>> + GetAllSortedSuccessors(MI,
>>>> + MBB)) {
>>>>        bool LocalUse = false;
>>>>        if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
>>>>                                    BreakPHIEdge, LocalUse)) {
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> <0001-MachineSink-Address-post-commit-review-comments.patch>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150615/16b05ec9/attachment.html>