[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

Mon Jun 15 14:25:45 PDT 2015

Hi Quentin,

Here is a patch that I think will address your comments. Let me know if you are ok with it.

By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?

Cheers,
Arnaud

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Arnaud A. de Grandmaison
> Sent: 15 June 2015 22:35
> To: 'Quentin Colombet'
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
> 
> > -----Original Message-----
> > From: Quentin Colombet [mailto:qcolombet at apple.com]
> > Sent: 15 June 2015 19:45
> > To: Arnaud De Grandmaison
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
> > NFC.
> >
> > Hi Arnaud,
> 
> Hi Quentin,
> 
> >
> > The current implementation of the caching mechanism seems confusing to
> > me.
> > Looking at the comment on the related field seems to imply that the
> > cache is global to the pass.
> > However, when we look at the implementation, the cache is reset each
> > time we process a new block.
> >
> > I found that confusing for two reasons:
> > 1. We invalid the cache for apparently no good reason, i.e., why do we
> > assume the list of successors changed when processing a new block?
> 
> The reason is this is in fact a double-entry cache, but we do not need to have
> this double entry cache for the whole pass duration (I guess I'm answering to
> your second question here as well).
> Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
> blocks immediately dominated by MI. MI->getParent() will stay the same for
> each instruction processed by ProcessBlock, thus the clearing when another
> basic block is processed.
> 
> > 2. If this cache is only valid for the current process block (and the
> > related traversed blocks), why is this a field global to the structure?
> >
> > I see two ways of fixing that:
> > 1. Have the map being local to process block function and pass it to
> > the functions that use it.
> 
> I considered having the structure declared locally in ProcessBlock, and
> passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
> but it seemed a bit clumsy to me. I however agree it would have at least the
> merit of making the cache lifetime explicit.
> 
> > 2. Have the cache being really global and invalid it only when it makes
> sense.
> >
> > Also, side remark, for caching/lazy structures, I expect those to be
> > mutable and have the related function being const… My 2c.
> 
> Given my above comments, I'll modify the patch to make the cache local, as I
> think it makes things clearer.
> 
> >
> > Cheers,
> > -Quentin
> >
> > > On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
> > <arnaud.degrandmaison at arm.com> wrote:
> > >
> > > Author: aadg
> > > Date: Mon Jun 15 04:09:06 2015
> > > New Revision: 239720
> > >
> > > URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev
> > > Log:
> > > [MachineSink] Improve runtime performance. NFC.
> > >
> > > This patch fixes a compilation time issue, when MachineSink faces
> > > PHIs with a huge number of operands. This can happen for example in
> > > goto table based interpreters, where some basic blocks can have
> > > several of those PHIs, each one with several hundreds operands.
> > > MachineSink was spending a significant time re-building and
> > > re-sorting the list of successors of the current MachineBasicBlock.
> > > The computing and sorting of the current MachineBasicBlock successors is
> now cached.
> > >
> > > Modified:
> > >    llvm/trunk/lib/CodeGen/MachineSink.cpp
> > >
> > > Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
> > > URL:
> > > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi
> > > nk .cpp?rev=239720&r1=239719&r2=239720&view=diff
> > >
> >
> ==========================================================
> > ============
> > > ========
> > > --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
> > > +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
> > > @@ -73,6 +73,11 @@ namespace {
> > >
> > >     SparseBitVector<> RegsToClearKillFlags;
> > >
> > > +    // Cache all successors to a MachineBasicBlock, sorted by
> > > + frequency info
> > and
> > > +    // loop depth. AllSuccessors is lazily populated.
> > > +    std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,
> 4>>
> > > +        AllSuccessors;
> > > +
> > >   public:
> > >     static char ID; // Pass identification
> > >     MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
> > > namespace {
> > >
> > >     bool PerformTrivialForwardCoalescing(MachineInstr *MI,
> > >                                          MachineBasicBlock *MBB);
> > > +
> > > +    SmallVector<MachineBasicBlock *, 4> &
> > > +    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
> > > + *MBB);
> > >   };
> > > } // end anonymous namespace
> > >
> > > @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
> > >     // Process all basic blocks.
> > >     CEBCandidates.clear();
> > >     ToSplit.clear();
> > > -    for (MachineFunction::iterator I = MF.begin(), E = MF.end();
> > > -         I != E; ++I)
> > > -      MadeChange |= ProcessBlock(*I);
> > > +    for (auto &MBB: MF)
> > > +      MadeChange |= ProcessBlock(MBB);
> > >
> > >     // If we have anything we marked as toSplit, split it now.
> > >     for (auto &Pair : ToSplit) {
> > > @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
> > >
> > >   bool MadeChange = false;
> > >
> > > +  // MBB changed, reset all cached information.
> > > +  AllSuccessors.clear();
> > > +
> > >   // Walk the basic block bottom-up.  Remember if we saw a store.
> > >   MachineBasicBlock::iterator I = MBB.end();
> > >   --I;
> > > @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
> > >   return false;
> > > }
> > >
> > > +/// Get the sorted sequence of successors for this
> > > +MachineBasicBlock, possibly /// computing it if it was not already
> cached.
> > > +SmallVector<MachineBasicBlock *, 4> &
> > > +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
> > > +MachineBasicBlock *MBB) {
> > > +
> > > +  // Do we have the sorted successors in cache ?
> > > +  auto Succs = AllSuccessors.find(MBB);  if (Succs !=
> > > + AllSuccessors.end())
> > > +    return Succs->second;
> > > +
> > > +  SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
> > > +                                               MBB->succ_end());
> > > +
> > > +  // Handle cases where sinking can happen but where the sink point
> > > + isn't a  // successor. For example:
> > > +  //
> > > +  //   x = computation
> > > +  //   if () {} else {}
> > > +  //   use x
> > > +  //
> > > +  const std::vector<MachineDomTreeNode *> &Children =
> > > +    DT->getNode(MBB)->getChildren();  for (const auto &DTChild :
> > > + Children)
> > > +    // DomTree children of MBB that have MBB as immediate dominator
> > are added.
> > > +    if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> > > +        // Skip MBBs already added to the AllSuccs vector above.
> > > +        !MBB->isSuccessor(DTChild->getBlock()))
> > > +      AllSuccs.push_back(DTChild->getBlock());
> > > +
> > > +  // Sort Successors according to their loop depth or block frequency
> info.
> > > +  std::stable_sort(
> > > +      AllSuccs.begin(), AllSuccs.end(),
> > > +      [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
> > > +        uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
> 0;
> > > +        uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
> > 0;
> > > +        bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> > > +        return HasBlockFreq ? LHSFreq < RHSFreq
> > > +                            : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> > > +      });
> > > +
> > > +  auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
> > > +
> > > +  return it.first->second;
> > > +}
> > > +
> > > /// FindSuccToSinkTo - Find a successor to sink this instruction to.
> > > MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr
> *MI,
> > >                                    MachineBasicBlock *MBB, @@
> > > -579,38
> > > +634,7 @@ MachineBasicBlock *MachineSinking::FindS
> > >       // we should sink to. If we have reliable block frequency information
> > >       // (frequency != 0) available, give successors with smaller frequencies
> > >       // higher priority, otherwise prioritize smaller loop depths.
> > > -      SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
> > > -                                               MBB->succ_end());
> > > -
> > > -      // Handle cases where sinking can happen but where the sink point
> > isn't a
> > > -      // successor. For example:
> > > -      //
> > > -      //   x = computation
> > > -      //   if () {} else {}
> > > -      //   use x
> > > -      //
> > > -      const std::vector<MachineDomTreeNode *> &Children =
> > > -        DT->getNode(MBB)->getChildren();
> > > -      for (const auto &DTChild : Children)
> > > -        // DomTree children of MBB that have MBB as immediate dominator
> > are added.
> > > -        if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> > > -            // Skip MBBs already added to the Succs vector above.
> > > -            !MBB->isSuccessor(DTChild->getBlock()))
> > > -          Succs.push_back(DTChild->getBlock());
> > > -
> > > -      // Sort Successors according to their loop depth or block frequency
> info.
> > > -      std::stable_sort(
> > > -          Succs.begin(), Succs.end(),
> > > -          [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
> > > -            uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()
> :
> > 0;
> > > -            uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()
> :
> > 0;
> > > -            bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> > > -            return HasBlockFreq ? LHSFreq < RHSFreq
> > > -                                : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> > > -          });
> > > -      for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =
> Succs.begin(),
> > > -             E = Succs.end(); SI != E; ++SI) {
> > > -        MachineBasicBlock *SuccBlock = *SI;
> > > +      for (MachineBasicBlock *SuccBlock :
> > > + GetAllSortedSuccessors(MI,
> > > + MBB)) {
> > >         bool LocalUse = false;
> > >         if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
> > >                                     BreakPHIEdge, LocalUse)) {
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-MachineSink-Address-post-commit-review-comments.patch
Type: application/octet-stream
Size: 8113 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150615/cb176b47/attachment.obj>