[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

Arnaud A. de Grandmaison arnaud.degrandmaison at arm.com
Mon Jun 15 13:35:24 PDT 2015


> -----Original Message-----
> From: Quentin Colombet [mailto:qcolombet at apple.com]
> Sent: 15 June 2015 19:45
> To: Arnaud De Grandmaison
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
> NFC.
> 
> Hi Arnaud,

Hi Quentin,

> 
> The current implementation of the caching mechanism seems confusing to
> me.
> Looking at the comment on the related field seems to imply that the cache is
> global to the pass.
> However, when we look at the implementation, the cache is reset each time
> we process a new block.
> 
> I found that confusing for two reasons:
> 1. We invalid the cache for apparently no good reason, i.e., why do we
> assume the list of successors changed when processing a new block?

The reason is this is in fact a double-entry cache, but we do not need to have this double entry cache for the whole pass duration (I guess I'm answering to your second question here as well).
Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic blocks immediately dominated by MI. MI->getParent() will stay the same for each instruction processed by ProcessBlock, thus the clearing when another basic block is processed.

> 2. If this cache is only valid for the current process block (and the related
> traversed blocks), why is this a field global to the structure?
> 
> I see two ways of fixing that:
> 1. Have the map being local to process block function and pass it to the
> functions that use it.

I considered having the structure declared locally in ProcessBlock, and passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink, but it seemed a bit clumsy to me. I however agree it would have at least the merit of making the cache lifetime explicit.

> 2. Have the cache being really global and invalid it only when it makes sense.
> 
> Also, side remark, for caching/lazy structures, I expect those to be mutable
> and have the related function being const… My 2c.

Given my above comments, I'll modify the patch to make the cache local, as I think it makes things clearer.

> 
> Cheers,
> -Quentin
> 
> > On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison
> <arnaud.degrandmaison at arm.com> wrote:
> >
> > Author: aadg
> > Date: Mon Jun 15 04:09:06 2015
> > New Revision: 239720
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=239720&view=rev
> > Log:
> > [MachineSink] Improve runtime performance. NFC.
> >
> > This patch fixes a compilation time issue, when MachineSink faces PHIs
> > with a huge number of operands. This can happen for example in goto
> > table based interpreters, where some basic blocks can have several of
> > those PHIs, each one with several hundreds operands. MachineSink was
> > spending a significant time re-building and re-sorting the list of
> > successors of the current MachineBasicBlock. The computing and sorting
> > of the current MachineBasicBlock successors is now cached.
> >
> > Modified:
> >    llvm/trunk/lib/CodeGen/MachineSink.cpp
> >
> > Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSink
> > .cpp?rev=239720&r1=239719&r2=239720&view=diff
> >
> ==========================================================
> ============
> > ========
> > --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
> > +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
> > @@ -73,6 +73,11 @@ namespace {
> >
> >     SparseBitVector<> RegsToClearKillFlags;
> >
> > +    // Cache all successors to a MachineBasicBlock, sorted by frequency info
> and
> > +    // loop depth. AllSuccessors is lazily populated.
> > +    std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *, 4>>
> > +        AllSuccessors;
> > +
> >   public:
> >     static char ID; // Pass identification
> >     MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
> > namespace {
> >
> >     bool PerformTrivialForwardCoalescing(MachineInstr *MI,
> >                                          MachineBasicBlock *MBB);
> > +
> > +    SmallVector<MachineBasicBlock *, 4> &
> > +    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);
> >   };
> > } // end anonymous namespace
> >
> > @@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
> >     // Process all basic blocks.
> >     CEBCandidates.clear();
> >     ToSplit.clear();
> > -    for (MachineFunction::iterator I = MF.begin(), E = MF.end();
> > -         I != E; ++I)
> > -      MadeChange |= ProcessBlock(*I);
> > +    for (auto &MBB: MF)
> > +      MadeChange |= ProcessBlock(MBB);
> >
> >     // If we have anything we marked as toSplit, split it now.
> >     for (auto &Pair : ToSplit) {
> > @@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin
> >
> >   bool MadeChange = false;
> >
> > +  // MBB changed, reset all cached information.
> > +  AllSuccessors.clear();
> > +
> >   // Walk the basic block bottom-up.  Remember if we saw a store.
> >   MachineBasicBlock::iterator I = MBB.end();
> >   --I;
> > @@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
> >   return false;
> > }
> >
> > +/// Get the sorted sequence of successors for this MachineBasicBlock,
> > +possibly /// computing it if it was not already cached.
> > +SmallVector<MachineBasicBlock *, 4> &
> > +MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
> > +MachineBasicBlock *MBB) {
> > +
> > +  // Do we have the sorted successors in cache ?
> > +  auto Succs = AllSuccessors.find(MBB);  if (Succs !=
> > + AllSuccessors.end())
> > +    return Succs->second;
> > +
> > +  SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
> > +                                               MBB->succ_end());
> > +
> > +  // Handle cases where sinking can happen but where the sink point
> > + isn't a  // successor. For example:
> > +  //
> > +  //   x = computation
> > +  //   if () {} else {}
> > +  //   use x
> > +  //
> > +  const std::vector<MachineDomTreeNode *> &Children =
> > +    DT->getNode(MBB)->getChildren();
> > +  for (const auto &DTChild : Children)
> > +    // DomTree children of MBB that have MBB as immediate dominator
> are added.
> > +    if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> > +        // Skip MBBs already added to the AllSuccs vector above.
> > +        !MBB->isSuccessor(DTChild->getBlock()))
> > +      AllSuccs.push_back(DTChild->getBlock());
> > +
> > +  // Sort Successors according to their loop depth or block frequency info.
> > +  std::stable_sort(
> > +      AllSuccs.begin(), AllSuccs.end(),
> > +      [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
> > +        uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() : 0;
> > +        uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
> 0;
> > +        bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> > +        return HasBlockFreq ? LHSFreq < RHSFreq
> > +                            : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> > +      });
> > +
> > +  auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
> > +
> > +  return it.first->second;
> > +}
> > +
> > /// FindSuccToSinkTo - Find a successor to sink this instruction to.
> > MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr *MI,
> >                                    MachineBasicBlock *MBB, @@ -579,38
> > +634,7 @@ MachineBasicBlock *MachineSinking::FindS
> >       // we should sink to. If we have reliable block frequency information
> >       // (frequency != 0) available, give successors with smaller frequencies
> >       // higher priority, otherwise prioritize smaller loop depths.
> > -      SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
> > -                                               MBB->succ_end());
> > -
> > -      // Handle cases where sinking can happen but where the sink point
> isn't a
> > -      // successor. For example:
> > -      //
> > -      //   x = computation
> > -      //   if () {} else {}
> > -      //   use x
> > -      //
> > -      const std::vector<MachineDomTreeNode *> &Children =
> > -        DT->getNode(MBB)->getChildren();
> > -      for (const auto &DTChild : Children)
> > -        // DomTree children of MBB that have MBB as immediate dominator
> are added.
> > -        if (DTChild->getIDom()->getBlock() == MI->getParent() &&
> > -            // Skip MBBs already added to the Succs vector above.
> > -            !MBB->isSuccessor(DTChild->getBlock()))
> > -          Succs.push_back(DTChild->getBlock());
> > -
> > -      // Sort Successors according to their loop depth or block frequency info.
> > -      std::stable_sort(
> > -          Succs.begin(), Succs.end(),
> > -          [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
> > -            uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :
> 0;
> > -            uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :
> 0;
> > -            bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
> > -            return HasBlockFreq ? LHSFreq < RHSFreq
> > -                                : LI->getLoopDepth(L) < LI->getLoopDepth(R);
> > -          });
> > -      for (SmallVectorImpl<MachineBasicBlock *>::iterator SI = Succs.begin(),
> > -             E = Succs.end(); SI != E; ++SI) {
> > -        MachineBasicBlock *SuccBlock = *SI;
> > +      for (MachineBasicBlock *SuccBlock : GetAllSortedSuccessors(MI,
> > + MBB)) {
> >         bool LocalUse = false;
> >         if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
> >                                     BreakPHIEdge, LocalUse)) {
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits







More information about the llvm-commits mailing list