[llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

Tue Jun 16 02:07:05 PDT 2015

Hi Quentin,

 

GetAllSortedSuccessors does use some members of the MachineSink pass (MBFI, LI, DT), so I think it makes sense to leave it as a private method of MachineSink.

 

I committed the fixes @ r239807.

 

Regarding the Spiller runtime, PR17409 is indeed the problem we are facing.

 

Cheers,

Arnaud

 

 

From: Quentin Colombet [mailto:qcolombet at apple.com] 
Sent: 16 June 2015 00:26
To: Arnaud De Grandmaison
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance. NFC.

 

Hi Arnaud,

 

Thanks for making a new patch that fast!

 

LGTM with one question:

-    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB);

+    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock *MBB,

+                           AllSuccsCache &AllSuccessors) const;

 

I haven’t checked but does this method need to access anything in MachineSinkPass or could it be marked as static?

 

 

 

On Jun 15, 2015, at 2:25 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:

 

Hi Quentin,

Here is a patch that I think will address your comments. Let me know if you are ok with it.

By the way, on a different (but related) subject, we find that llvm is 2x to 3x slower than gcc on a specific testscase (a big switch/case in a loop). llvm's codegen lasts about 2+ minutes, with about 40% spent in the Spiller. Although I understand regalloc can take a significant portion of time for this code, I was wondering whether you would be aware of any room for improvement there ?

 

There are things that can be improved, for instance, for the spiller we have this: https://llvm.org/bugs/show_bug.cgi?id=17409.

If you have more information on the profile of the case you want to solve, I may have some other suggestions. This is the only one I have in my mind for the spiller, but I have a few on the coalescer as well.

 

Thanks,

Q. 






Cheers,
Arnaud




-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
bounces at cs.uiuc.edu] On Behalf Of Arnaud A. de Grandmaison
Sent: 15 June 2015 22:35
To: 'Quentin Colombet'
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: [llvm] r239720 - [MachineSink] Improve runtime performance.
NFC.




-----Original Message-----
From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: 15 June 2015 19:45
To: Arnaud De Grandmaison
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r239720 - [MachineSink] Improve runtime performance.
NFC.

Hi Arnaud,


Hi Quentin,





The current implementation of the caching mechanism seems confusing to
me.
Looking at the comment on the related field seems to imply that the
cache is global to the pass.
However, when we look at the implementation, the cache is reset each
time we process a new block.

I found that confusing for two reasons:
1. We invalid the cache for apparently no good reason, i.e., why do we
assume the list of successors changed when processing a new block?


The reason is this is in fact a double-entry cache, but we do not need to have
this double entry cache for the whole pass duration (I guess I'm answering to
your second question here as well).
Looking at GetAllSortedSuccessors, beside MBB's successors, it contains basic
blocks immediately dominated by MI. MI->getParent() will stay the same for
each instruction processed by ProcessBlock, thus the clearing when another
basic block is processed.




2. If this cache is only valid for the current process block (and the
related traversed blocks), why is this a field global to the structure?

I see two ways of fixing that:
1. Have the map being local to process block function and pass it to
the functions that use it.


I considered having the structure declared locally in ProcessBlock, and
passing it around to SinkInstruction, FindSuccToSinkTo and isProfitableToSink,
but it seemed a bit clumsy to me. I however agree it would have at least the
merit of making the cache lifetime explicit.




2. Have the cache being really global and invalid it only when it makes

sense.




Also, side remark, for caching/lazy structures, I expect those to be
mutable and have the related function being const… My 2c.


Given my above comments, I'll modify the patch to make the cache local, as I
think it makes things clearer.





Cheers,
-Quentin




On Jun 15, 2015, at 2:09 AM, Arnaud A. de Grandmaison

<arnaud.degrandmaison at arm.com> wrote:




Author: aadg
Date: Mon Jun 15 04:09:06 2015
New Revision: 239720

URL: http://llvm.org/viewvc/llvm-project?rev=239720 <http://llvm.org/viewvc/llvm-project?rev=239720&view=rev> &view=rev
Log:
[MachineSink] Improve runtime performance. NFC.

This patch fixes a compilation time issue, when MachineSink faces
PHIs with a huge number of operands. This can happen for example in
goto table based interpreters, where some basic blocks can have
several of those PHIs, each one with several hundreds operands.
MachineSink was spending a significant time re-building and
re-sorting the list of successors of the current MachineBasicBlock.
The computing and sorting of the current MachineBasicBlock successors is

now cached.




Modified:
  llvm/trunk/lib/CodeGen/MachineSink.cpp

Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
URL:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSi
nk .cpp?rev=239720&r1=239719&r2=239720&view=diff

 

==========================================================



============



========

--- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
+++ llvm/trunk/lib/CodeGen/MachineSink.cpp Mon Jun 15 04:09:06 2015
@@ -73,6 +73,11 @@ namespace {

   SparseBitVector<> RegsToClearKillFlags;

+    // Cache all successors to a MachineBasicBlock, sorted by
+ frequency info

and



+    // loop depth. AllSuccessors is lazily populated.
+    std::map<MachineBasicBlock *, SmallVector<MachineBasicBlock *,

4>>



+        AllSuccessors;
+
 public:
   static char ID; // Pass identification
   MachineSinking() : MachineFunctionPass(ID) { @@ -132,6 +137,9 @@
namespace {

   bool PerformTrivialForwardCoalescing(MachineInstr *MI,
                                        MachineBasicBlock *MBB);
+
+    SmallVector<MachineBasicBlock *, 4> &
+    GetAllSortedSuccessors(MachineInstr *MI, MachineBasicBlock
+ *MBB);
 };
} // end anonymous namespace

@@ -269,9 +277,8 @@ bool MachineSinking::runOnMachineFunctio
   // Process all basic blocks.
   CEBCandidates.clear();
   ToSplit.clear();
-    for (MachineFunction::iterator I = MF.begin(), E = MF.end();
-         I != E; ++I)
-      MadeChange |= ProcessBlock(*I);
+    for (auto &MBB: MF)
+      MadeChange |= ProcessBlock(MBB);

   // If we have anything we marked as toSplit, split it now.
   for (auto &Pair : ToSplit) {
@@ -310,6 +317,9 @@ bool MachineSinking::ProcessBlock(Machin

 bool MadeChange = false;

+  // MBB changed, reset all cached information.
+  AllSuccessors.clear();
+
 // Walk the basic block bottom-up.  Remember if we saw a store.
 MachineBasicBlock::iterator I = MBB.end();
 --I;
@@ -522,6 +532,51 @@ bool MachineSinking::isProfitableToSinkT
 return false;
}

+/// Get the sorted sequence of successors for this
+MachineBasicBlock, possibly /// computing it if it was not already

cached.



+SmallVector<MachineBasicBlock *, 4> &
+MachineSinking::GetAllSortedSuccessors(MachineInstr *MI,
+MachineBasicBlock *MBB) {
+
+  // Do we have the sorted successors in cache ?
+  auto Succs = AllSuccessors.find(MBB);  if (Succs !=
+ AllSuccessors.end())
+    return Succs->second;
+
+  SmallVector<MachineBasicBlock *, 4> AllSuccs(MBB->succ_begin(),
+                                               MBB->succ_end());
+
+  // Handle cases where sinking can happen but where the sink point
+ isn't a  // successor. For example:
+  //
+  //   x = computation
+  //   if () {} else {}
+  //   use x
+  //
+  const std::vector<MachineDomTreeNode *> &Children =
+    DT->getNode(MBB)->getChildren();  for (const auto &DTChild :
+ Children)
+    // DomTree children of MBB that have MBB as immediate dominator

are added.



+    if (DTChild->getIDom()->getBlock() == MI->getParent() &&
+        // Skip MBBs already added to the AllSuccs vector above.
+        !MBB->isSuccessor(DTChild->getBlock()))
+      AllSuccs.push_back(DTChild->getBlock());
+
+  // Sort Successors according to their loop depth or block frequency

info.



+  std::stable_sort(
+      AllSuccs.begin(), AllSuccs.end(),
+      [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
+        uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency() :

0;



+        uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency() :

0;



+        bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
+        return HasBlockFreq ? LHSFreq < RHSFreq
+                            : LI->getLoopDepth(L) < LI->getLoopDepth(R);
+      });
+
+  auto it = AllSuccessors.insert(std::make_pair(MBB, AllSuccs));
+
+  return it.first->second;
+}
+
/// FindSuccToSinkTo - Find a successor to sink this instruction to.
MachineBasicBlock *MachineSinking::FindSuccToSinkTo(MachineInstr

*MI,



                                  MachineBasicBlock *MBB, @@
-579,38
+634,7 @@ MachineBasicBlock *MachineSinking::FindS
     // we should sink to. If we have reliable block frequency information
     // (frequency != 0) available, give successors with smaller frequencies
     // higher priority, otherwise prioritize smaller loop depths.
-      SmallVector<MachineBasicBlock*, 4> Succs(MBB->succ_begin(),
-                                               MBB->succ_end());
-
-      // Handle cases where sinking can happen but where the sink point

isn't a



-      // successor. For example:
-      //
-      //   x = computation
-      //   if () {} else {}
-      //   use x
-      //
-      const std::vector<MachineDomTreeNode *> &Children =
-        DT->getNode(MBB)->getChildren();
-      for (const auto &DTChild : Children)
-        // DomTree children of MBB that have MBB as immediate dominator

are added.



-        if (DTChild->getIDom()->getBlock() == MI->getParent() &&
-            // Skip MBBs already added to the Succs vector above.
-            !MBB->isSuccessor(DTChild->getBlock()))
-          Succs.push_back(DTChild->getBlock());
-
-      // Sort Successors according to their loop depth or block frequency

info.



-      std::stable_sort(
-          Succs.begin(), Succs.end(),
-          [this](const MachineBasicBlock *L, const MachineBasicBlock *R) {
-            uint64_t LHSFreq = MBFI ? MBFI->getBlockFreq(L).getFrequency()

:



0;



-            uint64_t RHSFreq = MBFI ? MBFI->getBlockFreq(R).getFrequency()

:



0;



-            bool HasBlockFreq = LHSFreq != 0 && RHSFreq != 0;
-            return HasBlockFreq ? LHSFreq < RHSFreq
-                                : LI->getLoopDepth(L) < LI->getLoopDepth(R);
-          });
-      for (SmallVectorImpl<MachineBasicBlock *>::iterator SI =

Succs.begin(),



-             E = Succs.end(); SI != E; ++SI) {
-        MachineBasicBlock *SuccBlock = *SI;
+      for (MachineBasicBlock *SuccBlock :
+ GetAllSortedSuccessors(MI,
+ MBB)) {
       bool LocalUse = false;
       if (AllUsesDominatedByBlock(Reg, SuccBlock, MBB,
                                   BreakPHIEdge, LocalUse)) {


_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

<0001-MachineSink-Address-post-commit-review-comments.patch>

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150616/702d80fd/attachment.html>