[PATCH] Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

Cong Hou congh at google.com
Fri Jun 26 16:08:15 PDT 2015


================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:250
@@ -249,3 +249,3 @@
   void buildLoopChains(MachineFunction &F, MachineLoop &L);
-  void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
+  void rotateLoop(BlockChain &LoopChain, MachineLoop &L,
                   const BlockFilterSet &LoopBlockSet);
----------------
davidxl wrote:
> Why changing this interface at all? There does not seem to be necessary.
This is only an internal "interface". The reason why I moved the computation of ExitingBB to inside of this function is because ExitingBB is only needed by rotateLoop and it doesn't make sense to compute it outside of this function. This makes the logic more compact and clear: it is rotateLoop but not buildLoopChains who needs ExitingBB.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:250
@@ -249,3 +249,3 @@
   void buildLoopChains(MachineFunction &F, MachineLoop &L);
-  void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB,
+  void rotateLoop(BlockChain &LoopChain, MachineLoop &L,
                   const BlockFilterSet &LoopBlockSet);
----------------
congh wrote:
> davidxl wrote:
> > Why changing this interface at all? There does not seem to be necessary.
> This is only an internal "interface". The reason why I moved the computation of ExitingBB to inside of this function is because ExitingBB is only needed by rotateLoop and it doesn't make sense to compute it outside of this function. This makes the logic more compact and clear: it is rotateLoop but not buildLoopChains who needs ExitingBB.
I just checked this part again, and I found I was wrong: the exiting BB is calculated before building the loop chain. This should be guaranteed as when finding best exit BB it is assumed that BBs haven't been merged to the loop chain. I will update the patch by rolling this part back. 

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:761
@@ -759,1 +760,3 @@
+  auto F = (*LoopChain.begin())->getParent();
+  MachineBasicBlock *ExitingBB = findBestLoopExit(*F, L, LoopBlockSet);
   if (!ExitingBB)
----------------
davidxl wrote:
> ExistingBB is already computed in the caller, there is no need for the interface change here ..
Please see my comment above.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:797
@@ -793,1 +796,3 @@
 
+/// \brief Attempt to rotate a loop based on profile data to reduce branch cost.
+///
----------------
davidxl wrote:
> Though equivalent, but it might be more readable if it computes 'benefit/savings' in terms of the fall through opportunities instead of the cost.  For instance, in the following three items, 1 and 2 are really savings that come with the rotation, and 3 is the saving that will be lost with the candidate layout (including the original layout before the rotation).
Agree. I will update the comment.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:818
@@ +817,3 @@
+  double HeaderFallThruFreq = 0;
+  for (auto *Pred : HeaderBB->predecessors()) {
+    BlockChain *PredChain = BlockToChain[Pred];
----------------
davidxl wrote:
> Why is this loop outside of the following loop? I expect  each new candidate loop header to have a chance to check the fall through opportunity and related saving/cost.
Here we only consider natural loops, which have only one entry (loop header). As there is only one loop header, we only need to compute the fall-through once. We only check this benefit if the loop top is the loop header.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:821
@@ +820,3 @@
+    if (!LoopBlockSet.count(Pred) &&
+        (!PredChain || Pred == *std::prev(PredChain->end()))) {
+      double EdgeFreq = static_cast<double>(
----------------
davidxl wrote:
> If the previous chain does not end up connecting with this new top block in the current loop chain, then the benefit of the fall through won't be materializable, for instance when the Pred has more than 1 successors and the other successor is also a chain lead.
That is right. But at this point we haven't started to build chains outside of this loop. It is difficult to tell if Pred will or will not be connected to the loop header. This is just a heuristic-like optimization. We could not guarantee global optimal result.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:828
@@ +827,3 @@
+      if (Pred->succ_size() == 1)
+        EdgeFreq *= 2;
+      HeaderFallThruFreq = std::max(HeaderFallThruFreq, EdgeFreq);
----------------
davidxl wrote:
> I think the adjustment should be done the other way:
> 
> when num of successors > 1, the cost will be adjust to the minimal freq of the Pred's outgoing edge -- that is where the unconditional jump will be inserted (assuming the fall through path is the hottest one)
Consider a two-way branch from Pred with frequencies x and y, and x is the frequency from Pred to the loop header. When x > y, and the edge is not fall-thru, then the number of jumps is x + y + y, and branch cost is x + y; if it is a fall-thru, the number of jumps is x + y, and the branch cost is y. If we take the weight of branch cost and jump instruction to be the same, then we can save x+y by making that edge fall-thru.

If x < y, the edge from Pred to the loop header won't be fall-thru and we save nothing.

So the conclusion is: if num of successors > 1 and the edge from Pred to the loop header is the most frequent one, then making this edge fall-thru can save us the freq of Pred. If that edge is not the most frequent one, we could save nothing. If num of successors == 1, we can save 2x freq of Pred.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:846
@@ +845,3 @@
+    // the fall through edge from outside of the loop to the header.
+    if (Iter == HeaderIter)
+      Cost -= HeaderFallThruFreq;
----------------
davidxl wrote:
> See the previous comment. Even when Iter is not the current Head, it may still have fall through opportunity.
The fall-thru from Pred to another successor that is not the header? Then loop rotation won't affect that fall-thru.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:855
@@ +854,3 @@
+      if (!LoopBlockSet.count(Succ) &&
+          (!SuccChain || Succ == *SuccChain->begin())) {
+        auto EdgeFreq =
----------------
davidxl wrote:
> Do you also want to check if the Succ block has more than one incoming edges. If it has more than one incoming edges, need to make sure 'Succ' block's  other predecessor is not a tail of another chain that can be connnected with 'Succ'.
The function findBestLoopExit() doesn't consider this possibility. If we check this, should we go ahead and check if the source of the more frequent incoming edge has a more frequent outgoing edge... I am afraid this may make things too complicated.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:939
@@ +938,3 @@
+  // loop by heavily using the profile data for better layout.
+  if (F.getFunction()->getEntryCount())
+    rotateLoopWithProfile(LoopChain, L, LoopBlockSet);
----------------
davidxl wrote:
> I suggest adding another intern option to guard this.
> 
> if (UseProfileBasedCostAnalysis && F.getFunction()->getEntryCount()) ...
OK.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:941
@@ +940,3 @@
+    rotateLoopWithProfile(LoopChain, L, LoopBlockSet);
+  else if (LoopTop == L.getHeader()) {
+    // If we selected just the header for the loop top, look for a potentially
----------------
davidxl wrote:
> the check does not seem necessary. If LoopTop == header, ExitingBB will be null.
This is because I changed the position of findBestLoopExit(). As I will resume its position, I will also update this part.

http://reviews.llvm.org/D10717

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list