[PATCH] Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

David davidxl at google.com
Fri Jun 26 17:11:41 PDT 2015


================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:818
@@ +817,3 @@
+  double HeaderFallThruFreq = 0;
+  for (auto *Pred : HeaderBB->predecessors()) {
+    BlockChain *PredChain = BlockToChain[Pred];
----------------
congh wrote:
> davidxl wrote:
> > Why is this loop outside of the following loop? I expect  each new candidate loop header to have a chance to check the fall through opportunity and related saving/cost.
> Here we only consider natural loops, which have only one entry (loop header). As there is only one loop header, we only need to compute the fall-through once. We only check this benefit if the loop top is the loop header.
Ok -- a comment there about that will be useful.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:821
@@ +820,3 @@
+    if (!LoopBlockSet.count(Pred) &&
+        (!PredChain || Pred == *std::prev(PredChain->end()))) {
+      double EdgeFreq = static_cast<double>(
----------------
congh wrote:
> davidxl wrote:
> > If the previous chain does not end up connecting with this new top block in the current loop chain, then the benefit of the fall through won't be materializable, for instance when the Pred has more than 1 successors and the other successor is also a chain lead.
> That is right. But at this point we haven't started to build chains outside of this loop. It is difficult to tell if Pred will or will not be connected to the loop header. This is just a heuristic-like optimization. We could not guarantee global optimal result.
Is it worthwhile considering doing loop rotation after all chains have been formed?

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:828
@@ +827,3 @@
+      if (Pred->succ_size() == 1)
+        EdgeFreq *= 2;
+      HeaderFallThruFreq = std::max(HeaderFallThruFreq, EdgeFreq);
----------------
congh wrote:
> davidxl wrote:
> > I think the adjustment should be done the other way:
> > 
> > when num of successors > 1, the cost will be adjust to the minimal freq of the Pred's outgoing edge -- that is where the unconditional jump will be inserted (assuming the fall through path is the hottest one)
> Consider a two-way branch from Pred with frequencies x and y, and x is the frequency from Pred to the loop header. When x > y, and the edge is not fall-thru, then the number of jumps is x + y + y, and branch cost is x + y; if it is a fall-thru, the number of jumps is x + y, and the branch cost is y. If we take the weight of branch cost and jump instruction to be the same, then we can save x+y by making that edge fall-thru.
> 
> If x < y, the edge from Pred to the loop header won't be fall-thru and we save nothing.
> 
> So the conclusion is: if num of successors > 1 and the edge from Pred to the loop header is the most frequent one, then making this edge fall-thru can save us the freq of Pred. If that edge is not the most frequent one, we could save nothing. If num of successors == 1, we can save 2x freq of Pred.
case 1:  num_successors (Pred) == 2, and x > y

if Pred->Header edge is not the fall through, assuming the other edge will become a fall through edge , then the costs will be:
  1) number of jumps executed: (x+y), cost of jump is C1
  2) number of taken branches: x, cost of branch taken is C2:
Total cost: C1* (x+y) + C2* x

if Pred->Header edge is the fall through, 
   1) number of jumps executed: (x+y)
   2) number of taken branches: y
Total cost: C1 * (x+y) + C2 * y

so the saving (of keeping the Pred->header fall through) is C2 * (x - y)

case 2: number of successors (Pred) == 2, x < y

The saving of keeping Pred->header to be fall through is still C2 * (x - y ), but negative.

case 3: number of successors (Pred) == 1

Keeping the fall through, the total cost is 0, while not keeping the fall through the cost is (C1+C2)*x


Conclusion
a) when number of successors > 1, the real cost of not doing the fall through should be adjusted to C2 * (x-y)
b) when the number of successors == 1, your estimate is correct, assuming C1 == C2
c) better introduce a cost parameter for type1 and type2

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:846
@@ +845,3 @@
+    // the fall through edge from outside of the loop to the header.
+    if (Iter == HeaderIter)
+      Cost -= HeaderFallThruFreq;
----------------
congh wrote:
> davidxl wrote:
> > See the previous comment. Even when Iter is not the current Head, it may still have fall through opportunity.
> The fall-thru from Pred to another successor that is not the header? Then loop rotation won't affect that fall-thru.
This is explained by you -- only natural loop is considered.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:855
@@ +854,3 @@
+      if (!LoopBlockSet.count(Succ) &&
+          (!SuccChain || Succ == *SuccChain->begin())) {
+        auto EdgeFreq =
----------------
congh wrote:
> davidxl wrote:
> > Do you also want to check if the Succ block has more than one incoming edges. If it has more than one incoming edges, need to make sure 'Succ' block's  other predecessor is not a tail of another chain that can be connnected with 'Succ'.
> The function findBestLoopExit() doesn't consider this possibility. If we check this, should we go ahead and check if the source of the more frequent incoming edge has a more frequent outgoing edge... I am afraid this may make things too complicated.
you need keep an eye on it and consider not doing loop rotation on the fly and chain merging on the fly. The decision of rotation and how chains are merged/connected need to be made together.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:941
@@ +940,3 @@
+    rotateLoopWithProfile(LoopChain, L, LoopBlockSet);
+  else if (LoopTop == L.getHeader()) {
+    // If we selected just the header for the loop top, look for a potentially
----------------
congh wrote:
> davidxl wrote:
> > the check does not seem necessary. If LoopTop == header, ExitingBB will be null.
> This is because I changed the position of findBestLoopExit(). As I will resume its position, I will also update this part.
Another question is: if the current layout algorithm finds a different 'best' loop top than the header, than the profile driven/cost based rotation will be suppressed, which is not desirable.

http://reviews.llvm.org/D10717

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list