# [PATCH] D10717: Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

Fri Jul 24 21:03:50 PDT 2015

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:816-827
@@ +815,14 @@
+///
+/// With profile data, we can determine the benefit in terms of fall through
+/// opportunities when rotating a loop chain and select the best rotation.
+/// Basically, there are three kinds of benefit/cost to consider for each
+/// rotation:
+///    1. The fall through edge (if it exists) from BB out of the loop to the
+///    loop header (this is a benefit).
+///    2. The fall through edge (if it exists) from the loop exit to BB out of
+///    the loop (this is a benefit).
+///    3. The fall through edge (if it exists) from the last BB to the first BB
+///    in the loop chain (this is a cost as it is not falling through any more).
+///  Therefore, the benefit for a given rotation is 1 + 2 - 3. We select the
+///  best rotation with the largest benefit.
+void MachineBlockPlacement::rotateLoopWithProfile(
----------------
chandlerc wrote:
> I think it would be clearer to talk *only* in terms of cost. Below, when computing it, you can add and subtract costs to find the minimum.
>
> Consider wording along the lines of:
>
> There are three costs we want to minimize stemming from a jump rather than a fall through edge:
> 1) Entering the loop,
> 2) Exiting the loop, and
> 3) Continuing to loop by returning to the header.
>
> Does that make sense? If not, then maybe I'm misunderstanding what you're trying to say more fundamentally.
More precisely, the objective is to find a rotation with minimal overall cost or, equivalently, maximal overall benefit. The cost/benefit function has three components.
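Below is a minimal sketch of the cost-only formulation. It uses a simplified model rather than LLVM's actual data structures:

- The chain is a list of blocks with the header at index 0.
- `EntryFreq` is the frequency of the edge from the block laid out just before the loop into the header.
- `ExitFreq[i]` is the frequency of the edge from block i to the block laid out just after the loop.
- `NextFreq[i]` is the frequency of the edge from block i to the next block in the chain, wrapping around to the first.

`LoopChainModel`, `totalExitFreq`, `rotationCost`, and `bestRotation` are hypothetical names. Like the patch, the sketch assumes the blocks around the loop really end up adjacent to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a loop chain (not LLVM's data structures).
// Blocks are listed in their current chain order; index 0 is the loop header.
struct LoopChainModel {
  uint64_t EntryFreq;             // block laid out before the loop -> header
  std::vector<uint64_t> ExitFreq; // chain[i] -> block laid out after the loop
  std::vector<uint64_t> NextFreq; // chain[i] -> chain[(i + 1) % N]
};

uint64_t totalExitFreq(const LoopChainModel &M) {
  uint64_t Total = 0;
  for (uint64_t F : M.ExitFreq)
    Total += F;
  return Total;
}

// Cost of the layout that starts at chain[R]. Each edge that cannot fall
// through is charged its frequency, because it becomes a taken jump.
uint64_t rotationCost(const LoopChainModel &M, size_t R, uint64_t TotalExit) {
  const size_t N = M.NextFreq.size();
  assert(N > 0 && M.ExitFreq.size() == N && R < N);
  const size_t Tail = (R + N - 1) % N; // block laid out last after rotation

  uint64_t EnterCost = (R == 0) ? 0 : M.EntryFreq;  // 1) entering the loop
  uint64_t ExitCost = TotalExit - M.ExitFreq[Tail]; // 2) exiting the loop
  uint64_t LoopCost = M.NextFreq[Tail]; // 3) Tail -> first block is a jump
  return EnterCost + ExitCost + LoopCost;
}

// Picks the rotation with the minimum total cost (ties keep the earlier one).
size_t bestRotation(const LoopChainModel &M) {
  const uint64_t TotalExit = totalExitFreq(M);
  size_t Best = 0;
  uint64_t BestCost = rotationCost(M, 0, TotalExit);
  for (size_t R = 1; R < M.NextFreq.size(); ++R) {
    uint64_t Cost = rotationCost(M, R, TotalExit);
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = R;
    }
  }
  return Best;
}
```

Take EntryFreq = 5, ExitFreq = {0, 10, 0}, and NextFreq = {100, 90, 90}. The three rotations cost 100, 115, and 95, so the sketch picks the rotation starting at index 2, which puts the exiting block last. Because every term is a cost, the best rotation is simply the minimum, and no frequency difference can go negative.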

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:834-838
@@ +833,7 @@
+
+  // We calculate the benefit and cost of a loop rotation respectively, which
+  // are represented by BlockFrequency. When comparing the benefit/cost between
+  // two different rotations, we represent the actual benefit as benefit - cost,
+  // and check (benefit1 - cost1) > (benefit2 - cost2) from (benefit1 + cost2) >
+  // (benefit2 + cost1). Hopefully there is no overflow.
+  BlockFrequency BestRotationBenefit(0);
----------------
chandlerc wrote:
> As above, I think it would be useful to have just cost, and to use the natural addition and subtraction to find the minimum cost rotation.
>
> Also, "hopefully there is no overflow" seems a bad sign. We should *definitely* have no overflow, and I think it makes sense to add asserts and such to enforce that.
Cong is actually computing the overall benefit and maximizing it.

Cong, computing only the cost is equivalent: simply change it to LoopExternalConnectionCost. In the current patch, a rotation with positive benefit will have a connection cost of 0. A rotation with no benefit will have a connection cost equal to the frequency of the branch into the header (or of the branch out of the loop).
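If the benefit/cost formulation is kept, the comparison could carry the overflow checks asked for above. This is a sketch only. It uses plain `uint64_t` values as a stand-in for `BlockFrequency`, and `checkedAdd` and `isBetterRotation` are hypothetical helpers. Rearranging (b1 - c1) > (b2 - c2) into (b1 + c2) > (b2 + c1) avoids negative unsigned differences. The assert turns a silent wraparound into a hard failure in debug builds.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two unsigned frequencies and asserts (in debug builds) that the sum
// does not wrap around.
inline uint64_t checkedAdd(uint64_t A, uint64_t B) {
  assert(A <= std::numeric_limits<uint64_t>::max() - B && "frequency overflow");
  return A + B;
}

// Returns true if (Benefit1 - Cost1) > (Benefit2 - Cost2), without ever
// forming a possibly negative unsigned difference.
inline bool isBetterRotation(uint64_t Benefit1, uint64_t Cost1,
                             uint64_t Benefit2, uint64_t Cost2) {
  return checkedAdd(Benefit1, Cost2) > checkedAdd(Benefit2, Cost1);
}
```

A saturating add would be an alternative if release builds must also stay well defined.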

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:847-850
@@ +846,6 @@
+  auto ScaleBlockFrequency = [](BlockFrequency Freq,
+                                unsigned Scale) -> BlockFrequency {
+    if (Scale == 0)
+      return 0;
+    return Freq / BranchProbability(1, Scale);
+  };
----------------
I believe I raised a similar question before (see the comment and replies at line 818). The problem is that we cannot be very sure whether this loop's header will be the layout successor of E, because of the loop-based layout ordering constraint. In fact, in your example, X will likely be the layout successor of E unless the branch is highly biased towards LH.

That is why I proposed at some point that loop rotation be done after the base layout for the function is complete; however, that may require too big a change.

It might be worth considering obviously beneficial conditional jumps here.
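For reference, the `ScaleBlockFrequency` lambda quoted above divides a frequency by the probability 1/Scale, which amounts to multiplying it by Scale. The plain-integer sketch below makes that multiplication explicit and saturates on overflow. `scaleFrequency` is a hypothetical name. Whether `BlockFrequency`'s own division by a `BranchProbability` saturates the same way is not checked here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Plain-integer stand-in for the ScaleBlockFrequency lambda in the patch.
// Dividing by the probability 1/Scale equals multiplying by Scale; this
// version clamps to the maximum value instead of wrapping on overflow.
inline uint64_t scaleFrequency(uint64_t Freq, unsigned Scale) {
  if (Scale == 0)
    return 0;
  if (Freq > std::numeric_limits<uint64_t>::max() / Scale)
    return std::numeric_limits<uint64_t>::max(); // saturate
  return Freq * Scale;
}
```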

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:902
@@ +901,3 @@
+        LargestExitEdgeProb = std::max(LargestExitEdgeProb,
+                                       MBPI->getEdgeProbability(TailBB, Succ));
+    }
----------------
chandlerc wrote:
> Tragically, this loop is still quadratic in complexity.
>
> getEdgeProbability is really inefficient when queried in this way. You can find other code in this file to compute the best successor probability without hitting this quadratic behavior. We should really figure out an API for this in BPI but for now, I would suggest extracting a helper and using it here.
Yes, consider computing the sum of TailBB's successor weights once, outside the loop.
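Here is one way to follow that suggestion: sum the successor weights once, then pick the largest exit weight. The sketch uses hypothetical types (`SuccEdge`, `Probability`, `largestExitEdgeProb`) instead of the real MachineBasicBlock/MBPI interfaces. Every edge probability shares the same denominator, so comparing weights is equivalent to comparing probabilities, and the query is linear in the number of successors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical successor record: edge weight and whether the successor lies
// outside the loop.
struct SuccEdge {
  uint32_t Weight;
  bool IsExit;
};

// A probability represented as Num / Den.
struct Probability {
  uint64_t Num;
  uint64_t Den;
};

// Largest exit-edge probability of the tail block. The weight sum is computed
// once, so the cost is linear in the number of successors.
Probability largestExitEdgeProb(const std::vector<SuccEdge> &Succs) {
  uint64_t SumWeight = 0;
  for (const SuccEdge &E : Succs)
    SumWeight += E.Weight;
  if (SumWeight == 0)
    return {0, 1}; // no weighted successors: probability 0

  uint64_t LargestExitWeight = 0;
  for (const SuccEdge &E : Succs)
    if (E.IsExit)
      LargestExitWeight = std::max<uint64_t>(LargestExitWeight, E.Weight);

  return {LargestExitWeight, SumWeight};
}
```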

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:912-916
@@ +911,7 @@
+    //   JumpInstCost) * tail node frequency.
+    //   2. If the tail node has two successors, then we still get an additional
+    //   jmp instruction. Note that the more frequent executed jmp instruction
+    //   will be put ahead of the other one. Assume the frequency of those two
+    //   branches are x and y (x >= y), then the cost will be (x * MisfetechCost
+    //   + y * JumpInstCost) * tail node frequency.
+    //   3. If the tail node has more than two successors (this rarely happens),
----------------
chandlerc wrote:
> What if one of the successors is outside the loop?
>
This should not matter.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:915
@@ +914,3 @@
+    //   will be put ahead of the other one. Assume the frequency of those two
+    //   branches are x and y (x >= y), then the cost will be (x * MisfetechCost
+    //   + y * JumpInstCost) * tail node frequency.
----------------
I don't think this formula is correct. The correct formula is

(x - x') * MisfetchCost + y * JumpInstCost

where x' is the frequency of the edge from the tail node to the block that is not its layout successor (before the rotation).
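Here is a small worked version of the corrected formula, as a sketch. It treats x, x', and y as edge frequencies in the same units. The comment does not say whether the tail-frequency factor still applies, so the sketch leaves it out. `tailJumpCost` and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reviewer's formula for a two-successor tail:
//   (x - x') * MisfetchCost + y * JumpInstCost
// X and Y are the more and less frequent successor edges of the tail. XPrime
// is the edge from the tail to the block that is not its layout successor
// before the rotation; since it is one of the two edges, it is assumed here
// to be at most X.
uint64_t tailJumpCost(uint64_t X, uint64_t XPrime, uint64_t Y,
                      uint64_t MisfetchCost, uint64_t JumpInstCost) {
  assert(X >= Y && "x is the more frequent successor edge");
  assert(XPrime <= X && "x' assumed not to exceed x");
  return (X - XPrime) * MisfetchCost + Y * JumpInstCost;
}
```

For example, with x = 80, x' = 20, y = 20, MisfetchCost = 2, and JumpInstCost = 1, the cost is (80 - 20) * 2 + 20 * 1 = 140.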

http://reviews.llvm.org/D10717