[PATCH] D10717: Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
chandlerc at gmail.com
Fri Jul 24 19:21:46 PDT 2015
chandlerc added inline comments.
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:846-849
@@ +845,6 @@
+ double FallThruBenefit = EdgeFreq * MisfetchCost;
+ // If the predecessor has only an unconditional jump to the header, we
+ // need to consider the cost of this jump.
+ if (Pred->succ_size() == 1)
+ FallThruBenefit += EdgeFreq * JumpInstCost;
+ HeaderFallThruBenefit = std::max(HeaderFallThruBenefit, FallThruBenefit);
> chandlerc wrote:
> > Why is a conditional jump's benefit not considered?
> You mean when Pred->succ_size() > 1? In that case we assume Pred will have the other successor as the layout successor, which is very likely to happen, and then we will save nothing (there are some cases that we can guess at this point that Pred won't have the other successor as fall-through. We could do this optimization here but I think it makes the analysis too complicated).
I'm curious what data makes you confident that the other successor will end up the layout successor?
It seems strange to me to assume that because there are more than one successors of the non-loop-predecessor of the header, the loop header won't be the layout-successor. Naively, I would expect a common pattern of:
| LH <-
| | |
| v |
| L2 --
Where LH is the loop header, and E is the entering block and X is the exit block. With this, it is perfectly conceivable that LH would be the layout successor of E.
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:816-827
@@ +815,14 @@
+/// With profile data, we can determine the benefit in terms of fall through
+/// opportunities when rotating a loop chain and select the best rotation.
+/// Basically, there are three kinds of benefit/cost to consider for each
+/// 1. The fall through edge (if it exists) from BB out of the loop to the
+/// loop header (this is a benefit).
+/// 2. The fall through edge (if it exists) from the loop exit to BB out of
+/// the loop (this is a benefit).
+/// 3. The fall through edge (if it exists) from the last BB to the first BB
+/// in the loop chain (this is a cost as it is not falling through any more).
+/// Therefore, the benefit for a given rotation is 1 + 2 - 3. We select the
+/// best rotation with the largest benefit.
I think it would be clearer to talk *only* in terms of cost. Below, when computing it, you can add and subtract costs to find the minimum.
Consider wording along the lines of:
There are three costs we want to minimize stemming from a jump rather than a fall through edge:
1) Entering the loop,
2) Exiting the loop, and
3) Continuing to loop by returning to the header.
Does that make sense? If not, then maybe I'm misunderstanding what you're trying to say more fundamentally.
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:834-838
@@ +833,7 @@
+ // We calculate the benefit and cost of a loop rotation respectively, which
+ // are represented by BlockFrequency. When comparing the benefit/cost between
+ // two different rotations, we represent the actual benefit as benefit - cost,
+ // and check (benefit1 - cost1) > (benefit2 - cost2) from (benefit1 + cost2) >
+ // (benefit2 + cost1). Hopefully there is no overflow.
+ BlockFrequency BestRotationBenefit(0);
As above, I think it would be useful to have just cost, and to use the natural addition and subtraction to find the minimum cost rotation.
Also, "hopefully there is no overflow" seems a bad sign. We should *definitely* have no overflow, and I think it makes sense to add asserts and such to enforce that.
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:902
@@ +901,3 @@
+ LargestExitEdgeProb = std::max(LargestExitEdgeProb,
+ MBPI->getEdgeProbability(TailBB, Succ));
Tragically, this loop is still quadratic in complexity.
getEdgeProbability is really inefficient when queried in this way. You can find other code in this file to compute the best successor probability without hitting this quadratic behavior. We should really figure out an API for this in BPI but for now, I would suggest extracting a helper and using it here.
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:912-916
@@ +911,7 @@
+ // JumpInstCost) * tail node frequency.
+ // 2. If the tail node has two successors, then we still get an additional
+ // jmp instruction. Note that the more frequent executed jmp instruction
+ // will be put ahead of the other one. Assume the frequency of those two
+ // branches are x and y (x >= y), then the cost will be (x * MisfetechCost
+ // + y * JumpInstCost) * tail node frequency.
+ // 3. If the tail node has more than two successors (this rarely happens),
What if one of the successors is outside the loop?
(Also, please add a test case covering this?)
More information about the llvm-commits