[PATCH] D10717: Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.

Cong Hou congh at google.com
Fri Jul 24 15:43:33 PDT 2015

congh marked 9 inline comments as done.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:84-86
@@ -83,1 +83,5 @@
+static cl::opt<bool> UseProfileBasedCostAnalysis(
+    "use-profile-based-cost-analysis",
+    cl::desc("Use profile data to guide machine block placement."),
+    cl::init(false), cl::Hidden);
davidxl wrote:
> chandlerc wrote:
> > This seems confusing to me. We use profile data to guide many parts of block placement. This flag doesn't guard most of them. Could you select a more accurate name?
> yes -- something like 
> 'precise-rotation-cost' would be clearer. 
I have chosen the name suggested by David.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:89-92
@@ +88,6 @@
+static cl::opt<unsigned> MisfetchCost(
+    "misfetch-cost",
+    cl::desc("The cost of misfetch due to conditional/unconditional jumps."),
+    cl::init(1), cl::Hidden);
chandlerc wrote:
> It might be useful to clarify that this cost models the probabalistic risk of an *instruction* misfetch due to a jump relative to falling through?
I have updated the description following your suggestion.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:94-96
@@ +93,5 @@
+static cl::opt<unsigned>
+    JumpInstCost("jump-inst-cost", cl::desc("The cost of jump instructions."),
+                 cl::init(1), cl::Hidden);
chandlerc wrote:
> We have lots of existing measures of the cost of an instruction already. It would probably help to clarify this is a weight specifically designed to fit with the scale of the misfetch cost above? Not really sure how important it is to be able to tune these two costs independently.
OK. I have updated the description.

We actually only need one ratio between those two costs. But I think it would be more clear to give two costs separately in case we need other costs in the future (like mispredicate?).

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:269-271
@@ -252,2 +268,5 @@
   void buildCFGChains(MachineFunction &F);
+  bool useProfileData(MachineFunction &F) {
+    return UseProfileBasedCostAnalysis && F.getFunction()->getEntryCount();
+  }
chandlerc wrote:
> My assumption is that the flag is expected to go away at some point. As such, I wouldn't embed a predicate here. I would sink this into the code that actually needs to test these properties to make decisions.
OK. Now this is only called once and it is OK to "inline" this function there.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:815-826
@@ +814,14 @@
+/// With profile data, we could determine the benefit in terms of fall through
+/// opportunities when rotating a loop chain and select the best rotation.
+/// Basically, there are three kinds of benefit costs to consider for each
+/// rotation:
+///    1. The fall through edge (if it exists) from BB out of the loop to the
+///    loop header.
+///    2. The fall through edge (if it exists) from the loop exit to BB out of
+///    the loop.
+///    3. The fall through edge (if it exists) from the last BB to the first BB
+///    in the loop chain.
+///  Therefore, the benefit for a given rotation is 1 + 2 - 3. We select the
+///  best rotation with the largest benefit.
+void MachineBlockPlacement::rotateLoopWithProfile(
chandlerc wrote:
> "we could determine" -> "we can determine"
> Also, I find the mixture of "benefit" and "cost" confusing in this comment. I would try to use more clear terms here.
You are right. I have updated the comment by clearly stating which item is a benefit and which one is a cost.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:831
@@ +830,3 @@
+  auto HeaderIter = std::find(LoopChain.begin(), LoopChain.end(), HeaderBB);
+  double BestRotationBenefit = -std::numeric_limits<double>::max();
+  auto RotationPos = LoopChain.end();
chandlerc wrote:
> Please don't use floating point to compute these kinds of things.
> We would like LLVM to not depend no the vagaries of the FPU and would like to avoid any possible numerical instability manifesting as non-deterministic compiles.
> Instead, you should compute things using the BlockFrequency and BranchProbability abstract units which do the fixed-point math for you.
Thanks for explaining the background! I have used a pair of BlockFrequency to express benefit and cost respectively.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:846-849
@@ +845,6 @@
+      double FallThruBenefit = EdgeFreq * MisfetchCost;
+      // If the predecessor has only an unconditional jump to the header, we
+      // need to consider the cost of this jump.
+      if (Pred->succ_size() == 1)
+        FallThruBenefit += EdgeFreq * JumpInstCost;
+      HeaderFallThruBenefit = std::max(HeaderFallThruBenefit, FallThruBenefit);
chandlerc wrote:
> Why is a conditional jump's benefit not considered?
You mean when Pred->succ_size() > 1? In that case we assume Pred will have the other successor as the layout successor, which is very likely to happen, and then we will save nothing (there are some cases that we can guess at this point that Pred won't have the other successor as fall-through. We could do this optimization here but I think it makes the analysis too complicated).

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:854-856
@@ +853,5 @@
+  for (auto Iter = LoopChain.begin(), TailIter = std::prev(LoopChain.end()),
+            EndIter = LoopChain.end();
+       Iter != EndIter; Iter++, TailIter++) {
+    if (TailIter == LoopChain.end())
chandlerc wrote:
> There is no comment explaining the actual algorithm at this point. Please add one. Especially with a loop structure this complex (three iterators??) it would help to give the reader some context for how this loop is supposed to operate.
OK. I have added a comment explains what this loop does. The TailIter is also described just below this statement.

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:868
@@ +867,3 @@
+    if (Iter == HeaderIter)
+      Benefit += HeaderFallThruBenefit;
chandlerc wrote:
> This seems to be a *cost* rather than a benefit?
This is a benefit because then we have a fallthrough from a block outside of the loop to the loop header. I think my comment here is incorrect and sorry about it (it is corrected now).

Comment at: lib/CodeGen/MachineBlockPlacement.cpp:877-879
@@ +876,5 @@
+          (!SuccChain || Succ == *SuccChain->begin())) {
+        auto EdgeFreq =
+            MBFI->getBlockFreq(TailBB) * MBPI->getEdgeProbability(TailBB, Succ);
+        ExitFallThruFreq = std::max(ExitFallThruFreq, EdgeFreq);
+      }
chandlerc wrote:
> You should compute the max edge probability, and factor in the frequency outside of the loop.
Yes, this is more efficient. Done.


More information about the llvm-commits mailing list