[PATCH] Propagation of profile samples through the CFG.

Thu Jan 9 04:35:23 PST 2014

  Ok, head thoroughly wrapped around what's going on here. =]

  I have a lot of ideas for different ways to do this, some of them more interesting than others, but I very much agree that it will be good to get something extremely similar to what is in GCC first, and we can iterate from there. The comments below are mostly implementation comments and some nit-picks.

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:550-551
@@ +549,4 @@
+///
+/// If \p CheckDT is false, the complementary check is
+/// performed with \p BB1's post-dominator tree.
+///
----------------
It was not clear to me at first that this is an inversion flag. I'd like a better name, but I don't have any particular ideas. I wonder if it would be more clear by passing in the domtree pointer to check?

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:563-565
@@ +562,5 @@
+    BasicBlock *BB2 = *I;
+    bool IsDomParent =
+        (CheckDT) ? DT->dominates(BB2, BB1) : PDT->dominates(BB2, BB1);
+    bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2);
+    if (BB1 != BB2 && VisitedBlocks.insert(BB2) && IsDomParent &&
----------------
I suspect there is a more direct way to compute this, but I don't want to hold up the review thinking about it -- maybe a FIXME?

Specifically, I think that we might be able to phrase this purely in terms of LoopInfo and DomTree.

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:578-581
@@ +577,6 @@
+      // members of its equivalence set.
+      uint32_t &bb1_weight = BlockWeights[BB1];
+      uint32_t &bb2_weight = BlockWeights[BB2];
+      if (bb2_weight > bb1_weight)
+        bb1_weight = bb2_weight;
+    }
----------------
bb1_weight = std::max(bb1_weight, bb2_weight);

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:620
@@ +619,3 @@
+    // class by making BB2's equivalence class be BB1.
+    SmallVector<BasicBlock *, 8> DominatedBBs;
+    DT->getDescendants(BB1, DominatedBBs);
----------------
You should probably put this outside the loop.

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:624-625
@@ +623,4 @@
+
+    // Repeat the same logic for all the blocks post-dominated by BB1.
+    // We are looking for every basic block BB2 such that:
+    //
----------------
Why do we need to do this?

If we visit every block as BB1, and then visit every block BB1 dominates as BB2, we will have considered all pairs of blocks where A dominates B, and tested for B post-dominating A. We don't need to consider pairs of blocks where B post-dominates A because we will reject the ones where A does not dominate B.

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:578
@@ +577,3 @@
+      // members of its equivalence set.
+      uint32_t &bb1_weight = BlockWeights[BB1];
+      uint32_t &bb2_weight = BlockWeights[BB2];
----------------
Chandler Carruth wrote:
> bb1_weight = std::max(bb1_weight, bb2_weight);
Naming convention here.

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:940
@@ +939,3 @@
+///
+///    - If all the edges are known except one, and the weight of the
+///      block is already known, the weight of the unknown edge will
----------------
Incoming edges or outgoing edges? Or both?

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:937-938
@@ +936,4 @@
+///
+///    - If B has a single predecessor/successor, then the weight
+///      of that edge is the weight of the block.
+///
----------------
How do you handle contradictions?

================
Comment at: lib/Transforms/Scalar/SampleProfile.cpp:951-953
@@ +950,5 @@
+///
+/// Since this propagation is not guaranteed to finalize for every CFG, we
+/// only allow it to proceed for a limited number of iterations (controlled
+/// by -sample-profile-max-propagate-iterations).
+///
----------------
This is a really frustrating property. Can it be easily avoided?

The first of the above steps is obviously deterministic.

For the second and third, I feel like the following is guaranteed to make forward progress and terminate:
0) Add all blocks to a list to process.
1) Take the first block out of the list
2) if we can add a weight to an edge connected to this block (including a self referential edge) do so and add the other block the edge connects to to the list (if not already in the list)
3) repeat #2 until we cannot add a weight to any edge connected to this block
4) if the list is not empty, go to #1
We only add a block to the list if we add a weight to an edge, so we can at most visit V+E blocks.

Thoughts?

http://llvm-reviews.chandlerc.com/D2274