[PATCH] D10825: Improvement on computing edge probabilities when choosing the best successor in machine block placement.

Mon Jul 27 00:36:48 PDT 2015

congh added inline comments.

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:365-371
@@ -364,4 +364,9 @@
   uint32_t WeightScale = 0;
   uint32_t SumWeight = MBPI->getSumForBlock(BB, WeightScale);
-  DEBUG(dbgs() << "Attempting merge from: " << getBlockName(BB) << "\n");
+
+  // Adjust sum of weights by excluding weights on edges pointing to blocks that
+  // cannot be BB's successor in the chain.
+  uint32_t AdjustedSumWeight = SumWeight;
+  SmallVector<MachineBasicBlock *, 4> Successors;
   for (MachineBasicBlock *Succ : BB->successors()) {
+    bool SkipSucc = false;
----------------
chandlerc wrote:
> This is made incredibly annoying because we don't rescale the branch probabilities in MBPI the way we do in BPI, and so we don't know for a fact that they fit in 32-bits.
> 
> I think we need to fix MBPI to automatically scale all the weights so that the sum of weights is known to fit in 32-bits. Once that's done, WeightScale and all the code in getSumForBlock to rescale things goes away.
> 
> Then, here, you should change this to just compute the sum directly rather than getting the sum from MBPI and then adjusting it after-the-fact.
> 
> Also, I would just embed the predicate into a lambda, and write two loops over BB->successors() rather than building up a vector of successors.
OK. Let me try to produce a patch doing edge weight normalization for MBPI. 

================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:381-383
@@ +380,5 @@
+        SkipSucc = true;
+      } else if (Succ != *SuccChain->begin()) {
+        DEBUG(dbgs() << "    " << getBlockName(Succ) << " -> Mid chain!\n");
+        SkipSucc = true;
+      }
----------------
davidxl wrote:
> chandlerc wrote:
> > This is a bigger change than the rest, and I think it should go in separately.
> > 
> > The rest of this change is a clear bug-fix for how this code works, and will make the "hot" detection actually reasonable below. This change, however, is shifting a core thing to prefer to outline all optional loop bodies which don't have fallthrough into them. That, to me, is a very significant change, and I'd rather look at it (and test cases and benchmarks showing it is clearly the right call) independently of fixing how we compute the probability of 1 of N successors.
> The core idea of Cong's patch is to make trace formation more correctly. The basic principle of the original trace formation (probability based) is that when a BB has multiple viable successors, but none of them is more dominating, then do not use outgoing succ edge frequency to determine layout successor
> 
> The bug in the current implementation is that when checking dominating outgoing edges, it also look at non-viable successors that have already being laid out.
> 
> The fix here is simply to eliminate the non-candidate when selecting trace successor.
> 
> Cong, as I mentioned before, it is much clearer to extract the adjustment code into a helper routine with documentation to make the intention clear.
I have some cases that this change is beneficial. But I agree we need to reevaluate its impact on loops without fall-through to header (from the aspect of icache locality). I can disable this "optimization" attempt from this patch and do more research on it.


http://reviews.llvm.org/D10825