[PATCH] D64376: [MBP] Avoid tail duplication if it can't bring benefit

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 16 15:40:58 PDT 2019


davidxl added inline comments.


================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:1143
+
+  // If the duplication candidate has more unplaced predecessors than
+  // successors, the extra duplication can't bring more fallthrough.
----------------
Carrot wrote:
> davidxl wrote:
> > This does not seem right.   Dup was Pred1's fall through successor and Succ1 was Dup's fall through (without tailDup). Also assume Pred2 and Pred3 has one single successor (otherwise Dup can not be tailDup ed into them).
> > 
> > Here, we should look at reduced taken branches, not total fallthrough --- because once Dup is tail duped into Pred2 and Pred3, it is merged with them thus has an implicit fall through there.
> > 
> > Let the incoming frequency to Dup from Pred1,2,3 are F1, F2, and F3, and assume the Probability from Dup to Succ1 is P(S1).  We also assume path insensitivity here in the calculation.
> > 
> > Before the tail duplication, the total taken branches are
> > 
> > T1 = F2 + F3 + (F1+F2+F3)*(1-P(S1))
> > 
> > After taildup, assuming Succ1 is a fall through of Pred1->Dup chain, and Succ2 is a fall through of Pred2,  then the total taken branches is
> > 
> > T2 = F1 * (1-(P(S1)) +  F2 * P(S1) + F3
> > 
> > so the difference is :
> > 
> > T1 - T2 = 2*(1-P(S1)) * F2 + (1-P(S1)) * F3   which is greater than 0.
> > 
> Current implementation considers only fallthrough from Dup's predecessor to its successors. It doesn't consider other factors such as taken branches and code size.
> 
> If we consider taken branches, in this example:
>     * Dup is not duplicated into Pred3, total taken branches through Pred3 is 
>                     Freq3 + taken branches to Succ1 or Succ2
>     * Dup is duplicated into Pred3, at the end of Pred3 it needs two branches to either Succ1 or Succ2, and the total number of taken branches is 
>                     Freq3
> 
> So duplicated into Pred3 really reduces taken branches. Similarly if Dup has more predecessors with single successor(Dup), tail duplicated Dup into them always reduces taken branches. Unfortunately Dup may have many predecessors, we have multiple cases with dozens of predecessors. Although duplicated Dup reduces taken branches, it also increases code size, reduces the efficiency of i-cache, and regress the performance finally.
> 
> So we need better method to model it. I can't see a simple trade off between code size and fallthroughs(taken branches) here, maybe we can apply machine learning here.
Some heuristics can be added. For instance of if the branch from Dup to Succ1,2 is highly biased, do not tail duplicate. If Pred3 (assuming it is the coldest) block is very cold, do not tail duplicate. Tunings like this can even be extended to 2 predecessor case -- with a good cost model + profile information, we can avoid bloating icache size.


================
Comment at: lib/CodeGen/MachineBlockPlacement.cpp:1138
+  // fallthrough from predecessors from Succ to its successors. We may need
+  // other machanism to handle different cases.
+  if (Succ->succ_size() == 0)
----------------
Can you explain this check (succ_size() ==0 )  a little more? 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64376/new/

https://reviews.llvm.org/D64376





More information about the llvm-commits mailing list