<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 5, 2015 at 6:35 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'd like to summarize my proposal (interfaces) so we are on the same<br>

page.  I have also checked  the client usages of the MachineCFG update<br>

APIs, and found many potential problems. There are no evidences to<br>

show that using fixed point representation for branch probability will<br>

be a problem when successor edge is removed.<br>

<br>

Proposal summary:<br>

===============<br>

<br>

1. For BranchProbabilityInfo class<br>

<br>

1) Keep getEdgeProbablity(..) interface as it is<br>

2) Make getEdgeWeight(..) interfaces simply a wrapper using 1);<br>

deprecate its usage in the longer term<br>

3) Remove setEdgeWeight interface<br>

4) Add a new interface: SetEdgeProbability(..., const BranchProbability& prob);<br>

<br>

(Note this proposal does not cover how BranchProbabilty representation<br>

should be changed or not).<br>

<br>

2. For MachineBranchProbabilityInfo class<br>

<br>

1) Keep getEdgeProbability interface<br>

2) make getEdgeWeight a wrapper to 1) and deprecate them in the future<br>

3) Remove getSumForBlock() interface. No need to keep this interface<br>

in order to 'recalculate' branch probability on the fly.<br>

<br>

3. MachineBasicBlock class<br>

<br>

1) change addSuccessor(.., uint32_t weight) to addSuccessor(...,<br>

BranchProbability prob)<br>

2) Remove setSuccWeight(..)<br>

3) Add a new interface: setEdgeProbabiltity(..., const BranchProbability&)<br>

<br>

<br>

How Machine CFG update interfaces are used and potential issues with<br>

profile update<br>

=================================================================<br>

<br>

The interfaces include:<br>

MachineBasicBlock::addSuccessor(...)<br>

MachineBasicBlock::removeSuccessor(..)<br>

MachineBasicBlock::replaceSuccssor(..)<br>

MachineBasicBlock::transferSuccessor(..)<br>

MachineBasicBlock::transferSuccessorAndUpdatePHIs(..)<br>

<br>

<br>

1. MachineBasicBlock::addSuccessor<br>

<br>

The interface allows client to omit a weight by defining the default<br>

weight to be 0.  I think this is a mistake.  When control flow is<br>

synthensized during target lowering, the client has better knowledge<br>

of the branch bias. The current design allows client simply do:<br>

<br>

  allocMBB->addSuccessor(&PrologueMBB);<br>

<br>

  checkMBB->addSuccessor(allocMBB);<br>

  checkMBB->addSuccessor(&PrologueMBB);<br>

<br>

There are many such cases.<br>

<br>

  Another big problem is that the CFG update code freely transfer<br>

weight from one parent BB to another BB which can be totally wrong if<br>

their original weight scales are out of sync:<br>

<br>

  if (FuncInfo.BPI)<br>

        BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),<br>

                                                   TrueMBB->getBasicBlock());<br>

   FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);<br>

<br>

<br>

2. MachineBasicBlock::removeSuccessor<br></blockquote><div><br></div><div>When we remove a successor, should we adjust the probabilities of other successors? If yes, what if we later add another successor with a given probability? If not, the sum of all successor's probabilities will not be 1 and probabilities are more like weights (and probabilities don't make sense anymore).</div><div><br></div><div>The similar question is on addSuccessor: what if we add a successor when the sum of other successors' probabilities is already 1? Do we always need to check this? </div><div><br></div><div>Here are some candidate designs:</div><div><br></div><div>1. addSuccessor(MachineBasicBlock*, BranchProbability); </div><div><br></div><div><br></div><div>  a. Just assign the given probability to the internal one, without doing any check or normalization.</div><div><br></div><div>  Pros: </div><div>    - Fast as there is no additional overhead. </div><div>    - More natural when the user uses this interface to add several successors with precalculated probabilities.</div><div><br></div><div>  Cons: </div><div>    - The user need to make sure the sum of all successors' probabilities is one.</div><div>    - Fails to adjust probabilities if the user adds a successor while the sum of other successors' probabilities is one.</div><div><br></div><div><br></div><div>  b. Always normalize all successors' probabilities so that the sum of them is one. The given probability of the new successor should be guaranteed, and other successors' probabilities are scaled down.</div><div><br></div><div>  Pros:</div><div>    - It is guaranteed that the sum of all successors' probabilities is one.</div><div><br></div><div>  Cons:</div><div>    - Has overhead when adding successors, though the overhead is moderate.</div><div><br></div><div><br></div><div>  c. Provide a new interface addSuccessors() to add several successors at one time with precalculated probabilities. Let addSuccessor() behave as in b.</div><div><br></div><div>  Pros:</div><div>    - Faster than b when adding more than one successor.</div><div><br></div><div>  Cons:</div><div>    - The use has to make sure that the sum of all successors' probabilities is one when using this new interface.</div><div><br></div><div><br></div><div>  d. Add a new parameter to addSuccessor() to indicate whether to normalize probabilities.</div><div><br></div><div>  Pros:</div><div>    - If used properly, it is faster than b in some cases.</div><div><br></div><div>  Cons:</div><div>    - Add the complexity of this interface, and make it harder to be used by users.</div><div><br></div><div><br></div><div>2. removeSuccessor()</div><div><br></div><div>  This will have similar designs as a/b/d of addSuccessor() shown above.</div><div><br></div><div><br></div><div>Any suggestion?</div><div><br></div><div><br></div><div>thanks,</div><div>Cong</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

1) A common pattern of using removeSuccessor is to insert a trampoline<br>

BB with conditional jump to the old target.<br>

<br>

  JTBB->removeSuccessor(BB);<br>

  JTBB->addSuccessor(NewBB);<br>

<br>

After this, the outgoing edge probabilities of JTBB will be wrong.  If<br>

we enforce the interface to take branch probability, then problems<br>

like this won't occur.<br>

<br>

2) removing edge to landing pad -- in this case, it does not matter<br>

whether the edge probability is stored as weight over sum or as fixed<br>

point representation of the real probability.<br>

<br>

3) In IfConvert, remove successor edge to TBB. The branch is gone<br>

after the transformation, so there is no profile update issue<br>

<br>

4) Superfluous edge removal in MachineBasicBlock::CorrectExtraCFGEdges<br>

  should assert that superfluous edges have zero probability.<br>

<br>

5) Dead Block Removal in various passes such as BranchFolding,<br>

UnreachableBlockElim, removeSuccessor is invoked on the dead BB, so<br>

there is no update issue there.<br>

<br>

6) Block Merging in BranchFolding -- after merging, the branch is<br>

gone, so there is no profile update issue<br>

<br>

7) TailDuplication: current profile update is correct. Using<br>

probability based interface will work fine.<br>

<br>

    uint32_t Weight = MBPI->getEdgeWeight(PredBB, TailBB);<br>

    PredBB->removeSuccessor(TailBB);<br>

    unsigned NumSuccessors = PredBB->succ_size();<br>

    assert(NumSuccessors <= 1);<br>

    if (NumSuccessors == 0 || *PredBB->succ_begin() != NewTarget)<br>

      PredBB->addSuccessor(NewTarget, Weight);<br>

<br>

<br>

3. MachineBasicBlock::transferSuccessors (MachineBasicBlock *FromBB)<br>

<br>

The implementation uses removeSuccessor and addSuccessor and tries to<br>

update weight. The implementation in general sense is wrong, as the<br>

succ edge weights fetched from FromBB may have a different scale than<br>

weights owned by 'this' BB. The profile update is guaranteed to be<br>

right only when 'this' BB's successor list is empty. I have checked<br>

the sites that invoke this interface -- this condition (empty<br>

successor list) does seem to be always satisfied.<br>

<br>

In conclusion, it is safe to get rid of getSumForBlock. The potential<br>

bugs of missing profile/profile update in CodeGen also need to be<br>

further examined and fixed.<br>

<br>

thanks,<br>

<br>

David<br>

<br>

<br>

<br>

On Wed, Aug 5, 2015 at 1:59 PM, Duncan P. N. Exon Smith<br>

<div class="HOEnZb"><div class="h5"><<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>

><br>

>> On 2015-Aug-05, at 13:48, Cong Hou <<a href="mailto:congh@google.com">congh@google.com</a>> wrote:<br>

>><br>

>> On Wed, Aug 5, 2015 at 12:24 PM, Duncan P. N. Exon Smith<br>

>> <<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>

>>><br>

>>> In terms of changing branch weights to use fixed point arithmetic (and<br>

>>> always be normalized): I like this idea.  I'd prefer a number like<br>

>>> UINT32_MAX instead of 100000, but I agree with David that 32 bits ought<br>

>>> to be enough.  Something similar to the `BlockMass` design seems about<br>

>>> right.<br>

>>><br>

>>> The major downside is that passes that modify the CFG will always have<br>

>>> to update edge weights to renormalize them.  Currently, if a pass<br>

>>> removes a successor edge, it needn't worry about normalization at all.<br>

>>> Although that's convenient, it's probably not important.  (Or maybe<br>

>>> they could be normalized lazily...?)<br>

>><br>

>> One important idea from David's design is that we should pass branch<br>

>> probability instead of edge weight when adding new edges or updating<br>

>> edge weights. If we decide to use this design, then when edge weights<br>

>> (or probabilities) are updated, we should always update the weights on<br>

>> other out-edges from the same block. Removing edge doesn't need<br>

>> normalization but if we want a fast getEdgeProbability()<br>

>> implementation without storing the sum of edge weights, we should<br>

>> always keep edge weights normalized.<br>

><br>

> Wouldn't caching in BPI be fast enough for `getEdgeProbability()`?<br>

> And it slows down update speed (since, as you point out, every update<br>

> to one edge requires updates to all the others).<br>

><br>

>>> If you go ahead with this, then you should first change<br>

>>> `BranchProbability` to use this design, and then you can reuse that<br>

>>> class for branch weights.<br>

>><br>

>> OK. I think we can allow BranchProbability to be built either from<br>

>> uint32_t or BlockMass-like class.<br>

><br>

> I was suggesting: change `BranchProbability` to be a BlockMass-like<br>

> class.  (Or are we on the same page already?)<br>

><br>

>>> BTW, another approach would be to change `getSumForBlock()` to return a<br>

>>> 64-bit number, and just use 64-bit math everywhere (weights would still<br>

>>> be 32-bit).  Sounds like a lot less work.  Thoughts on that?<br>

>><br>

>> As BranchProbability only accepts uint32_t parameters, we still need<br>

>> to scale down the sum of 64-bit type when constructing probabilities.<br>

><br>

> Sure, but BPI could easily cache that result.  It caches the weight<br>

> right now, and could just as easily cache the probability instead (or<br>

> as well).<br>

</div></div></blockquote></div><br></div></div>