[PATCH] D11442: Provide an interface normalizeSuccWeights in MachineBasicBlock to normalize its successors' weights and use it in other places.

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 12 18:30:07 PDT 2015


>
>
> When we remove a successor, should we adjust the probabilities of other
> successors? If yes, what if we later add another successor with a given
> probability? If not, the sum of all successors' probabilities will not be 1
> and probabilities are more like weights (and probabilities don't make sense
> anymore).
>
> The similar question is on addSuccessor: what if we add a successor when the
> sum of other successors' probabilities is already 1? Do we always need to
> check this?


There are two kinds of CFG changing interfaces:
1) creator interfaces
2) update interfaces

1) The creator interface is basically addSuccessor. It is a primitive
operation which does not need to do any profile adjustment -- under
the assumption that the client either knows what probability to use or
supplies a default (unknown).


2) Update interfaces are more interesting. The creator interface
addSuccessor should not be used directly in a CFG update --
the update operation needs to be done 'atomically' so that the
compiler knows how to maintain the data associated with the edges. You
will need to study and find out all the common update patterns and create
update interfaces for them. Here are a couple of examples:

a) Replace one successor with another:
  step-1: removeSuccessor
  step-2: addSuccessor if not already there

There should be an interface for this: MachineBasicBlock::replaceSuccessor

b) Splitting: replace one successor with a list of successors.
Example: in tailDup, remove tailBB and transfer tailBB's successors to
prevBB.
   A high-level interface is also desirable here.

c) Delete a dead target -- mostly just converting a conditional jump into
a direct jump.
  The removeSuccessor interface can be used -- but it needs a
parameter to do the profile update -- the dead target's probability needs
to be evenly distributed to the surviving targets. With PGO, this should
be a no-op, as the dead target's original probability should be 0.


d) MachineBasicBlock::transferSuccessors -- existing interface,
nothing needs to be done.

Please do a more thorough study of the use patterns.

David





>
> Here are some candidate designs:
>
> 1. addSuccessor(MachineBasicBlock*, BranchProbability);
>
>
>   a. Just assign the given probability to the internal one, without doing
> any check or normalization.
>
>   Pros:
>     - Fast as there is no additional overhead.
>     - More natural when the user uses this interface to add several
> successors with precalculated probabilities.
>
>   Cons:
>     - The user needs to make sure the sum of all successors' probabilities is
> one.
>     - Fails to adjust probabilities if the user adds a successor while the
> sum of other successors' probabilities is one.
>
>
>   b. Always normalize all successors' probabilities so that the sum of them
> is one. The given probability of the new successor should be guaranteed, and
> other successors' probabilities are scaled down.
>
>   Pros:
>     - It is guaranteed that the sum of all successors' probabilities is one.
>
>   Cons:
>     - Has overhead when adding successors, though the overhead is moderate.
>
>
>   c. Provide a new interface addSuccessors() to add several successors at
> one time with precalculated probabilities. Let addSuccessor() behave as in
> b.
>
>   Pros:
>     - Faster than b when adding more than one successor.
>
>   Cons:
>     - The user has to make sure that the sum of all successors' probabilities
> is one when using this new interface.
>
>
>   d. Add a new parameter to addSuccessor() to indicate whether to normalize
> probabilities.
>
>   Pros:
>     - If used properly, it is faster than b in some cases.
>
>   Cons:
>     - Adds complexity to the interface, making it harder for users to use
> correctly.
>
>
> 2. removeSuccessor()
>
>   This will have similar designs as a/b/d of addSuccessor() shown above.
>
>
> Any suggestion?
>
>
> thanks,
> Cong
>
>>
>>
>> 1) A common pattern of using removeSuccessor is to insert a trampoline
>> BB with conditional jump to the old target.
>>
>>   JTBB->removeSuccessor(BB);
>>   JTBB->addSuccessor(NewBB);
>>
>> After this, the outgoing edge probabilities of JTBB will be wrong.  If
>> we enforce the interface to take branch probability, then problems
>> like this won't occur.
>>
>> 2) removing edge to landing pad -- in this case, it does not matter
>> whether the edge probability is stored as weight over sum or as fixed
>> point representation of the real probability.
>>
>> 3) In IfConvert, remove successor edge to TBB. The branch is gone
>> after the transformation, so there is no profile update issue.
>>
>> 4) Superfluous edge removal in MachineBasicBlock::CorrectExtraCFGEdges
>>   should assert that superfluous edges have zero probability.
>>
>> 5) Dead Block Removal in various passes such as BranchFolding,
>> UnreachableBlockElim, removeSuccessor is invoked on the dead BB, so
>> there is no update issue there.
>>
>> 6) Block Merging in BranchFolding -- after merging, the branch is
>> gone, so there is no profile update issue.
>>
>> 7) TailDuplication: current profile update is correct. Using
>> probability based interface will work fine.
>>
>>     uint32_t Weight = MBPI->getEdgeWeight(PredBB, TailBB);
>>     PredBB->removeSuccessor(TailBB);
>>     unsigned NumSuccessors = PredBB->succ_size();
>>     assert(NumSuccessors <= 1);
>>     if (NumSuccessors == 0 || *PredBB->succ_begin() != NewTarget)
>>       PredBB->addSuccessor(NewTarget, Weight);
>>
>>
>> 3. MachineBasicBlock::transferSuccessors (MachineBasicBlock *FromBB)
>>
>> The implementation uses removeSuccessor and addSuccessor and tries to
>> update the weights. The implementation is wrong in the general case, as
>> the succ edge weights fetched from FromBB may have a different scale
>> than the weights owned by 'this' BB. The profile update is guaranteed
>> to be correct only when 'this' BB's successor list is empty. I have
>> checked the sites that invoke this interface -- this condition (empty
>> successor list) does seem to be always satisfied.
>>
>> In conclusion, it is safe to get rid of getSumForBlock. The potential
>> bugs of missing profile/profile update in CodeGen also need to be
>> further examined and fixed.
>>
>> thanks,
>>
>> David
>>
>>
>>
>> On Wed, Aug 5, 2015 at 1:59 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>> >
>> >> On 2015-Aug-05, at 13:48, Cong Hou <congh at google.com> wrote:
>> >>
>> >> On Wed, Aug 5, 2015 at 12:24 PM, Duncan P. N. Exon Smith
>> >> <dexonsmith at apple.com> wrote:
>> >>>
>> >>> In terms of changing branch weights to use fixed point arithmetic (and
>> >>> always be normalized): I like this idea.  I'd prefer a number like
>> >>> UINT32_MAX instead of 100000, but I agree with David that 32 bits
>> >>> ought
>> >>> to be enough.  Something similar to the `BlockMass` design seems about
>> >>> right.
>> >>>
>> >>> The major downside is that passes that modify the CFG will always have
>> >>> to update edge weights to renormalize them.  Currently, if a pass
>> >>> removes a successor edge, it needn't worry about normalization at all.
>> >>> Although that's convenient, it's probably not important.  (Or maybe
>> >>> they could be normalized lazily...?)
>> >>
>> >> One important idea from David's design is that we should pass branch
>> >> probability instead of edge weight when adding new edges or updating
>> >> edge weights. If we decide to use this design, then when edge weights
>> >> (or probabilities) are updated, we should always update the weights on
>> >> other out-edges from the same block. Removing edge doesn't need
>> >> normalization but if we want a fast getEdgeProbability()
>> >> implementation without storing the sum of edge weights, we should
>> >> always keep edge weights normalized.
>> >
>> > Wouldn't caching in BPI be fast enough for `getEdgeProbability()`?
>> > And it slows down update speed (since, as you point out, every update
>> > to one edge requires updates to all the others).
>> >
>> >>> If you go ahead with this, then you should first change
>> >>> `BranchProbability` to use this design, and then you can reuse that
>> >>> class for branch weights.
>> >>
>> >> OK. I think we can allow BranchProbability to be built either from
>> >> uint32_t or BlockMass-like class.
>> >
>> > I was suggesting: change `BranchProbability` to be a BlockMass-like
>> > class.  (Or are we on the same page already?)
>> >
>> >>> BTW, another approach would be to change `getSumForBlock()` to return
>> >>> a
>> >>> 64-bit number, and just use 64-bit math everywhere (weights would
>> >>> still
>> >>> be 32-bit).  Sounds like a lot less work.  Thoughts on that?
>> >>
>> >> As BranchProbability only accepts uint32_t parameters, we still need
>> >> to scale down the sum of 64-bit type when constructing probabilities.
>> >
>> > Sure, but BPI could easily cache that result.  It caches the weight
>> > right now, and could just as easily cache the probability instead (or
>> > as well).
>
>

