[PATCH] D11442: Provide an interface normalizeSuccWeights in MachineBasicBlock to normalize its successors' weights and use it in other places.

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 6 17:21:42 PDT 2015


On Thu, Aug 6, 2015 at 1:15 PM, Andrew Trick <atrick at apple.com> wrote:
>
>> On Aug 5, 2015, at 6:35 PM, Xinliang David Li <davidxl at google.com> wrote:
>>
>> I'd like to summarize my proposal (interfaces) so we are on the same
>> page.  I have also checked the client usages of the MachineCFG update
>> APIs and found many potential problems. There is no evidence that
>> using a fixed-point representation for branch probability will be a
>> problem when a successor edge is removed.
>>
>> Proposal summary:
>> ===============
>>
>> 1. For BranchProbabilityInfo class
>>
>> 1) Keep the getEdgeProbability(..) interface as it is
>> 2) Make the getEdgeWeight(..) interface simply a wrapper around 1);
>> deprecate its usage in the longer term
>> 3) Remove the setEdgeWeight interface
>> 4) Add a new interface: setEdgeProbability(..., const BranchProbability &Prob);
>>
>> (Note this proposal does not cover whether the BranchProbability
>> representation itself should be changed.)
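>>
>> A minimal sketch of what the resulting BranchProbabilityInfo surface
>> could look like (signatures are illustrative, not final):
>>
>>   class BranchProbabilityInfo {
>>   public:
>>     // 1) Unchanged primary query.
>>     BranchProbability getEdgeProbability(const BasicBlock *Src,
>>                                          unsigned IndexInSuccessors) const;
>>     // 2) Now a thin wrapper around getEdgeProbability(); to be
>>     // deprecated.
>>     uint32_t getEdgeWeight(const BasicBlock *Src,
>>                            unsigned IndexInSuccessors) const;
>>     // 4) Replaces setEdgeWeight().
>>     void setEdgeProbability(const BasicBlock *Src,
>>                             unsigned IndexInSuccessors,
>>                             const BranchProbability &Prob);
>>   };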
>>
>> 2. For MachineBranchProbabilityInfo class
>>
>> 1) Keep the getEdgeProbability interface
>> 2) Make getEdgeWeight a wrapper around 1) and deprecate it in the future
>> 3) Remove the getSumForBlock() interface. There is no need to keep it
>> just to 'recalculate' branch probabilities on the fly.
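>>
>> The wrapper in 2) could recover a weight-like value straight from the
>> stored probability, with no getSumForBlock() walk (sketch only; the
>> scale is arbitrary and callers should not assign meaning to it):
>>
>>   uint32_t MachineBranchProbabilityInfo::getEdgeWeight(
>>       const MachineBasicBlock *Src, const MachineBasicBlock *Dst) const {
>>     return getEdgeProbability(Src, Dst).getNumerator();
>>   }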
>
> I never thought branch "weight" should be exposed. The first comment in BranchProbabilityInfo.cpp is "Weights are for internal use only."

I see. I guess it got exposed because computing the probability is
slow with the way it is represented today.


> Clients should only need to reason about branch probability and block frequency. It's particularly confusing because block frequency analysis uses block weights, as a precursor to frequency, which are not directly related to branch weights. Incidentally, branch weights are not probabilities. They have no meaning in and of themselves.

Agree.

>
>>
>> 3. MachineBasicBlock class
>>
>> 1) Change addSuccessor(.., uint32_t weight) to addSuccessor(...,
>> BranchProbability prob)
>> 2) Remove setSuccWeight(..)
>> 3) Add a new interface: setEdgeProbability(..., const BranchProbability &)
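>>
>> Sketch of the MachineBasicBlock side (the getUnknown() default is
>> illustrative, standing in for "edge added without a known
>> probability"):
>>
>>   class MachineBasicBlock {
>>   public:
>>     void addSuccessor(MachineBasicBlock *Succ,
>>                       BranchProbability Prob =
>>                           BranchProbability::getUnknown());
>>     void setEdgeProbability(succ_iterator I,
>>                             const BranchProbability &Prob);
>>   };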
>
> In general these are great improvements. I never felt the current BPI/MPI implementation was complete in the sense that we didn't have a good strategy for verifying that passes were well behaved.

Yes. We need to figure out a way to do profile sanity checking and
validation for any given transformation pass.

>
> When the current design was conceived, a couple of the important design goals were:
>
> - Impose no overhead on the IR/MI representation when profile data is
>   absent, which is by far the common case.

For MI, static profile data (MBPI) is always available, as it is
promoted to the MBB level. MBFI still needs to be computed on demand.

>
> - Allow partial profile data and distinguish between missing
>   probabilities, heuristic-derived probabilities, and measured
>   probabilities. They will all coexist.

In reality, a partial profile and heuristic-based probabilities may
not work well together. Small changes in branch probability can
greatly change the result of frequency propagation, leading to very
large changes in loop scales, and the resulting BB frequencies may be
quite far off. It is important to introduce a mechanism to make sure
the profile is properly updated instead of being dropped.

>
> It looks like clang -O2 is always generating MBB weights when we lower from IR (even when profile data is absent), just because BranchProbabilityInfo is available. So the common case does already generate MBB weights. However, I still don't think all MI clients should be required to represent and maintain branch probabilities, including clang -O0. Also consider that MI should be useful as a standalone representation. For example, a disassembler should be able to generate MI.

Yes -- we should allow a client to ignore it if it does not care or
does not know what to do with it.

>
> So, I think changing the MBB interface to take BranchProbability is an improvement (assuming the representation is still efficient). However, it needs to be possible to create MBB edges without providing BranchProbability.

Agree. This also allows easier migration from the old to the new
interfaces. On the other hand, performance-critical code that drops
profile data needs to be fixed on a case-by-case basis.

> I think it would be fine to convert standard passes to provide a BranchProbability. We should then have a MachineFunction level flag to control whether to preserve branch probabilities. When that flag is not set, we simply drop the probability and incur no overhead. When it is set, we can verify that all passes that ran in the current pipeline have provided branch probabilities.
>

Or, if compile-time overhead is a concern, just check whether dynamic
profile data is present (i.e., PGO is on).

thanks,

David
> Andy
>
>> How Machine CFG update interfaces are used and potential issues with
>> profile update
>> =================================================================
>>
>> The interfaces include:
>> MachineBasicBlock::addSuccessor(..)
>> MachineBasicBlock::removeSuccessor(..)
>> MachineBasicBlock::replaceSuccessor(..)
>> MachineBasicBlock::transferSuccessors(..)
>> MachineBasicBlock::transferSuccessorsAndUpdatePHIs(..)
>>
>>
>> 1. MachineBasicBlock::addSuccessor
>>
>> The interface allows the client to omit a weight by defining the
>> default weight to be 0.  I think this is a mistake.  When control
>> flow is synthesized during target lowering, the client has better
>> knowledge of the branch bias. The current design lets the client
>> simply do:
>>
>>  allocMBB->addSuccessor(&PrologueMBB);
>>
>>  checkMBB->addSuccessor(allocMBB);
>>  checkMBB->addSuccessor(&PrologueMBB);
>>
>> There are many such cases.
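>>
>> Under the proposed interface, these sites would have to spell the
>> bias out, e.g. (the 1/64 split below is made up for illustration):
>>
>>   // The stack check almost always branches straight to the
>>   // prologue rather than into the allocation block.
>>   checkMBB->addSuccessor(allocMBB, BranchProbability(1, 64));
>>   checkMBB->addSuccessor(&PrologueMBB, BranchProbability(63, 64));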
>>
>>  Another big problem is that the CFG update code freely transfers
>> weights from one parent BB to another, which can be totally wrong if
>> the two blocks' weight scales are out of sync:
>>
>>  if (FuncInfo.BPI)
>>        BranchWeight = FuncInfo.BPI->getEdgeWeight(BI->getParent(),
>>                                                   TrueMBB->getBasicBlock());
>>   FuncInfo.MBB->addSuccessor(TrueMBB, BranchWeight);
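>>
>> With probabilities the same update becomes scale-free (sketch;
>> getUnknown() again stands in for "no information"):
>>
>>   BranchProbability Prob = BranchProbability::getUnknown();
>>   if (FuncInfo.BPI)
>>     Prob = FuncInfo.BPI->getEdgeProbability(BI->getParent(),
>>                                             TrueMBB->getBasicBlock());
>>   FuncInfo.MBB->addSuccessor(TrueMBB, Prob);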
>>
>>
>> 2. MachineBasicBlock::removeSuccessor
>>
>> 1) A common pattern of using removeSuccessor is to insert a
>> trampoline BB with a conditional jump to the old target.
>>
>>  JTBB->removeSuccessor(BB);
>>  JTBB->addSuccessor(NewBB);
>>
>> After this, the outgoing edge probabilities of JTBB will be wrong.
>> If we make the interface take a branch probability, problems like
>> this won't occur.
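>>
>> A probability-taking interface forces the correct update, along the
>> lines of (sketch):
>>
>>   BranchProbability Prob = MBPI->getEdgeProbability(JTBB, BB);
>>   JTBB->removeSuccessor(BB);
>>   // NewBB inherits exactly the share of JTBB's outgoing
>>   // probability that used to go to BB.
>>   JTBB->addSuccessor(NewBB, Prob);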
>>
>> 2) Removing an edge to a landing pad -- in this case, it does not
>> matter whether the edge probability is stored as weight over sum or
>> as a fixed-point representation of the real probability.
>>
>> 3) In IfConvert, the successor edge to TBB is removed. The branch is
>> gone after the transformation, so there is no profile update issue.
>>
>> 4) Superfluous edge removal in MachineBasicBlock::CorrectExtraCFGEdges
>>  should assert that superfluous edges have zero probability.
>>
>> 5) Dead block removal in passes such as BranchFolding and
>> UnreachableBlockElim: removeSuccessor is invoked on the dead BB, so
>> there is no update issue there.
>>
>> 6) Block merging in BranchFolding -- after merging, the branch is
>> gone, so there is no profile update issue.
>>
>> 7) TailDuplication: the current profile update is correct. Using a
>> probability-based interface will work fine.
>>
>>    uint32_t Weight = MBPI->getEdgeWeight(PredBB, TailBB);
>>    PredBB->removeSuccessor(TailBB);
>>    unsigned NumSuccessors = PredBB->succ_size();
>>    assert(NumSuccessors <= 1);
>>    if (NumSuccessors == 0 || *PredBB->succ_begin() != NewTarget)
>>      PredBB->addSuccessor(NewTarget, Weight);
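>>
>> The probability-based equivalent would be (sketch):
>>
>>    BranchProbability Prob = MBPI->getEdgeProbability(PredBB, TailBB);
>>    PredBB->removeSuccessor(TailBB);
>>    assert(PredBB->succ_size() <= 1);
>>    if (PredBB->succ_empty() || *PredBB->succ_begin() != NewTarget)
>>      PredBB->addSuccessor(NewTarget, Prob);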
>>
>>
>> 3. MachineBasicBlock::transferSuccessors (MachineBasicBlock *FromBB)
>>
>> The implementation uses removeSuccessor and addSuccessor and tries
>> to update the weights. In general this is wrong, as the successor
>> edge weights fetched from FromBB may have a different scale than the
>> weights owned by 'this' BB. The profile update is guaranteed to be
>> right only when 'this' BB's successor list is empty. I have checked
>> the sites that invoke this interface -- this condition (an empty
>> successor list) does seem to always be satisfied.
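>>
>> Given that, the new implementation could simply enforce the
>> precondition (sketch; getSuccProbability() is the proposed accessor,
>> not an existing one):
>>
>>   void MachineBasicBlock::transferSuccessors(MachineBasicBlock *FromMBB) {
>>     // Mixing two blocks' weight scales is not meaningful, so only a
>>     // wholesale transfer onto an empty successor list is allowed.
>>     assert(succ_empty() && "successor list must be empty");
>>     while (!FromMBB->succ_empty()) {
>>       MachineBasicBlock *Succ = *FromMBB->succ_begin();
>>       addSuccessor(Succ, FromMBB->getSuccProbability(FromMBB->succ_begin()));
>>       FromMBB->removeSuccessor(Succ);
>>     }
>>   }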
>>
>> In conclusion, it is safe to get rid of getSumForBlock. The
>> potential bugs from missing profiles and missing profile updates in
>> CodeGen also need to be further examined and fixed.
>>
>> thanks,
>>
>> David
>>
>>
>>
>> On Wed, Aug 5, 2015 at 1:59 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>>
>>>> On 2015-Aug-05, at 13:48, Cong Hou <congh at google.com> wrote:
>>>>
>>>> On Wed, Aug 5, 2015 at 12:24 PM, Duncan P. N. Exon Smith
>>>> <dexonsmith at apple.com> wrote:
>>>>>
>>>>> In terms of changing branch weights to use fixed point arithmetic (and
>>>>> always be normalized): I like this idea.  I'd prefer a number like
>>>>> UINT32_MAX instead of 100000, but I agree with David that 32 bits ought
>>>>> to be enough.  Something similar to the `BlockMass` design seems about
>>>>> right.
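>>>>>
>>>>> Roughly (sketch; picking the denominator is the open question):
>>>>>
>>>>>   class BranchProbability {
>>>>>     static const uint32_t D = UINT32_MAX; // fixed denominator
>>>>>     uint32_t N;                           // probability == N / D
>>>>>   public:
>>>>>     // Scales N/D to approximate Numerator/Denominator.
>>>>>     BranchProbability(uint32_t Numerator, uint32_t Denominator);
>>>>>   };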
>>>>>
>>>>> The major downside is that passes that modify the CFG will always have
>>>>> to update edge weights to renormalize them.  Currently, if a pass
>>>>> removes a successor edge, it needn't worry about normalization at all.
>>>>> Although that's convenient, it's probably not important.  (Or maybe
>>>>> they could be normalized lazily...?)
>>>>
>>>> One important idea from David's design is that we should pass branch
>>>> probability instead of edge weight when adding new edges or updating
>>>> edge weights. If we decide to use this design, then when edge weights
>>>> (or probabilities) are updated, we should always update the weights on
>>>> other out-edges from the same block. Removing an edge doesn't need
>>>> normalization, but if we want a fast getEdgeProbability()
>>>> implementation without storing the sum of edge weights, we should
>>>> always keep edge weights normalized.
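>>>>
>>>> A minimal sketch of that normalization step (assuming the weights
>>>> live in a SmallVector; Scale would be the fixed constant, e.g.
>>>> UINT32_MAX):
>>>>
>>>>   // Rescale Weights in place so they sum to (at most) Scale;
>>>>   // afterwards getEdgeProbability(I) is just Weights[I] / Scale,
>>>>   // with no getSumForBlock() recomputation.
>>>>   void normalizeSuccWeights(SmallVectorImpl<uint32_t> &Weights,
>>>>                             uint32_t Scale) {
>>>>     uint64_t Sum = 0;
>>>>     for (uint32_t W : Weights)
>>>>       Sum += W;
>>>>     if (Sum == 0)
>>>>       return; // no information; leave the weights alone
>>>>     for (uint32_t &W : Weights)
>>>>       W = uint32_t((uint64_t)W * Scale / Sum); // W <= Sum => fits
>>>>   }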
>>>
>>> Wouldn't caching in BPI be fast enough for `getEdgeProbability()`?
>>> And keeping the weights normalized slows down update speed (since,
>>> as you point out, every update to one edge requires updates to all
>>> the others).
>>>
>>>>> If you go ahead with this, then you should first change
>>>>> `BranchProbability` to use this design, and then you can reuse that
>>>>> class for branch weights.
>>>>
>>>> OK. I think we can allow BranchProbability to be built either from
>>>> a uint32_t or from a BlockMass-like class.
>>>
>>> I was suggesting: change `BranchProbability` to be a BlockMass-like
>>> class.  (Or are we on the same page already?)
>>>
>>>>> BTW, another approach would be to change `getSumForBlock()` to return a
>>>>> 64-bit number, and just use 64-bit math everywhere (weights would still
>>>>> be 32-bit).  Sounds like a lot less work.  Thoughts on that?
>>>>
>>>> As BranchProbability only accepts uint32_t parameters, we would
>>>> still need to scale down the 64-bit sum when constructing
>>>> probabilities.
>>>
>>> Sure, but BPI could easily cache that result.  It caches the weight
>>> right now, and could just as easily cache the probability instead (or
>>> as well).
>

