[PATCH] D10979: Update the branch weight metadata in JumpThreading pass.

Cong Hou congh at google.com
Tue Jul 7 17:50:24 PDT 2015


On Tue, Jul 7, 2015 at 4:53 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>
> > On 2015-Jul-07, at 13:24, Xinliang David Li <davidxl at google.com> wrote:
> >
> > On Tue, Jul 7, 2015 at 10:39 AM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >>> On 2015-Jul-06, at 17:22, Cong Hou <congh at google.com> wrote:
> >>>
> >>>
> >>> congh added reviewers: chandlerc, davidxl.
> >>> congh added a subscriber: llvm-commits.
> >>>
> >>> Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). Two more analyses (BlockFrequencyInfo and BranchProbabilityInfo) are needed then.
> >>
> >> Thanks for working on this!
> >>
> >> Generally you don't need these analyses to keep branch weights
> >> up-to-date.  The design premise is that you don't need global
> >> information for local updates.
> >>
> >
> > This is true in general, but probably not the case for jump-threading.
> > When a new thread is formed (from NewBB to SuccBB), the  profile
> > update delta to the original edge BB->SuccBB comes from a different
> > edge (Pred->NewBB which is inherited from Pred->BB). Local update by
> > only looking at BB seems impossible.
>
> Ah, I see.   Yes, you need a global view to know predecessor
> probabilities.
>
> >
> >> You should be able to calculate the new !prof attachment based on the
> >> old ones, without running BFI.  (I'm skeptical of even running BPI -- if
> >> there's no !prof attachment on the old block, then you have no real
> >> information; what's the benefit in generating a new !prof attachment
> >> based on heuristics?  The new CFG will come with its own new
> >> heuristics.)
> >>
> >
> > I believe Cong's fix is targeting PGO.
>

Thank you very much for the detailed review, Duncan!

>
> Cong, I have a few concerns with the current approach.
>
> Firstly, this requires BFI even when there is no profile data (as David
> also noted in one of the Phabricator emails).  I don't think it's
> reasonable to require all users of jump threading to run an analysis
> that will never be used.  One option would be to `addOptional<>`, but
> I'm not sure that'll get you what you want.

Agree.

>
>
> To prevent everyone paying for profile data, we should do the following:
> if the basic block in question has branch weights, then (and only then)
> retrieve the BFI analysis so they can be updated.  I think you'll either
> have to port the analysis retrieval and caching logic from the new
> `PassManager` over to `LegacyPassManager`, or wait for it to be used in
> tree.

OK. I will try to update the patch following your suggestion.

>
>
> (On a related point, I doubt users of PGO + jump threading with partial
> profiles (e.g., JITs) care enough about fidelity to justify running BFI
> for this edge.  IMO, this update should be configurable.)
>
> Secondly, the current code requires BFI and then immediately invalidates
> it, but it should be straightforward to incrementally update (preserve)
> BFI here.

You are right. Let me find a way to incrementally update BFI.

>
>
> Moreover, I think your patch is incorrect as written.  The structure of
> the algorithm is basically:
>
>     do {
>       Changed = false;
>       for (BasicBlock &BB : F)
>         while (ProcessBlock(BB))
>           Changed = true;
>     } while (Changed);
>
> As soon as `ProcessBlock()` does something, the next call will have
> stale BFI, won't it?

I will update BFI every time branch weights are updated. This will
make sure that the next time we need BFI it is correct and up-to-date.

>
> Thirdly (a performance note), I don't think it makes sense to recompute
> BFI in `ProcessBlock()` (in the inner loop).  Instead:
>
>     do {
>       Changed = false;
>       for (BasicBlock &BB : F) {
>         bool BlockChanged = false;
>         while (ProcessBlock(BB))
>           BlockChanged = true;
>         if (BlockChanged) {
>           Changed = true;
>           updateBranchWeights(BB);
>         }
>       }
>     } while (Changed);
>
> This waits to update branch weights until all the relevant predecessors
> have been threaded through.


I think not all changes need branch weights update (like merging two
blocks). Currently I found we only need to do this for jump threading
(not other changes that are included in ProcessBlock). I will keep an
eye on the performance issue.




More information about the llvm-commits mailing list