[PATCH] D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining

Wed Dec 2 18:53:45 PST 2020

wenlei added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:373
+
+ContextTrieNode *SampleContextTracker::getContextFor(const DILocation *DIL) {
+  assert(DIL);
----------------
wmi wrote:
> wenlei wrote:
> > wmi wrote:
> > > wenlei wrote:
> > > > wmi wrote:
> > > > > A question, the context in SampleContextTracker includes not only inline stack but also call stack. S vector below only contains the inline stack at the DIL location. How can it match with the full stack starting from RootContext?  
> > > > When we decide not to inline a call site, its context profile is promoted to root, so what remains in the context tracker should accurately reflect the remainder context profile. E.g. suppose we start with A->(call) B->(inline) C in the context tracker. If at some point we're looking at B->C from DIL, there are two scenarios:
> > > > 
> > > > 1. If A inlined B, we wouldn't be able to match B->C from DIL to anything in the context tracker. This is intentional and desired, because the remainder/base profile for B, or the context profile B->C, shouldn't have anything if the A->B inline happened.
> > > > 2. If A did not inline B, B->C should be moved/promoted from being a child of A to being under root. Then we would be able to match B->C from DIL to B->C (under root) in the context tracker.
> > > I see, thanks. After the compiler decides it won't inline at some callsite, the profile for the callsite will be promoted and some context information will be lost. This seems to assume the inlining happens in top-down order and happens only once. I remember the CSSPGO profile will be used to drive the CGSCC Inliner in the future. The CGSCC Inliner will need to do the inlining iteratively, so how is it supposed to work with profile promotion?
> > You're right that it currently assumes top-down order - that is the best way to leverage context sensitive profile. If we try to use CSSPGO profile to drive SCC inline, bottom-up order and iterative nature are two key differences. 
> > 
> > Bottom-up inlining means we can't promote a context profile by moving it to be under root; instead, we will need to copy (and merge) the context profile into the base profile under root. For the same example A->B->C, with SCC inline, we could end up processing B before A. When processing B, we promote the not-inlined context profile of B to be under root (B->C), and merge it into a base profile of B. However, we still need to keep the original context profile tree (A->B->C), so that later when we process A, we will still see B and C under A.
> > 
> > Actually the promotion happens when we try to access a function's base profile (getBaseSamplesFor calls promoteMergeContextSamplesTree for each not inlined context profile), so the difference between top-down and bottom-up inline is more about accuracy - with bottom-up inline, when getting base profile for B, we'd assume none of B's call sites is inlined even if later A inlines B.
> > 
> > For iterative inlining, we can call getBaseSamplesFor every time we process a function again to redo the promotion and merge based on the up-to-date inline decisions. E.g. if we process B, then A (which inlines B), then B again, the second time we process B we would not merge the B under A into B's current base profile, which makes the profile more accurate than on the first pass over B. (But it's still not as good as top-down inline because even if we can unmerge a context profile, we can't undo inlining.)
> > 
> >  
> Thanks for the detailed explanation. That makes sense to me. 
> 
> Talking about profile promotion and merging, if a function is still hot after the inlining of all its callsites has been decided, and if it still has different profiles under different contexts, it may be interesting to clone the function so we can still apply the context-sensitive profile in groups. It will be interesting to have some support to compare profiles under different contexts and split them into groups. I feel the full context-sensitive profile opens up some new opportunities we can explore by maximizing its usage in the future.
Yeah, that's a good point. We're also thinking about cloning, as it's an area where clang is still behind gcc.

I think it will take some time before we fully leverage the new opportunities. I will send another patch for priority-based top-down inlining with CSSPGO. With that, more inlining will be done during the early top-down inline, but it will take more effort to rebalance inlining between the sample loader and CGSCC.
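
To make the move vs. copy-and-merge distinction discussed above concrete, here is a rough, self-contained sketch. It is not the code in this patch: `Context`, `ToyProfile`, `promote` and `samplesFor` are hypothetical names, and a flat map from context to sample count stands in for the real trie of ContextTrieNode. It only illustrates the idea that top-down promotion moves the not-inlined subtree under root, while bottom-up promotion would copy and merge it and keep the original.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy types, not the ones in SampleContextTracker.
using Context = std::vector<std::string>;        // frames, outermost first
using ToyProfile = std::map<Context, uint64_t>;  // context -> sample count

static bool hasPrefix(const Context &C, const Context &Prefix) {
  return C.size() >= Prefix.size() &&
         std::equal(Prefix.begin(), Prefix.end(), C.begin());
}

// Promote the profile rooted at CalleeCtx (e.g. {"A", "B"}) so that the callee
// becomes a top-level context (e.g. {"B"}, {"B", "C"}).
//   KeepOriginal == false: move, as in the top-down case.
//   KeepOriginal == true:  copy and merge, as in the bottom-up case.
// Assumption: this toy is not idempotent in copy mode; calling it twice on the
// same context would merge the samples twice.
void promote(ToyProfile &P, const Context &CalleeCtx, bool KeepOriginal) {
  assert(!CalleeCtx.empty());
  if (CalleeCtx.size() == 1)
    return; // Already directly under root.
  const auto Skip = static_cast<Context::difference_type>(CalleeCtx.size() - 1);
  ToyProfile Promoted;
  for (auto It = P.begin(); It != P.end();) {
    if (hasPrefix(It->first, CalleeCtx)) {
      // Drop the caller frames so the callee is the outermost frame.
      Context Stripped(It->first.begin() + Skip, It->first.end());
      Promoted[Stripped] += It->second;
      if (!KeepOriginal) {
        It = P.erase(It);
        continue;
      }
    }
    ++It;
  }
  // Merge the promoted entries into whatever is already under root.
  for (const auto &Entry : Promoted)
    P[Entry.first] += Entry.second;
}

// Look up a context starting from root; 0 if there is no such context.
uint64_t samplesFor(const ToyProfile &P, const Context &Ctx) {
  auto It = P.find(Ctx);
  return It == P.end() ? 0 : It->second;
}
```

In this toy model, after `promote(P, {"A", "B"}, /*KeepOriginal=*/false)` a lookup of `{"B", "C"}` succeeds and `{"A", "B", "C"}` is gone, which corresponds to scenario 2 above. With `KeepOriginal=true` both contexts remain visible, so a later pass over A would still see B and C under A.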

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90125/new/

https://reviews.llvm.org/D90125