[PATCH] D158891: [CSSPGO] Retire FlattenProfileForMatching

Hongtao Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 29 23:09:20 PDT 2023


hoy accepted this revision.
hoy added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2402
+    // Use top-level nested FS for counting profile mismatch metrics since
+    // currently once a callsite is mismatched, all its children profile are
+    // dropped.
----------------
wlei wrote:
> hoy wrote:
> > wlei wrote:
> > > hoy wrote:
> > > > wlei wrote:
> > > > > hoy wrote:
> > > > > > nit: profile -> profiles
> > > > > > 
> > > > > > > once a callsite is mismatched, all its children profile are drooped
> > > > > > 
> > > > > > Why children profiles can be dropped? Counting happens before inlining right?
> > > > > yeah, In the current  `findCalleeFunctionSamples` implementation, if the mismatch happens, it would return null, but there is no extra process to reuse those samples, so it's "dropped".
> > > > > 
> > > > > I think as Wenlei has proposed a solution in other patch: 
> > > > > >Maybe what we should actually do is to reuse the nested profile from those mismatched parent functions, something like run promoteMergeNotInlinedContextSamples on top level functions samples not used? We do reuse nested profile for not inlined callees from parent function.
> > > > > 
> > > > > That sounds reasonable, but as we talked offline, we could leave it in future.
> > > > I'm a bit confused. `FlattenedProfiles` is immutable once initialized right? What do we lose when using `FSFlattened` for `countProfileMismatches`?
> > > > 
> > > > If we are not to count for external functions, the nested callees there will be skipped too? But using a flattened profile can keep them, if they are not external.
> > > Let me show you with an example, say we have a profile like
> > > ```
> > > A:102:1
> > >  1:1
> > >  2:B: 101:1
> > >    1:1
> > >    2:C:100:1
> > >      1:1
> > >       ...
> > >      100:1
> > > ```
> > > Say callsite B in A is changed to other location and the callsite is mismatched.  Here like in this patch, it use the non-flattened profile.
> > > the mismached Samples is 101, i,e, it not only count B's profile but also all the children profiles(`C`)
> > > 
> > > But if we use flattened profile:
> > > ```
> > > A:2:1
> > >  1:1
> > >  1: B:1
> > > B:2:1
> > >  1:1
> > >  1: C:1
> > > C:100:1
> > >  1:1
> > >   ...
> > >   100:1
> > > ```
> > > The callsite of B in A is mismatched, but now the callsite count is 1, the mismatched Samples is 1.
> > > 
> > > I know it feels counter-intuitive, there is no mismatch in B and C, but that is the current implementation, we don't reuse the children profile. Once the parent callsite is mismatched, all the children profiles are dropped.
> > > 
> > >  
> > > 
> > Thanks for the explanation. I have a better understanding now. 
> > 
> > But I still have a question about how to count mismatched samples. In your examples, the 101 samples for B & C may still be used if B & C have no source changes. Yes, I get that the inlining won't happen, but even if it happens it only saves the cost of one call instruction and perhaps corresponding parameter passing cost? In other words, whether the 101 samples can be applied to inlined B&C depends on how well the matching works for B & C?
> > 
> > BTW, how much difference in terms of the staleness number did you see with switching to using nested profile?
> Sorry I may not fully understand your question.  Here the metrics is computed before the fuzzy matching, i,e. if it's a mismatch in the source code, even later the fuzzy matching recover to the right location, we still count that location is a mismatch. We haven't done the post fuzzy matching metrics.
> 
> > In your examples, the 101 samples for B & C may still be used if B & C have no source changes.
> This can't happen in the current implementation, if the mismatch happens, the 101 samples is never used no matter whether B&C is inlined/matched or not, because In A, when `findCalleeFunctionSamples ` return null, it won't visit A's children profiles, the children B &C profile won't be merged into the base B & C profile.
> 
> > BTW, how much difference in terms of the staleness number did you see with switching to using nested profile?
> It should matter, I saw a case about 2 times differences, it depends on how hot the dropped children profile is. 
> This can't happen in the current implementation, if the mismatch happens, the 101 samples is never used no matter whether B&C is inlined/matched or not, because In A, when findCalleeFunctionSamples  return null, it won't visit A's children profiles, the children B &C profile won't be merged into the base B & C profile.

I see. It makes sense now. A callee profile can be returned only if it is discovered at some callsite in the caller. Thanks for the explanation!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158891/new/

https://reviews.llvm.org/D158891



More information about the llvm-commits mailing list