[PATCH] D158891: [CSSPGO] Retire FlattenProfileForMatching

Hongtao Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 29 21:14:10 PDT 2023


hoy added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/SampleProfile.cpp:2402
+    // Use top-level nested FS for counting profile mismatch metrics since
+    // currently once a callsite is mismatched, all its children profile are
+    // dropped.
----------------
wlei wrote:
> hoy wrote:
> > wlei wrote:
> > > hoy wrote:
> > > > nit: profile -> profiles
> > > > 
> > > > > once a callsite is mismatched, all its children profile are drooped
> > > > 
> > > > Why children profiles can be dropped? Counting happens before inlining right?
> > > yeah, In the current  `findCalleeFunctionSamples` implementation, if the mismatch happens, it would return null, but there is no extra process to reuse those samples, so it's "dropped".
> > > 
> > > I think as Wenlei has proposed a solution in other patch: 
> > > >Maybe what we should actually do is to reuse the nested profile from those mismatched parent functions, something like run promoteMergeNotInlinedContextSamples on top level functions samples not used? We do reuse nested profile for not inlined callees from parent function.
> > > 
> > > That sounds reasonable, but as we talked offline, we could leave it in future.
> > I'm a bit confused. `FlattenedProfiles` is immutable once initialized right? What do we lose when using `FSFlattened` for `countProfileMismatches`?
> > 
> > If we are not to count for external functions, the nested callees there will be skipped too? But using a flattened profile can keep them, if they are not external.
> Let me show you with an example, say we have a profile like
> ```
> A:102:1
>  1:1
>  2:B: 101:1
>    1:1
>    2:C:100:1
>      1:1
>       ...
>      100:1
> ```
> Say callsite B in A is changed to other location and the callsite is mismatched.  Here like in this patch, it use the non-flattened profile.
> the mismached Samples is 101, i,e, it not only count B's profile but also all the children profiles(`C`)
> 
> But if we use flattened profile:
> ```
> A:2:1
>  1:1
>  1: B:1
> B:2:1
>  1:1
>  1: C:1
> C:100:1
>  1:1
>   ...
>   100:1
> ```
> The callsite of B in A is mismatched, but now the callsite count is 1, the mismatched Samples is 1.
> 
> I know it feels counter-intuitive, there is no mismatch in B and C, but that is the current implementation, we don't reuse the children profile. Once the parent callsite is mismatched, all the children profiles are dropped.
> 
>  
> 
Thanks for the explanation. I have a better understanding now. 

But I still have a question about how to count mismatched samples. In your examples, the 101 samples for B & C may still be used if B & C have no source changes. Yes, I get that the inlining won't happen, but even if it happens it only saves the cost of one call instruction and perhaps corresponding parameter passing cost? In other words, whether the 101 samples can be applied to inlined B&C depends on how well the matching works for B & C?

BTW, how much difference in terms of the staleness number did you see with switching to using nested profile?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158891/new/

https://reviews.llvm.org/D158891



More information about the llvm-commits mailing list