[PATCH] D121643: [llvm-profgen] On-demand pseudo probe decoding

Hongtao Yu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 22 23:58:36 PDT 2022


hoy added inline comments.


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:374
 
+void ProfileGeneratorBase::collectProfiledFunctions() {
+  std::unordered_set<const BinaryFunction *> ProfiledFunctions;
----------------
wenlei wrote:
> wlei wrote:
> > hoy wrote:
> > > wlei wrote:
> > > > wlei wrote:
> > > > > For CS profile,  the input contains CallStack + LBRStack, but here it only considers the addresses from LBRStack(Range + Branch) to compute profiledFunctions. Does it miss some functions whose address(frame) is only from CallStack?
> > > > > 
> > > > > For example, if we only have one following entry:
> > > > > [main @ foo] : {rangeFrom : rangeTo}
> > > > >  
> > > > > Supposing rangeFrom and rangeTo only belong to `foo`(foo is not inlined to main), then the result of ProfiledFunctions only contains `foo` and misses `main`, right?
> > > > > 
> > > > > From the term "ProfiledFunctions", it should be `foo`, but for CS inlining, I guess we still need the frame `main`?
> > > > > 
> > > > Sorry, I missed your notes on the summary. 
> > > > > "Note that in on-demand mode, a calling context may be truncated at a caller which does not have sample. This is a discrepancy from the complete mode. However, with the CS preinliner such context will always be truncated from the compiler inlining point of view." 
> > > > 
> > > > So in on-demand mode, it will implicitly trim some contexts, even before CS-preinliner or cold-context-trimming. 
> > > Yes, and those contexts, if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.
> > Thanks for the clarification. Then it looks like there could be future work to compute the ProfiledFunctions earlier and truncate the call stack during unwinding, so that `SampleCounters` is reduced to save parsing time and memory.
> > if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function. 
> 
> Actually this is arguable. Top-down inlining could choose to also inline cold calls in certain cases. E.g. even if A->B is cold, inlining A->B->C provides an opportunity for B->C to specialize under A->B->C, without affecting the D->B->C path. 
> 
> What you described could be benign at the moment, but I'm wondering if we keep main like Lei suggested, what's the extra memory overhead? Semantically, keeping the old behavior might be better because the generated profile is supposed to have complete context before preinliner/trimming. Ofc, if that behavior bloats memory too much, the way you optimize it here by trimming earlier helps.
For now the sample inliner only processes functions with profiles. So for a cold call like A->B, both A and B need a profile in order for the call to be inlined. With this on-demand approach, if A is not sampled, we truncate the context to B only. Without this diff, we would generate an `A:1 @ B`-like profile, but then during top-down inlining, context promotion would truncate the context as well. So for now we wouldn't lose any opportunity. 

This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions.

Good point on moving the profile function computation earlier so that we could do more truncation during the unwinding to further save some memory. So far our memory usage looks good.

I don't have a number for how many more functions would have their probes decoded if we kept all contexts. Maybe we could revisit this once those contexts become useful?
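To make the truncation behavior discussed above concrete, here is a minimal standalone sketch (the names `Func`, `collectProfiledFunctions`, and `truncateContext` are illustrative placeholders, not llvm-profgen's actual API): profiled functions are collected only from range/branch sample addresses, and each calling context is then truncated at the first caller that has no samples of its own.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a disassembled binary function.
struct Func {
  std::string Name;
  uint64_t Begin, End; // [Begin, End) address range
};

static const Func *findFunc(const std::vector<Func> &Funcs, uint64_t Addr) {
  for (const auto &F : Funcs)
    if (Addr >= F.Begin && Addr < F.End)
      return &F;
  return nullptr;
}

// Collect profiled functions from range/branch sample addresses only.
// Frames that appear solely in call stacks (like `main` in the example
// above) are deliberately not collected, mirroring the patch.
std::set<std::string>
collectProfiledFunctions(const std::vector<Func> &Funcs,
                         const std::vector<uint64_t> &SampleAddrs) {
  std::set<std::string> Profiled;
  for (uint64_t Addr : SampleAddrs)
    if (const Func *F = findFunc(Funcs, Addr))
      Profiled.insert(F->Name);
  return Profiled;
}

// Truncate a calling context (outermost caller first) at the first
// non-profiled caller, keeping only the profiled suffix.
std::vector<std::string>
truncateContext(const std::vector<std::string> &Context,
                const std::set<std::string> &Profiled) {
  std::vector<std::string> Out;
  for (auto It = Context.rbegin(); It != Context.rend(); ++It) {
    if (!Profiled.count(*It))
      break;
    Out.insert(Out.begin(), *It);
  }
  return Out;
}
```

With a `[main @ foo]` context where only `foo` owns sampled addresses, the sketch keeps `foo` and drops `main`, matching the on-demand behavior described above.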



================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:382
               reinterpret_cast<const uint8_t *>(Contents.data()),
-              Contents.size()))
+              Contents.size(), ProfiledGuids))
         exitWithError("Pseudo Probe decoder fail in .pseudo_probe section");
----------------
wenlei wrote:
> hoy wrote:
> > wenlei wrote:
> > > wenlei wrote:
> > > > How about completely unify the workflow of on-demand vs full decoding? 
> > > > 
> > > > The only difference would be - for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids. 
> > > > 
> > > > Correct me if I'm wrong, but I think this way we keep the same behavior for full decoding as of today (no context truncation before the preinliner), and we can also remove ProbeStack and replace it with AddressStack everywhere. 
> > > > The only difference would be - for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids.
> > > 
> > > missed this one? By only difference, I meant the only place that needs to check `DecodeProbeForProfiledFunctionsOnly`. 
> > Right now for full decoding we provide an empty `ProfiledGuids` to the decoder instead of computing Guids for all disassembled functions, which could be expensive. The only place that requires full decoding is when `--show-disassembly-only` is on. Otherwise it is kind of unified?
> > 
> > I'm adding an assert like this
> > 
> > ```
> >   assert((!ProfiledFunctions.empty() || !DecodeProbeForProfiledFunctionsOnly ||
> >           ShowDisassemblyOnly) &&
> >          "Profiled functions should not be empty in on-demand probe decoding "
> >          "mode");
> > ```
> > 
> > We can also use a 0 as a guid to represent all functions. WDYT?
> Passing empty guid for full decoding works too. Then can we only check `DecodeProbeForProfiledFunctionsOnly` within `decodePseudoProbe` and pass empty Guid when the flag is false?
> 
> Currently `DecodeProbeForProfiledFunctionsOnly` is checked in many places before calling `decodePseudoProbe`. What I meant was that regardless of that flag, `decodePseudoProbe` should always be called in the same place (except for `ShowDisassemblyOnly`), but the flag only controls what is passed as the GuidFilter.   
I see what you mean. Moved the flag checks into `decodePseudoProbe`.
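The unified flow agreed on here can be sketched roughly as follows (a hypothetical simplification with made-up types, not the real PseudoProbeDecoder interface): a single decode entry point takes a GUID filter, where an empty filter means full decoding and a non-empty filter restricts decoding to the profiled GUIDs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for one function's record in the probe section.
struct FuncDesc {
  uint64_t Guid; // probe payload elided for brevity
};

// Decode function records, honoring the GUID filter: an empty filter
// means "decode everything" (full decoding); otherwise only records
// whose GUID is in the filter are decoded (on-demand decoding).
std::vector<uint64_t>
decodePseudoProbe(const std::vector<FuncDesc> &Section,
                  const std::set<uint64_t> &GuidFilter) {
  std::vector<uint64_t> Decoded;
  for (const FuncDesc &FD : Section) {
    if (!GuidFilter.empty() && !GuidFilter.count(FD.Guid))
      continue;
    Decoded.push_back(FD.Guid);
  }
  return Decoded;
}
```

This lets the caller check `DecodeProbeForProfiledFunctionsOnly` in one place: when the flag is off, it simply passes an empty filter.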


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121643/new/

https://reviews.llvm.org/D121643


