[PATCH] D121643: [llvm-profgen] On-demand pseudo probe decoding

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 23 09:55:07 PDT 2022


wenlei added inline comments.


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:374
 
+void ProfileGeneratorBase::collectProfiledFunctions() {
+  std::unordered_set<const BinaryFunction *> ProfiledFunctions;
----------------
hoy wrote:
> wenlei wrote:
> > wlei wrote:
> > > hoy wrote:
> > > > wlei wrote:
> > > > > wlei wrote:
> > > > > > For the CS profile, the input contains CallStack + LBRStack, but here only the addresses from the LBRStack (Range + Branch) are considered when computing ProfiledFunctions. Does it miss functions whose address (frame) comes only from the CallStack?
> > > > > > 
> > > > > > For example, if we only have the following entry:
> > > > > > [main @ foo] : {rangeFrom : rangeTo}
> > > > > > 
> > > > > > Supposing rangeFrom and rangeTo belong only to `foo` (foo is not inlined into main), then ProfiledFunctions only contains `foo` and misses `main`, right?
> > > > > > 
> > > > > > Going by the term "ProfiledFunctions", the result should just be `foo`, but for CS inlining, I guess we still need the frame `main`?
> > > > > > 
> > > > > Sorry, I missed your notes on the summary. 
> > > > > > "Note that in on-demand mode, a calling context may be truncated at a caller which does not have sample. This is a discrepancy from the complete mode. However, with the CS preinliner such context will always be truncated from the compiler inlining point of view." 
> > > > > 
> > > > > So in on-demand mode, it will implicitly trim some contexts, even before the CS preinliner or cold-context trimming. 
> > > > Yes, and those contexts, if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.
> > > Thanks for the clarification. Then it looks like future work could be to compute ProfiledFunctions earlier and truncate the CallStack during unwinding, so that `SampleCounters` is reduced to save parsing time and memory.
> > > if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function. 
> > 
> > Actually this is arguable. Top-down inlining could choose to also inline cold calls in certain cases. E.g. even if A->B is cold, inlining A->B->C provides an opportunity for B->C to specialize under A->B->C, without affecting the D->B->C path. 
> > 
> > What you described could be benign at the moment, but I'm wondering, if we keep `main` like Lei suggested, what's the extra memory overhead? Semantically, keeping the old behavior might be better because the generated profile is supposed to have complete contexts before the preinliner/trimming. Of course, if that behavior bloats memory too much, the way you optimize it here by trimming earlier helps.
> For now the sample inliner only processes functions with profiles. So for a cold function call like A->B, both A and B need a profile in order for the call to be inlined. With this on-demand approach, if A is not sampled, we truncate the context to B only. Without this diff, we would generate an `A:1 @ B`-like profile, but then during top-down inlining, context promotion would truncate the context as well. So for now we wouldn't lose any opportunity. 
> 
> This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions.
> 
> Good point on moving the profiled-function computation earlier so that we could do more truncation during unwinding to further save memory. So far our memory usage looks good.
> 
> I don't have a number for how many more functions would have their probes decoded if we keep all contexts. Maybe we could revisit this once those contexts become useful?
> 
> So for now we wouldn't lose any opportunity. This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions.
Yes, we don't lose any inlining opportunities now, but that's not the main concern. 

> I don't have a number for how many more functions would have their probes decoded if we keep all contexts. Maybe we could revisit this once those contexts become useful?
What I was thinking was that, if we keep all context probes decoded, this change will strictly not affect the output in any way, and we can remove the switch, which simplifies things and always saves memory.

I think that having the full context also helps with verification, even if we don't utilize that info for optimization now. So if the extra probes don't cost us much memory, I'm leaning towards keeping them and removing the switch, making the new behavior the default and only choice.

There's another alternative: we always decode only profiled functions (except for show-disassembly), and then have a switch to control whether extra context probes are decoded.
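To make the two policies concrete, here is a minimal, self-contained sketch (not the actual llvm-profgen code) contrasting the two collection strategies: gathering profiled functions from LBR range/branch addresses only, versus also walking the calling-context frames so callers like `main` in the `[main @ foo]` example above are kept. The `ContextSample` type and `getFuncForAddr` helper are hypothetical stand-ins for the patch's real data structures:

  #include <cstdint>
  #include <string>
  #include <unordered_set>
  #include <vector>

  struct BinaryFunction {
    std::string FuncName;
  };

  // Hypothetical flattened context sample: a call stack of frame addresses
  // plus one LBR range [RangeFrom, RangeTo].
  struct ContextSample {
    std::vector<uint64_t> CallStack;
    uint64_t RangeFrom = 0;
    uint64_t RangeTo = 0;
  };

  // Stand-in for an address-to-function lookup on the profiled binary.
  const BinaryFunction *getFuncForAddr(uint64_t Addr);

  std::unordered_set<const BinaryFunction *>
  collectProfiledFunctions(const std::vector<ContextSample> &Samples,
                           bool KeepCallStackFrames) {
    std::unordered_set<const BinaryFunction *> ProfiledFunctions;
    for (const ContextSample &S : Samples) {
      // Range/branch addresses always contribute: these functions have
      // body samples of their own.
      if (const BinaryFunction *F = getFuncForAddr(S.RangeFrom))
        ProfiledFunctions.insert(F);
      if (const BinaryFunction *F = getFuncForAddr(S.RangeTo))
        ProfiledFunctions.insert(F);
      // Optionally also keep caller frames (e.g. `main` above) so the full
      // calling context stays decodable, at the cost of decoding probes for
      // functions that were never directly sampled.
      if (KeepCallStackFrames)
        for (uint64_t FrameAddr : S.CallStack)
          if (const BinaryFunction *F = getFuncForAddr(FrameAddr))
            ProfiledFunctions.insert(F);
    }
    return ProfiledFunctions;
  }

With KeepCallStackFrames on, the result is a strict superset of the on-demand set, which is what would allow dropping the switch at the cost of some extra probe decoding.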


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:296
 template <class ELFT>
-void ProfiledBinary::setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj, StringRef FileName) {
+void ProfiledBinary::setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj,
+                                                      StringRef FileName) {
----------------
Just FYI, you can run the linter on just your changes instead of the whole file. Not a big deal though, and thanks for fixing the formatting for the entire file. 


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:366-368
+  std::unordered_set<uint64_t> ProfiledGuids;
+  for (auto *F : ProfiledFunctions)
+    ProfiledGuids.insert(Function::getGUID(F->FuncName));
----------------
When `DecodeProbeForProfiledFunctionsOnly` is off, are we supposed to keep `ProfiledGuids` empty here for `buildAddress2ProbeMap`, so that all functions will be decoded?
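If that is the intended contract, a hedged sketch of the filtering predicate (illustrative only; the wrapper function is hypothetical) would treat an empty GUID set as "no filtering":

  #include <cstdint>
  #include <unordered_set>

  // Decide whether a function's probes should be decoded. An empty
  // ProfiledGuids set means no filtering: decode every function.
  bool shouldDecodeProbes(const std::unordered_set<uint64_t> &ProfiledGuids,
                          uint64_t FuncGuid) {
    if (ProfiledGuids.empty())
      return true;
    // Otherwise decode only functions that actually carry samples.
    return ProfiledGuids.count(FuncGuid) != 0;
  }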


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:370
+
+  // Bail out if no functions are profiled in on-demand mode.
+  if (ProfiledGuids.empty() && DecodeProbeForProfiledFunctionsOnly && !ShowDisassemblyOnly)
----------------
Others may not be aware of what "on-demand" mode means, because with the current version there's no concept of on-demand: it's either decoding all functions or decoding profiled functions only. I suggest rephrasing the wording "on-demand".
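For instance, the bail-out could be phrased in terms of selective decoding rather than "on-demand" (a sketch; the flag and set names follow the patch, while the wrapper function is hypothetical):

  #include <cstdint>
  #include <unordered_set>

  bool shouldSkipProbeDecoding(
      const std::unordered_set<uint64_t> &ProfiledGuids,
      bool DecodeProbeForProfiledFunctionsOnly, bool ShowDisassemblyOnly) {
    // Decoding is restricted to profiled functions but nothing was profiled,
    // so there is nothing to decode. Showing disassembly still needs all
    // probes, hence the exception.
    return DecodeProbeForProfiledFunctionsOnly && ProfiledGuids.empty() &&
           !ShowDisassemblyOnly;
  }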


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121643/new/

https://reviews.llvm.org/D121643


