[PATCH] D107800: [CSSPGO][llvm-profgen] Cap context stack to reduce memory usage

Tue Aug 10 20:51:34 PDT 2021

wlei added inline comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:23
 extern cl::opt<bool> ShowSourceLocations;
+extern cl::opt<int> CSProfCtxStackCap;

----------------
hoy wrote:
> Nit: move this into ProfileGenerator.h to reduce the number of declarations?
fixed!

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:53
+cl::opt<int> CSProfCtxStackCap(
+    "csprof-ctx-stack-cap", cl::init(20), cl::ZeroOrMore,
+    cl::desc("Cap context stack at a given depth. No cap if the input is -1."));
----------------
hoy wrote:
> wenlei wrote:
> > I think we could unify the switch names, e.g. `csprof-max-context-depth` and `csprof-max-cold-context-depth`? 
> Thanks for working on this. We probably do not inline so many levels of functions. But would be good to run through some perf testing or to turn this off by default.
Sounds good. I will collect statistics on the max inline depth in the SampleProfile inliner on some benchmarks and change the default to that value; maybe 10 is good enough.

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:615
   CSProfileGenerator::compressRecursionContext(ContextStrStack);
+  CSProfileGenerator::capContextStack(ContextStrStack, CSProfCtxStackCap);

----------------
wenlei wrote:
> Since `getExpandedContextStr` only covers line-based profile, for probe we rely on the trimming here in profile generation, which is later than where we do the trimming for line-based profile. Do we see peak memory drop if we trim the context in profile generation instead of during unwinder?
The patch trims in both places: once during unwinding (see the call in PerfReader.cpp) and once here.

The answer is yes: trimming in both places is better than trimming in the unwinder only. Here is some data:

10 depth for both: 17GB
10 depth for unwinder only: 26GB
20 depth for both: 42GB
20 depth for unwinder only: 49GB

================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.h:73

+  // Cap the context stack by cutting off from the bottom at a given depth.
+  template <typename T>
----------------
wenlei wrote:
> nit: bottom-up order in a stack is usually caller-callee order, so "from the bottom" can be confusing as it means we trim callees, which is not the case.
>
> also suggest renaming capContextStack to trimContext.
Fixed!
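
To make the intended trimming direction concrete, here is a minimal sketch, not the code in the patch. It assumes the context is stored in caller-to-callee order (outermost caller first, leaf last) and uses `std::vector` so the example stands alone; the name `trimContextSketch` is hypothetical. A negative depth means no cap, following the `-1` convention in the option description.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the helper discussed above; not the patch's code.
// Assumes Context is ordered caller -> callee: index 0 is the outermost
// caller and the last element is the leaf frame.
template <typename T>
void trimContextSketch(std::vector<T> &Context, int MaxDepth) {
  // A negative depth means "no cap"; a context already within the cap is
  // left untouched.
  if (MaxDepth < 0 || static_cast<std::size_t>(MaxDepth) >= Context.size())
    return;
  // Drop the outermost callers and keep the MaxDepth frames nearest the leaf.
  Context.erase(Context.begin(), Context.end() - MaxDepth);
}
```

With a cap of 2, a context of `main`, `foo`, `bar`, `baz` is reduced to `bar`, `baz`: the callers are dropped, not the callees.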

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107800/new/

https://reviews.llvm.org/D107800