[PATCH] D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 25 13:31:00 PST 2021


wlei added inline comments.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:123
+      } else {
+        CallStack.push_back(reinterpret_cast<uint64_t>(CallProbe));
+      }
----------------
wenlei wrote:
> Would be nice to have a unified representation for frame stack and probe stack, but without replying on reinterpret_cast of probe pointer.. 
> 
> Using uint64_t then rely on Binary->usePseudoProbes() to decide how to interpret the value seem less than ideal.. Do this through template functions with the different part done through specialization helpers? 
Yes, changed to template functions and create two struct FrameStack and ProbeStack for this, thanks!


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:204
+  // Call stack trie node
+  struct Frame {
+    const uint64_t Address = 0;
----------------
wenlei wrote:
> What about name it `ProfiledFrame`? (we have ProfileBinary)
Good point


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:206
+    const uint64_t Address = 0;
+    Frame *Parent = nullptr;
+    SampleVector RangeSamples;
----------------
wenlei wrote:
> Raw pointer is used for parent, but unique_ptr is used for child. Would be good to keep consistent.
Oh, the child is a new allocated instance so we use smart pointer for it, but the parent pointer can always point to a pre-allocated instance or null(for root). Also seems it would cause a recursive deconstruction of unique_ptr error if making parent unique_ptr.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:211
+
+    Frame() {}
+    Frame(uint64_t Addr) : Address(Addr) {}
----------------
wenlei wrote:
> This ctor can be merged into the one below with default parameter.
Good catch!


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:281
+
+  void initCallStackTrie(const SmallVectorImpl<uint64_t> &CallStack) {
+    Frame *Cur = &DummyTrieRoot;
----------------
wenlei wrote:
> A few more naming nits: 
> 
> initFrameTrie? CallStackTrie prompts a trie whose nodes are call stack.. 
> 
> appendFrame and popStackLeaf are functionally symmetric, would be good for them to follow similar names. e.g. pushFrame, popFrame. and then swithToFrame..
Yeah, that's very good naming suggestions, thanks!


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:419
   void unwindBranchWithinFrame(UnwindState &State);
+  void recordSampleWithinFrame(UnwindState::Frame *Cur,
+                               SmallVectorImpl<uint64_t> &CallStack);
----------------
wenlei wrote:
> We "record" samples on trie (have recordRangeCount and recordBranchCount), then "collect" samples from trie into context sample maps via a DFS traversal.
> 
> So we could have a more intuitive convention for readability:
> 
> - ProfiledFrame::recordRangeCount
> - ProfiledFrame::recordBranchCount
> - VirtualUnwinder::collectSamplesFromFrame
> - VirtualUnwinder::collectSamplesFromFrameTrie
Cool, renamed


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:441-442
   // ProfileGenerator
-  SampleCounter &getOrCreateCounterForProbe(const ProfiledBinary *Binary,
-                                            std::list<uint64_t> &CallStack);
+  SampleCounter &
+  getOrCreateCounterForProbe(SmallVectorImpl<uint64_t> &CallStack);
 
----------------
wenlei wrote:
> We can make this a private member since it's only a helper for getOrCreateCounter. (a few other functions can be private too). 
Yes, make all of them except `unwind()` to private


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94110/new/

https://reviews.llvm.org/D94110



More information about the llvm-commits mailing list