[PATCH] D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

Sun Jan 10 21:28:53 PST 2021

wmi added a comment.

> This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it.

A 5x speedup is a really impressive improvement. I am wondering whether callstacks overlap between different LBR samples, so that call frames could be grouped further by reusing unwindState. You may also save some cost by reusing the frame trie. IIUC, although samples have already been aggregated by callstack, each LBR sample may still have multiple callstacks inferred from unwindCall/unwindReturn. If those inferred callstacks overlap across different LBR samples, you may be able to group them further.
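
To make the suggestion concrete, here is a minimal sketch of aggregating call stacks from several samples into one shared trie, so that overlapping prefixes are stored and counted once. It is written in plain standard C++ and is not the patch's code: FrameNode, addSample, collect and StackCount are hypothetical names, and FrameNode only stands in for something like UnwindState::Frame.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call frame trie node (assumption: this is
// not the UnwindState::Frame type from the patch).
struct FrameNode {
  uint64_t Address = 0;
  uint64_t SampleCount = 0; // Samples whose call stack ends at this node.
  std::map<uint64_t, std::unique_ptr<FrameNode>> Children;
};

// Walk the trie along CallStack (outermost caller first), creating missing
// nodes on the way, and add Count to the node for the last frame.
void addSample(FrameNode &Root, const std::vector<uint64_t> &CallStack,
               uint64_t Count) {
  FrameNode *Cur = &Root;
  for (uint64_t Addr : CallStack) {
    std::unique_ptr<FrameNode> &Child = Cur->Children[Addr];
    if (!Child) {
      Child = std::make_unique<FrameNode>();
      Child->Address = Addr;
    }
    Cur = Child.get();
  }
  Cur->SampleCount += Count;
}

using StackCount = std::pair<std::vector<uint64_t>, uint64_t>;

// Depth-first traversal that emits one (call stack, count) pair for every
// node with a non-zero count. Prefix holds the path from the root.
void collect(const FrameNode &Node, std::vector<uint64_t> &Prefix,
             std::vector<StackCount> &Out) {
  if (Node.SampleCount != 0)
    Out.emplace_back(Prefix, Node.SampleCount);
  for (const auto &Entry : Node.Children) {
    Prefix.push_back(Entry.first);
    collect(*Entry.second, Prefix, Out);
    Prefix.pop_back();
  }
}
```

If one root is kept alive across LBR samples, identical call stacks from different samples end up on the same node, and the counters are read out in a single traversal at the end. Whether this pays off in llvm-profgen depends on how much the inferred callstacks actually overlap.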

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:87
+void VirtualUnwinder::recordSampleWithinFrame(
+    UnwindState::Frame *Cur, SmallVector<uint64_t, 16> &CallStack) {
+  if (Cur->RangeSamples.empty() && Cur->BranchSamples.empty())
----------------
Use SmallVectorImpl<uint64_t> & as the parameter type instead of SmallVector<uint64_t, 16> &, so that the signature does not depend on the inline element count. There are some other places with the same issue.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94110/new/

https://reviews.llvm.org/D94110