[PATCH] D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation
Wei Mi via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 10 21:28:53 PST 2021
wmi added a comment.
> This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it.
5x speedup shows it is a really impressive improvement. I am wondering whether there is callstack overlap between different LBR samples so you can have further grouping of call frames -- by reusing unwindState. You may also save some cost by reusing the frame trie. IIUC although samples have been aggregated based on callstack, each LBR sample may have multiple callstacks inferred from unwindCall/unwindReturn. If there are callstack overlap between different LBR samples, you may be able to further group them.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:87
+void VirtualUnwinder::recordSampleWithinFrame(
+ UnwindState::Frame *Cur, SmallVector<uint64_t, 16> &CallStack) {
+ if (Cur->RangeSamples.empty() && Cur->BranchSamples.empty())
----------------
Use SmallVectorImpl<uint64_t>& as a parameter type instead of SmallVector<uint64_t, 16>&. There are some other places with the same issue.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94110/new/
https://reviews.llvm.org/D94110
More information about the llvm-commits
mailing list