[PATCH] D89723: [CSSPGO][llvm-profgen]Context-sensitive profile data generation
Wenlei He via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 20 10:55:59 PST 2020
wenlei added inline comments.
================
Comment at: llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript:2
+Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
+PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin
+
----------------
I think we also need to support cases where PERF_RECORD_MMAP2 event isn't available, in which case we just use preferred load address from ELF header.
Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:484-487
+ if (getPerfScriptType() == PERF_LBR_STACK) {
+ // Unwind samples if it's hybird sample
+ unwindSamples();
+ }
----------------
What would be the workflow for (non-CS) AutoFDO with this new implementation?
It looks like `parseTrace` is responsible for aggregation only, then even for AutoFDO, there'll be a post-process after that, to get range:count, right?
so it looks to me that a unified workflow could be something like this?
```
for (auto Filename : PerfTraceFilenames)
parseAndAggregateTrace(Filename);
generateRawProfile();
```
In side `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO.
Also see comments on `AggregationCounter` - in addition to unifying the workflow, it would be good to unify data structure as well if possible. What do you think?
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:211
+// into this structure and the value is the sample counter.
+using AggregationCounter =
+ std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;
----------------
The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way.
Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D89723/new/
https://reviews.llvm.org/D89723
More information about the llvm-commits
mailing list