[PATCH] D89723: [CSSPGO][llvm-profgen]Context-sensitive profile data generation

Fri Nov 20 10:55:59 PST 2020

wenlei added inline comments.

================
Comment at: llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript:2
+Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
+PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin
+
----------------
I think we also need to support cases where PERF_RECORD_MMAP2 event isn't available, in which case we just use preferred load address from ELF header. 

Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:484-487
+  if (getPerfScriptType() == PERF_LBR_STACK) {
+    // Unwind samples if it's hybird sample
+    unwindSamples();
+  }
----------------
What would be the workflow for (non-CS) AutoFDO with this new implementation? 

It looks like `parseTrace` is responsible for aggregation only, then even for AutoFDO, there'll be a post-process after that, to get range:count, right?

so it looks to me that a unified workflow could be something like this?

```
for (auto Filename : PerfTraceFilenames)
    parseAndAggregateTrace(Filename);

generateRawProfile(); 
```

In side `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO.

Also see comments on `AggregationCounter` - in addition to unifying the workflow, it would be good to unify data structure as well if possible. What do you think?

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:211
+// into this structure and the value is the sample counter.
+using AggregationCounter =
+    std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;
----------------
The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way.

Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?  

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89723/new/

https://reviews.llvm.org/D89723