[PATCH] D89723: [CSSPGO][llvm-profgen]Context-sensitive profile data generation

Fri Nov 20 13:14:23 PST 2020

wlei added inline comments.

================
Comment at: llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript:2
+Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
+PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin
+
----------------
wenlei wrote:
> I think we also need to support cases where PERF_RECORD_MMAP2 event isn't available, in which case we just use preferred load address from ELF header. 
> 
> Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?
Yes, PERF_RECORD_MMAP2 is currently required.
The problem with using the preferred load address when there is no mmap event is that one perf address might belong to multiple binaries, which would break the whole process. We would also need one more scan of the perf trace to confirm that there is no MMAP2 event before switching to the preferred address.
Alternatively, we could add a switch like "--no-mmp2-events" that explicitly tells the tool to use the preferred address, and support only one binary under this switch. Or we would need some information in the perf trace that tells which binary each address belongs to (I remember we discussed this internally). Any suggestions?
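
To make the explicit-switch option concrete, here is a minimal sketch. It is not code from this patch: `selectBaseAddress`, its parameters, and the `AllowPreferredFallback` flag are hypothetical names that only model the proposed behavior for a single binary.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not in the patch): choose the base address for one binary.
//   MMapBase               - load address parsed from a PERF_RECORD_MMAP2 event,
//                            or std::nullopt if the trace contained none.
//   PreferredBase          - preferred load address read from the ELF header.
//   AllowPreferredFallback - models the proposed opt-in switch.
// Returns std::nullopt when no MMAP2 event was seen and the fallback is off,
// so the caller can report an error instead of parsing without a base address.
inline std::optional<uint64_t>
selectBaseAddress(std::optional<uint64_t> MMapBase, uint64_t PreferredBase,
                  bool AllowPreferredFallback) {
  if (MMapBase)
    return MMapBase; // The runtime mapping always wins when it is available.
  if (AllowPreferredFallback)
    return PreferredBase; // Only safe when a single binary is being profiled.
  return std::nullopt;
}
```

With this shape the default behavior stays strict, and the fallback only applies when the user asks for it.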

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:484-487
+  if (getPerfScriptType() == PERF_LBR_STACK) {
+    // Unwind samples if it's hybird sample
+    unwindSamples();
+  }
----------------
wenlei wrote:
> What would be the workflow for (non-CS) AutoFDO with this new implementation? 
> 
> It looks like `parseTrace` is responsible for aggregation only, then even for AutoFDO, there'll be a post-process after that, to get range:count, right?
> 
> so it looks to me that a unified workflow could be something like this?
> 
> ```
> for (auto Filename : PerfTraceFilenames)
>     parseAndAggregateTrace(Filename);
> 
> generateRawProfile(); 
> ```
> 
> Inside `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO.
> 
> Also see comments on `AggregationCounter` - in addition to unifying the workflow, it would be good to unify data structure as well if possible. What do you think?
Good suggestion! As you mentioned, we can fold everything into the unwinder by treating a non-CS profile as a hybrid sample with an empty call stack. How about we do that when implementing the non-CS part? For now I will change the code to something like the following:

```
void generateRawProfile(..) {
  if (getPerfScriptType() == PERF_LBR) {
    // Range overlap computation for regular AutoFDO
    ...
  } else if (getPerfScriptType() == PERF_LBR_STACK) {
    // Unwind samples if it's a hybrid sample
    unwindSamples();
  }
}
```

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:211
+// into this structure and the value is the sample counter.
+using AggregationCounter =
+    std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;
----------------
wenlei wrote:
> The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way.
> 
> Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?  
Yeah, it should not be specific to the unwinder. I will move it to PerfReader so that it supports both AutoFDO and CSSPGO.
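
As a rough illustration of how one counter could serve both cases, here is a small self-contained sketch. `SampleKey`, `SampleKeyHash`, `SampleCounter`, and `addSample` are simplified, hypothetical stand-ins for `HybridSample`, `HybridSampleHash`, and `AggregationCounter`; the field layout is an assumption, not the patch's actual definition.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for HybridSample; field names are assumptions.
struct SampleKey {
  std::vector<uint64_t> CallStack; // Left empty for LBR-only (non-CS) samples.
  std::vector<std::pair<uint64_t, uint64_t>> LBRStack; // (source, target) pairs.

  bool operator==(const SampleKey &Other) const {
    return CallStack == Other.CallStack && LBRStack == Other.LBRStack;
  }
};

// Hash over both stacks so that identical samples land in the same bucket.
struct SampleKeyHash {
  std::size_t operator()(const SampleKey &Key) const {
    std::size_t Hash = 0;
    auto Mix = [&Hash](uint64_t Value) {
      Hash ^= std::hash<uint64_t>{}(Value) + 0x9e3779b9 + (Hash << 6) + (Hash >> 2);
    };
    Mix(Key.CallStack.size()); // Separates the call stack from the LBR stack.
    for (uint64_t Address : Key.CallStack)
      Mix(Address);
    for (const auto &Branch : Key.LBRStack) {
      Mix(Branch.first);
      Mix(Branch.second);
    }
    return Hash;
  }
};

// Key is the sample, value is how many times that exact sample was seen.
using SampleCounter = std::unordered_map<SampleKey, uint64_t, SampleKeyHash>;

// Aggregate one parsed sample; works the same for LBR-only and hybrid samples.
inline void addSample(SampleCounter &Counter, SampleKey Key) {
  ++Counter[std::move(Key)];
}
```

Under this assumption, the AutoFDO path would differ from the CSSPGO path only in how the aggregated entries are post-processed, not in how they are collected.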

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89723/new/

https://reviews.llvm.org/D89723