[PATCH] D89723: [CSSPGO][llvm-profgen]Context-sensitive profile data generation

Wenlei He via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 30 00:10:25 PDT 2020


wenlei added inline comments.


================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:499
+  // estimate it from inlinee's profile using entry body sample.
+  uint64_t getEstimatedEntryCount() const {
+    // If we have counts for branches/calles to function directly,
----------------
This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`.
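A rough sketch of what the merged helper could look like (illustrative only; assumes a `ProfileIsCS` flag is visible here, and simplifies the non-CS path):

```
uint64_t getEntrySamples() const {
  // CS profile: branch/call counts into the function are recorded as
  // head samples, so prefer them when present.
  if (ProfileIsCS && getHeadSamples() > 0)
    return getHeadSamples();
  // Otherwise estimate from the first body sample, as
  // getEstimatedEntryCount does today.
  if (!BodySamples.empty())
    return BodySamples.begin()->second.getSamples();
  return 0;
}
```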


================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:506-507
+    // If we don't have samples with location, use 1 to indicate live.
+    if (BodySamples.size() == 0)
+      return 1;
+    // Otherwise, for now use the naive estimation of using the
----------------
We probably shouldn't arbitrarily assume liveness in a general helper function. The logic to assume liveness can be moved to the caller if needed.
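For example (caller side, illustrative):

```
uint64_t EntryCount = CalleeSamples->getEstimatedEntryCount();
// Liveness is a caller-side policy: treat a profiled inlinee with no
// located samples as live with a minimal count.
if (EntryCount == 0)
  EntryCount = 1;
```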


================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:509
+    // Otherwise, for now use the naive estimation of using the
+    // average of first body sample count. Sophisticated
+    // CFG based entry count inference can be added later.
----------------
nit: there's no "average" now with this version.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:126-129
+      // Unwind returns - we check whether src and dst belong to the same
+      // function to see if we're crossing function boundary here, and since
+      // calls are already covered, that gets us all returns. Note this is
+      // under the assumption that there's no tail call.
----------------
The comment is no longer accurate: we no longer check whether src and dst cross a function boundary; instead we check whether the IP is actually at a return instruction, so tail calls are no longer a problem for this particular processing. (My bad, I didn't update the comment in the prototype.)
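Roughly, the check becomes (sketch only; `addressIsReturn` and `unwindReturn` are placeholder names):

```
// Unwind a return: instead of comparing the functions containing src
// and dst, check whether the branch source is a return instruction.
// This covers all returns, and tail calls are no longer a concern here.
if (Trace.Binary->addressIsReturn(Branch.Source))
  unwindReturn(Branch);
```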


================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:241
+    // Filter out the branch sample if it is identical with the previous
+    // one. This is due to a bug in LBR stack on Skylake (T24431811).
+    // Sometimes the LBR stack contains duplicate entries, which largely
----------------
Remove the internal task ID T24431811.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:302
+
+bool PerfReader::checkValidCallStack(UnwinderTrace &Trace) {
+  // Ignore if the entire stack does not belong to binary of interest
----------------
This function can take a reference to `Trace.CallStack` directly since it only checks the call stack.

Actually this function doesn't seem needed: line 290 already checks for non-empty, and we just need a `return Trace.Binary->addressInPrologEpilog(Trace.CallStack.front())` at line 299.


================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:304
+  // Ignore if the entire stack does not belong to binary of interest
+
+  if (Trace.CallStack.size() == 0)
----------------
remove the blank line?


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:52
+// and LBR stack. It's also used as the key for trace aggregation
+struct UnwinderTrace {
+  // Profiled binary that current frame address belongs to
----------------
This is just an encapsulation of a synchronized call stack and LBR sample, and it's orthogonal to the unwinder. Suggest decoupling it from the unwinder in naming and calling it `HybridSample` instead.
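i.e. roughly (also incorporating the ordering nits below):

```
// One synchronized (call stack, LBR stack) sample; also used as the
// key for sample aggregation.
struct HybridSample {
  // Profiled binary that the frame addresses belong to
  ProfiledBinary *Binary;
  // Call stack recorded in bottom-up (leaf to root) order
  std::list<uint64_t> CallStack;
  // LBR stack recorded in FIFO order
  SmallVector<LBREntry, 32> LBRStack;
};
```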


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:55
+  ProfiledBinary *Binary;
+  // Call stack recorded in anti-execution(leaf to entry) order
+  std::list<uint64_t> CallStack;
----------------
nit: I'd replace anti-execution with bottom-up order (or leaf to root).


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:57
+  std::list<uint64_t> CallStack;
+  // LBR stack recorded in anti-execution order
+  SmallVector<LBREntry, 32> LBRStack;
----------------
nit: anti-execution sounds a bit confusing. I think the canonical term is FIFO order (LBR sampling has two modes, FIFO for trace and FILO for call stack). 


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:90
+  // TODO: switch to use trie for call stack
+  std::list<uint64_t> CallStack;
+  // Used to fall through the LBR stack
----------------
Can this be a reference as well, just like `LBRStack`, pointing to the input `Sample`? Note the header comment also states "it doesn't hold the data but only keep the pointer/index of the data".


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:128
+// accumulate sample count with the same context.
+using ContextToSampleMap =
+    std::unordered_map<std::string,
----------------
1. To be accurate, this is actually a counter rather than a map.
2. It can be confusing if this is used for both branches and ranges, even though they share the bit-wise representation. I think we can typedef `BranchSample` and `RangeSample` both to `std::pair<uint64_t, uint64_t>`, and then typedef `ContextBranchCounter` and `ContextRangeCounter` to `std::unordered_map<std::string, std::map<BranchSample, uint64_t>>` and the range equivalent.
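Spelled out, that would be roughly:

```
// Both a branch (source, target) and a range (start, end) are address
// pairs; give them distinct names for readability.
using BranchSample = std::pair<uint64_t, uint64_t>;
using RangeSample = std::pair<uint64_t, uint64_t>;
// Context string -> per-branch / per-range hit counters.
using ContextBranchCounter =
    std::unordered_map<std::string, std::map<BranchSample, uint64_t>>;
using ContextRangeCounter =
    std::unordered_map<std::string, std::map<RangeSample, uint64_t>>;
```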



================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:229
+  ContextToSampleMap &getRangeCounts() { return RangeCounts; }
+  ContextToSampleMap &getAddressCounts() { return AddressCounts; }
+  ContextToSampleMap &getBranchCounts() { return BranchCounts; }
----------------
Do we actually use addresses now? Can we remove all the address- and probe-related stuff and add it properly in later patches?


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:234-238
+  void parseMMap2Event(line_iterator &Line);
+  void parseTrace(StringRef Filename);
+  void parseEvent(line_iterator &Index);
+  // Parse the hybrid trace including the call and LBR line
+  void parseHybridTrace(line_iterator &Line);
----------------
I suggest we establish a consistent naming convention here w.r.t. what is a trace and what is an event:

- A trace is a series of perf events.
- Each perf event can be an mmap event or a sample.
- A hybrid trace is a series of perf events: mmap events and hybrid samples.
- A hybrid sample is an LBR sample plus a stack sample.

With that, we can rename the following:

```
void parseHybridTrace(line_iterator &Line);
->
void parseHybridSample(line_iterator &Line);

UnwinderTrace
->
HybridSample

TraceAggregation
->
SampleAggregation
```



================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:236
+  void parseTrace(StringRef Filename);
+  void parseEvent(line_iterator &Index);
+  // Parse the hybrid trace including the call and LBR line
----------------
nit: why is the `line_iterator` named `Index` for this one, unlike the others?


================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:278-281
+  // Number of total LBR records
+  // TODO: uint64_t SampleCouter;
+  // Number of total groups of 32 LBR records + call stack
+  // TODO: uint64_t SampleGroupCounter;
----------------
I suggest we either include them properly or remove them from the patch. We still have quite a few things that are not in the upstream patch, and it's not practical to have TODOs for all of them.


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:2
+//===-- ProfileGenerator.cpp - Profile Generator  ---------------------*- C++
+//-*-===//
+//
----------------
nit: all header comments are screwed up by our internal linter.
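For reference, the conventional LLVM file header is a single line padded out to 80 columns, e.g.:

```
//===-- ProfileGenerator.cpp - Profile Generator ----------------*- C++ -*-===//
```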


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89723/new/

https://reviews.llvm.org/D89723


