[PATCH] D89723: [CSSPGO][llvm-profgen]Context-sensitive profile data generation
Wenlei He via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 30 00:10:25 PDT 2020
wenlei added inline comments.
================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:499
+ // estimate it from inlinee's profile using entry body sample.
+ uint64_t getEstimatedEntryCount() const {
+ // If we have counts for branches/calles to function directly,
----------------
This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`.
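For illustration, a minimal sketch of the merged accessor with simplified stand-in types (the struct/member names and the `ProfileIsCS` flag below are placeholders, not the real `FunctionSamples` interface):
```
// Illustrative sketch only, with simplified stand-in types - not the real
// FunctionSamples interface. The idea is a single entry-count accessor
// that dispatches on a profile-is-context-sensitive flag.
#include <cstdint>
#include <map>

struct SampleRecordSketch {
  uint64_t NumSamples = 0;
  uint64_t getSamples() const { return NumSamples; }
};

struct FunctionSamplesSketch {
  // Keyed by (offset, discriminator) in the real code; flattened here.
  std::map<uint64_t, SampleRecordSketch> BodySamples;
  // Counts from branches/calls that target the function entry directly.
  uint64_t DirectEntryCount = 0;
  static bool ProfileIsCS;

  uint64_t getEntrySamples() const {
    // CS profile with direct call/branch counts: use them as-is.
    if (ProfileIsCS && DirectEntryCount)
      return DirectEntryCount;
    // Otherwise estimate from the first body sample (naive; a CFG-based
    // inference could replace this later).
    return BodySamples.empty() ? 0
                               : BodySamples.begin()->second.getSamples();
  }
};

bool FunctionSamplesSketch::ProfileIsCS = false;
```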
================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:506-507
+ // If we don't have samples with location, use 1 to indicate live.
+ if (BodySamples.size() == 0)
+ return 1;
+ // Otherwise, for now use the naive estimation of using the
----------------
We probably shouldn't arbitrarily assume live for a general helper function. The logic to assume live can be moved to the caller if needed.
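For illustration (the flag/variable names are hypothetical), the caller-side policy could look like:
```
// Hypothetical caller-side policy: the generic helper only reports what it
// can estimate, and the caller decides whether "no samples" still means
// the function is live.
uint64_t EntryCount = FS.getEstimatedEntryCount();
if (EntryCount == 0 && AssumeLiveWhenNoSamples)
  EntryCount = 1;
```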
================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:509
+ // Otherwise, for now use the naive estimation of using the
+ // average of first body sample count. Sophisticated
+ // CFG based entry count inference can be added later.
----------------
nit: there's no "average" now with this version.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:126-129
+ // Unwind returns - we check whether src and dst belong to the same
+ // function to see if we're crossing function boundary here, and since
+ // calls are already covered, that gets us all returns. Note this is
+ // under the assumption that there's no tail call.
----------------
The comment is no longer accurate - we no longer check whether src/dst cross a function boundary; instead we check whether the IP is indeed at a return instruction, and tail calls are no longer a problem for this particular processing. (My bad, I didn't update the comment in the prototype..)
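Roughly speaking (the `addressIsCall`/`addressIsReturn` helpers below are placeholders, not necessarily the names in the patch), the updated classification is:
```
// Placeholder helper names, for illustration of the updated logic only:
// classify the branch source instruction instead of comparing the source
// and target functions. The unwinder walks LBR entries backward in time,
// so a recorded call means leaving the callee frame and a recorded return
// means re-entering the callee frame; tail calls need no special casing
// for this particular check.
uint64_t Source = Branch.Source;
if (Binary->addressIsCall(Source))
  unwindCall(State);
else if (Binary->addressIsReturn(Source))
  unwindReturn(State);
else
  unwindBranchWithinFrame(State);
```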
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:241
+ // Filter out the branch sample if it is identical with the previous
+ // one. This is due to a bug in LBR stack on Skylake (T24431811).
+ // Sometimes the LBR stack contains duplicate entries, which largely
----------------
Remove the internal task ID T24431811.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:302
+
+bool PerfReader::checkValidCallStack(UnwinderTrace &Trace) {
+ // Ignore if the entire stack does not belong to binary of interest
----------------
This function can take a reference to `Trace.CallStack` directly since it only checks the call stack.
Actually, this function doesn't seem needed: line 290 already checks non-empty, and we just need a `return Trace.Binary->addressInPrologEpilog(Trace.CallStack.front())` at line 299.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:304
+ // Ignore if the entire stack does not belong to binary of interest
+
+ if (Trace.CallStack.size() == 0)
----------------
remove the blank line?
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:52
+// and LBR stack. It's also used as the key for trace aggregation
+struct UnwinderTrace {
+ // Profiled binary that current frame address belongs to
----------------
This is just an encapsulation of a synchronized stack and LBR sample, and it's orthogonal to the unwinder. Suggest decoupling it from the unwinder in naming and calling it `HybridSample` instead.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:55
+ ProfiledBinary *Binary;
+ // Call stack recorded in anti-execution(leaf to entry) order
+ std::list<uint64_t> CallStack;
----------------
nit: I'd replace anti-execution with bottom-up order (or leaf to root).
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:57
+ std::list<uint64_t> CallStack;
+ // LBR stack recorded in anti-execution order
+ SmallVector<LBREntry, 32> LBRStack;
----------------
nit: anti-execution sounds a bit confusing. I think the canonical term is FIFO order (LBR sampling has two modes, FIFO for trace and FILO for call stack).
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:90
+ // TODO: switch to use trie for call stack
+ std::list<uint64_t> CallStack;
+ // Used to fall through the LBR stack
----------------
Can this be a reference as well, just like `LBRStack`, and point to the input Sample? Note the header comment also states "it doesn't hold the data but only keep the pointer/index of the data".
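As a sketch (assuming the rename to `HybridSample` suggested above, that the unwinder never needs to mutate its own copy of the stack, and with `UnwindState` standing in for the actual struct name):
```
// Sketch only; SmallVector, LBREntry and HybridSample are the types from
// the patch. Bind references to the input sample instead of copying,
// matching the header comment about only keeping pointers/indices to the
// data.
#include <list>

struct UnwindState {
  const std::list<uint64_t> &CallStack;      // refers to the input sample
  const SmallVector<LBREntry, 32> &LBRStack; // already a reference today
  explicit UnwindState(const HybridSample &Sample)
      : CallStack(Sample.CallStack), LBRStack(Sample.LBRStack) {}
};
```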
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:128
+// accumulate sample count with the same context.
+using ContextToSampleMap =
+ std::unordered_map<std::string,
----------------
1. To be accurate, this is actually a counter rather than a map.
2. It can be confusing if this is used for both branches and ranges, even though branch and range share the bit-wise representation. I think we can typedef `BranchSample` and `RangeSample` both to `std::pair<uint64_t, uint64_t>`, and then typedef `ContextBranchCounter` and `ContextRangeCounter` to `std::unordered_map<std::string, std::map<BranchSample, uint64_t>>`, ..
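Spelled out as a sketch:
```
// Sketch of the suggested typedefs.
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Branch and range samples are both (uint64_t, uint64_t) pairs - a
// (source, target) branch vs. a (start, end) range - but distinct names
// keep the two uses from getting mixed up.
using BranchSample = std::pair<uint64_t, uint64_t>;
using RangeSample = std::pair<uint64_t, uint64_t>;

// Counters keyed by context string, accumulating a hit count per sample.
using ContextBranchCounter =
    std::unordered_map<std::string, std::map<BranchSample, uint64_t>>;
using ContextRangeCounter =
    std::unordered_map<std::string, std::map<RangeSample, uint64_t>>;
```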
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:229
+ ContextToSampleMap &getRangeCounts() { return RangeCounts; }
+ ContextToSampleMap &getAddressCounts() { return AddressCounts; }
+ ContextToSampleMap &getBranchCounts() { return BranchCounts; }
----------------
Do we actually use addresses now? Can we remove all the address- and probe-related stuff and add it properly in later patches?
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:234-238
+ void parseMMap2Event(line_iterator &Line);
+ void parseTrace(StringRef Filename);
+ void parseEvent(line_iterator &Index);
+ // Parse the hybrid trace including the call and LBR line
+ void parseHybridTrace(line_iterator &Line);
----------------
I suggest we establish a consistent naming convention here w.r.t. what is a trace and what is an event:
- A trace is a series of perf events.
- Each perf event can be an mmap event or a sample.
- A hybrid trace is a series of perf events: mmap events and hybrid samples.
- A hybrid sample is an LBR sample plus a stack sample.
With that, we can rename the following:
```
void parseHybridTrace(line_iterator &Line);
->
void parseHybridSample(line_iterator &Line);
UnwinderTrace
->
HybridSample
TraceAggregation
->
SampleAggregation
```
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:236
+ void parseTrace(StringRef Filename);
+ void parseEvent(line_iterator &Index);
+ // Parse the hybrid trace including the call and LBR line
----------------
nit: why is the line_iterator named Index for this one, different from the others?
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:278-281
+ // Number of total LBR records
+ // TODO: uint64_t SampleCouter;
+ // Number of total groups of 32 LBR records + call stack
+ // TODO: uint64_t SampleGroupCounter;
----------------
I suggest we either include them properly or remove them from the patch. We still have quite a few things that are not in the upstream patch, and it's not possible to have TODOs for all of them.
================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:2
+//===-- ProfileGenerator.cpp - Profile Generator ---------------------*- C++
+//-*-===//
+//
----------------
nit: all header comments are screwed up by our internal linter.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D89723/new/
https://reviews.llvm.org/D89723