[PATCH] D108153: [llvm-profgen] Support LBR only perf script

Thu Aug 19 11:59:51 PDT 2021

hoy added inline comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:706
+  for (const auto &Item : AggregatedSamples) {
+    const HybridSample *Sample = dyn_cast<HybridSample>(Item.first.getPtr());
+    computeCounterFromLBR(Sample, Item.second);
----------------
wenlei wrote:
> wlei wrote:
> > hoy wrote:
> > > I'm thinking of separating LBR samples from `HybridSample` and making `HybridSample` inherit that. This way they are technically separate and could reduce confusion. What do you think?
> > Sounds good, changed!
> I actually find the inheritance isn't really necessary now. The hashing specialization in LBRSample does not add value because the general version for HybridSample can handle that too, and a separate hash for LBRSample just adds duplication. On the contrary, how about we rename `HybridSample` to `PerfSample` and remove the inheritance structure all together?
> 
> Also I don't think HybridSample being subclass of LBRSample is good because this is more about composition. What about if we have a perf sample that only has call stack but not LBR? inheritance wouldn't work well for that (and we don't want virtual inheritance). This is similar to why we don't make subclass for aggregation count. 
Removing the inheritance has a cost of stacking fields that are not necessarily used at the same time together. For example, the `  SmallVector<uint64_t, 16> CallStack` field will be there for `LBRSamples` and in member functions the unused fields have to be checked. But we don't have virtual method overhead as a result of removing inheritance, which is a cons of OOP.

Multi-inheritance can be use to a separate callstack samples. 

I still prefer OOP here from memory and code maintenance and extension point of view. Let me know your thoughts.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:573
+      // Skip the aggregated count
+      if (!TraceIt.getCurrentLine().getAsInteger(10, FrameAddr))
+        TraceIt.advance();
----------------
wlei wrote:
> hoy wrote:
> > Why need this? Can the count be just skipped since it does not reflect a call stack sample and LBR sample?
> Here it's in fact to skip it. To distinguish LBR and Hybrid sample, it used `Count` to indicate call stack exists, >0 means hybrid sample. When adding the aggregated count, it will also increase `Count` and treat LBR sample as hybrid sample incorrectly. 
I see. Do we always have an aggregated count for both hybrid and LBR samples? If not, a decimal number can also be a valid call frame addr in which case we may not be able to tell LBR samples apart from hybrid samples.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108153/new/

https://reviews.llvm.org/D108153