[PATCH] D111750: [llvm-profgen] Allow unsymbolized profile as perf input

Mon Oct 25 12:30:37 PDT 2021

wlei added inline comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:797
+    }
+    Key->genHashCode();
+    auto Ret =
----------------
wenlei wrote:
> wlei wrote:
> > wenlei wrote:
> > > wlei wrote:
> > > > wenlei wrote:
> > > > > Would it be better if the hash code were generated lazily instead of requiring an explicit call?
> > > > > 
> > > > > ```
> > > > > getHashCode() {
> > > > >   if (HashCode == 0)
> > > > >     genHashCode();
> > > > >   ...
> > > > > }
> > > > > ```
> > > > Calling `genHashCode` explicitly here is intentional. The reason is that we want to avoid making `genHashCode` a virtual function, i.e. avoid calling `genHashCode` after the key has been cast to the base class. So we split it like this:
> > > > ```
> > > > derived class:  HashCode = genHashCode();
> > > > 
> > > > base class:  getHashCode() { return HashCode; }
> > > > ```
> > > > 
> > > > 
> > > Not sure if I understand. Why do we want to avoid calling genHashCode after casting to the base class?
> > Currently we have two types of keys, `StrKey` and `ProbeKey`, both derived from the base class `ContextKey`.
> > 
> > We store them in a hash map, like unordered_map<ContextKey, ...>, and the insertion logic looks like this:
> > ```
> > StrKey* key = ..
> > 
> > hashmap[key] = ...;   
> > ```
> > In the hash map, the `StrKey*`/`ProbeKey*` is implicitly cast to a `ContextKey*`, which then calls `ContextKey->getHashCode`. Since `StrKey` and `ProbeKey` have different `genHashCode` implementations, `getHashCode` would have to be a virtual function, which has overhead.
> > 
> > To avoid this, we can explicitly call StrKey->genHashCode() before the key is cast to the base class and store the result in a member variable `HashCode`.
> > 
> > Then the base class's getHashCode just reads `HashCode`, with no need for a virtual function.
> > 
> > so getHashCode should be a virtual function which has overhead because StrKey and ProbeKey have different genHashCode.
> 
> > So to avoid this
> 
> Why do we want to avoid this? For performance reasons? It looks to me like having getHashCode as a virtual function is natural and clean.
> 
> This pattern is perhaps not related to this patch, though.
Yeah, my original motivation was performance. I can try it in a separate patch.
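
For reference, a minimal sketch of the non-virtual pattern discussed in this thread. The types are simplified, hypothetical stand-ins, not the actual llvm-profgen classes; `KeyHasher` and the use of `std::hash` are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the base key class.
struct ContextKey {
  uint64_t HashCode = 0;
  // Non-virtual: only reads the value cached by the derived class.
  uint64_t getHashCode() const { return HashCode; }
};

// Hypothetical, simplified stand-in for a string-based key.
struct StrKey : ContextKey {
  std::string Context;
  explicit StrKey(std::string C) : Context(std::move(C)) {}
  // Must be called explicitly before the key is handed to the hash map.
  void genHashCode() { HashCode = std::hash<std::string>{}(Context); }
};

// Assumed hasher for a map keyed by base-class pointers. It never needs
// to know the dynamic type of the key.
struct KeyHasher {
  std::size_t operator()(const ContextKey *K) const {
    return static_cast<std::size_t>(K->getHashCode());
  }
};
```

The cost of this approach is the one raised above: forgetting the explicit `genHashCode()` call silently leaves `HashCode` at 0. Making `getHashCode` virtual removes that pitfall at the price of an indirect call per lookup.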

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:788
+    // Read context stack for CS profile.
+    if (Line.startswith("[")) {
+      ProfileIsCS = true;
----------------
hoy wrote:
> Add a test for this?
We have the test, see noinline-cs-noprobe.test.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:654
+      src_n->dst_n:count_n
+    [context stack2]
+      ......
----------------
wenlei wrote:
> nit: this does not align with `[context stack1]`
Updated the comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:659
+*/
+class UnsymbolizedProfileReader : public PerfReaderBase {
+public:
----------------
hoy wrote:
> wenlei wrote:
> > hoy wrote:
> > > What is the main reason for this type hierarchy? It looks like `UnsymbolizedProfileReader` doesn't need most of the interfaces `PerfReaderBase` provides. Conceptually, it seems to me that the two classes function independently. If we'd like code sharing, `UnsymbolizedProfileReader` can be made the base class of `PerfReaderBase`, or we can make a new base class that simply reads in something and outputs an unsymbolized profile?
> > > UnsymbolizedProfileReader can be made the base class of PerfReaderBase
> > 
> > I don't think this is a good idea. Conceptually, PerfReader is not a special kind of UnsymbolizedProfileReader.
> Right, they are independent of each other. PerfReader really deals with perf input. They only share the raw output writing. Would a new base class make more sense?
It seems to me the hierarchy should be like this:
```
PerfReaderBase  --->  PerfScriptReader  --->  LBRPerfReader
                                        --->  HybridPerfReader

                --->  UnsymbolizedProfileReader
```
This will make the hierarchy deep though.
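
As a rough sketch of what that layering could look like in code: the class names follow the diagram above, but `describeInput()` is a hypothetical placeholder method used only to make the example self-contained. It is not part of the patch.

```cpp
#include <string>

// Common root: whatever is shared by all readers (e.g. raw output writing).
class PerfReaderBase {
public:
  virtual ~PerfReaderBase() = default;
  // Hypothetical placeholder for the reader-specific entry point.
  virtual std::string describeInput() const = 0;
};

// Intermediate layer: logic specific to parsing perf script output.
// Still abstract, since it does not override describeInput().
class PerfScriptReader : public PerfReaderBase {};

class LBRPerfReader : public PerfScriptReader {
public:
  std::string describeInput() const override { return "perf script (LBR)"; }
};

class HybridPerfReader : public PerfScriptReader {
public:
  std::string describeInput() const override {
    return "perf script (LBR + stack)";
  }
};

// Sibling of PerfScriptReader: needs none of the perf script parsing.
class UnsymbolizedProfileReader : public PerfReaderBase {
public:
  std::string describeInput() const override { return "unsymbolized profile"; }
};
```

With this split, `UnsymbolizedProfileReader` only inherits what lives in the root class, so the perf-specific interfaces would have to move down into `PerfScriptReader`.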

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111750/new/

https://reviews.llvm.org/D111750