[PATCH] D127031: [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

Thu Jun 23 17:54:32 PDT 2022

wenlei added inline comments.

================
Comment at: llvm/include/llvm/ProfileData/SampleProf.h:701
+  // by the path from root to this node.
+  ContextTrieNode *ContextNode;
   // State of the associated sample profile
----------------
hoy wrote:
> wlei wrote:
> > hoy wrote:
> > > wlei wrote:
> > > > hoy wrote:
> > > > > wlei wrote:
> > > > > > hoy wrote:
> > > > > > > The field is only useful during context manipulation. Is it possible to store the link in an auxiliary map whereever it is used?
> > > > > > I think we use context for two things 1)context manipulation(inlining) 2) reader and writer based on ProfileMap.
> > > > > > My thought is `ContextNode` is the same thing to `FullContext`, i,e. we can always start from this node to iterate trie reversely to get the array of frames. I was wondering if we can use `ContextNode` to replace the `FullContext` in the future. And for sample reader and writer things, the key of the map can work as the FullContext, we might not need this field either. That's why I'm thinking to keep it here. WDYT?
> > > > > yeah, I had the same thought that the two fields are kinda redundant to each other. I'm a bit worried about the compile-time memory increase. Would it be possible to use a union for the two fields for now?
> > > > > 
> > > > > > 2) reader and writer based on ProfileMap.
> > > > > 
> > > > > What is this usage?
> > > > > 
> > > > > 
> > > > > > I was wondering if we can use ContextNode to replace the FullContext in the future.
> > > > > 
> > > > > It depends. For example, the profile merger probably does not want to always build a tri and merge multiple tries. The profile similarity checker may neither.
> > > > > 
> > > > > Would it be possible to use a union for the two fields for now?
> > > > Yeah, that's good to save memory. I had a try this, see the current version, it can pass all the test. However, I feel like it might be error-prone. The two shared fields have different way of initialization, currently I need to make sure `SampleContextFrames` is initialized then the ContextTrieNode field will be unknown status(not null). We might need to carefully use it in the future.
> > > > 
> > > > >reader and writer based on ProfileMap.
> > > > > What is this usage?
> > > > For CS profile, `SampleContext` is the key of ProfileMap, FullContext is always not empty here.  
> > > > https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/SampleProfReader.cpp#L279
> > > > 
> > > > 
> > > > >It depends. For example, the profile merger probably does not want to always build a tri and merge multiple tries. The profile similarity checker may neither.
> > > > Yeah, right now ProfileMap always require the `FullContext` as the key. wondering if we can only use FullContext as the key, FunctionSample's SampleContext doesn't own the fullContext.
> > > > 
> > > > 
> > > Using union is error-prone, yes. My original thought is to use a separate map in ContextTracker that maintains the link from FunctionSamples to its ContextTrieNode. So far I don't see a use of getting ContextTrieNode from FunctionSamples out of the ContextTracker context. It seems encapsulating the map inside ContextTracker is enough. Does this make sense to you? 
> > > 
> > > 
> > I see, now I understand why to keep in a separate map, it's less intrusive to the FunctionSamples class. Thanks for your clarification.
> Yeah, it looks better now. Thanks for addressing this.
I also like this being an auxiliary data rather than extending with a field. Thanks for working on this

================
Comment at: llvm/include/llvm/Transforms/IPO/SampleContextTracker.h:93-94
 public:
-  struct ProfileComparer {
-    bool operator()(FunctionSamples *A, FunctionSamples *B) const {
-      // Sort function profiles by the number of total samples and their
-      // contexts.
-      if (A->getTotalSamples() == B->getTotalSamples())
-        return A->getContext() < B->getContext();
-      return A->getTotalSamples() > B->getTotalSamples();
-    }
-  };
-
-  // Keep profiles of a function sorted so that they will be processed/promoted
-  // deterministically.
-  using ContextSamplesTy = std::set<FunctionSamples *, ProfileComparer>;
+  // Traversal order of the trie will ensure that it will be processed
+  // deterministically, so use a vector as a container.
+  using ContextSamplesTy = std::vector<FunctionSamples *>;
----------------
nit: this comment is more like an explanation of the change (why we no longer need container to order elements), based on previous comment as context. but reader will never have previous comment/code as context. 

Saying this because when I read this comment, I actually got confused, and I only understand it after reading the original code..

================
Comment at: llvm/include/llvm/Transforms/IPO/SampleContextTracker.h:179
+  std::unordered_map<const FunctionSamples *, ContextTrieNode *>
+      FSamples2NodeMap;
+
----------------
nit: to be consistent with `FuncToCtxtProfiles`, name this `ProfileToNodeMap`

================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:69
+                                    ContextTrieNode &&NodeToMove,
+                                    bool DeleteNode) {
   uint64_t Hash =
----------------
DeleteNode is always false and can be simplified? 

================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:87
+ContextTrieNode &
+SampleContextTracker::moveToChildContext(ContextTrieNode &ToNodeParent,
+                                         const LineLocation &CallSite,
----------------
any reason to make this a member of SampleContextTracker rather than member of ContextTrieNode? asking because the naming `moveToChildContext` is assuming `this` is the parent to move to. 

================
Comment at: llvm/lib/Transforms/IPO/SampleContextTracker.cpp:227
+
+  while (!NodeQueue.empty()) {
+    ContextTrieNode *Node = NodeQueue.front();
----------------
I'm seeing many repetition of tree BFS and the boiler plate code. Perhaps it makes sense to create iterator API for SampleContextTracker to go over all nodes. 

================
Comment at: llvm/tools/llvm-profgen/CSPreInliner.cpp:65
+      ContextTracker(Tracker), Binary(Binary), Summary(Summary) {
+  ContextTracker.populateFuncToCtxtMap();
   // Set default preinliner hot/cold call site threshold tuned with CSSPGO.
----------------
populateFuncToCtxtMap isn't really the responsibility of preinliner. can we move this earlier to right after profile generation on trie?  

preinliner should be one client of context tracker and it can assume complete, fully functioning context tracker, including auxiliary maps.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127031/new/

https://reviews.llvm.org/D127031