[PATCH] D140908: [MemProf] Context disambiguation cloning pass [patch 1a/3]

Mon Mar 20 13:07:07 PDT 2023

tejohnson marked 20 inline comments as done.
tejohnson added a comment.

Address all comments other than the test reduction and graph traversal suggestions, which I will work on next.

In D140908#4198024 <https://reviews.llvm.org/D140908#4198024>, @snehasish wrote:

> Still working my way through `updateStackNodes`. Here are some of the comments I have so far --
>
> For the IR based tests, I noticed some comments which describe how the code was compiled. I wonder if it makes sense to use synthetic IR (derived from the original source) to make the tests more readable and less brittle. I took the indirect.ll test and ran it through llvm-reduce. and found that the reduced IR was much easier to follow. I attached my test.sh script and the reduced IR for indirect.ll. To run the reducer, `llvm-reduce --test=test.sh ../path/to/indirect.ll`. Let me know what you think.

This is a nice idea. Will do so for the tests but probably do that after addressing the other comments in case there is any more churn in the expected test output, and since it is more mechanical.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:17
+// The transformations can be performed either directly on IR (regular LTO), or
+// on a ThinLTO index (and later applied to the IR during the ThinLTO backend).
+// Both types of LTO operate on a the same base graph representation, which
----------------
snehasish wrote:
> nit: I don't think we introduce the ThinLTO changes in this patch so maybe update this text in the follow on patches instead? 
Instead of removing the ThinLTO comment, I added "(eventually)" in front of it. Otherwise the CRTP design choice doesn't make sense.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:312
+  /// iteration.
+  std::map<FuncTy *, std::vector<CallInfo>> FuncToCallsWithMetadata;
+
----------------
snehasish wrote:
> Iterating on the contents of this map (L935) is non-deterministic since we use a pointer as key. It would be nice to avoid non-determinism by using a MapVector even though I don't think there is any externally visible effect.
> 
> For NodeToCallingFunc below, we don't iterate on the contents so I think that's fine. 
Actually, the only time we index as a map is while building, and we don't really need to do this since we process each function once. I can simply change this to a vector of pairs.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:617
+  // Use LastContextId as a uniq id for MIB allocation nodes.
+  AllocNode->OrigStackOrAllocId = LastContextId;
+  // Alloc type should be updated as we add in the MIBs. We should assert
----------------
snehasish wrote:
> Are we marking the first AllocNode we encounter as a clone since
> * LastContextId starts from 0
> * LastContextId is incremented in addStackNodesForMIB (and duplicateContextIds)
> * Comment at L183 indicates that 0 is used to identify a clone
> 
> A simple fix would be to start LastContextId from 1 in the constructor? I found the update to LastContextId hard to follow but I don't have a suggestion on how this could be made simpler. 
> 
> Also if this is a bug, then maybe a unit test to check the consistency of internal data structures might be useful.
The first AllocNode indeed will have 0 value here. However, that isn't a bug or in conflict with the comment, although I could see how it might be confusing. For alloc nodes we simply use LastContextId to give them unique ids that are only used in dot dumping (see the first sentence in the comment for OrigStackOrAllocId. For stack nodes this is the stack id corresponding to the node (i.e. from the metadata). The comment is just noting that clones will retain a 0 value for this field, but that it isn't an issue since we only use it during matching of callsite metadata (when the graph is being built). I rewrote the comment on L183, hopefully that clarifies things.

The reason why we increment LastContextId before using it in addStackNodesForMIB is that that is where we are using it for its main purpose - to assign a unique (non-zero) id to each allocation context from memprof metadata.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1190
+
+  if (DumpCCG) {
+    dbgs() << "CCG before updating call stack chains:\n";
----------------
snehasish wrote:
> I think we should move the logic from L1190 to L1206 (end of the constructor) to the process method. It was a bit surprising for me to the process method not actually do any processing .. :)
> 
> Also this would bring all the dump and export calls in one place to follow sequentially.
I would prefer not to do this, since updateStackNodes called below is part of the graph building and should be here in the graph constructor.

Note that process() is currently empty only because the graph transformations were split into patches 2 and 3.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1378
+static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
+                      bool CheckEdges = false) {
+  // Node's context ids should be the union of both its callee and caller edge
----------------
snehasish wrote:
> I don't see CheckEdges set to true in this diff. I think we should remove it altogether or make it always check by default if we are calling verify. What do you think?
It's used in follow on patches, when we are checking nodes in isolation, i.e. not checking the whole graph as we do here via check()/checkNodesRecursively() (the latter ensures we check each edge exactly once). I'll remove from this patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140908/new/

https://reviews.llvm.org/D140908