[PATCH] D141077: [MemProf] Context disambiguation cloning pass [patch 3/4]

Tue May 2 11:48:03 PDT 2023

snehasish accepted this revision.
snehasish added a comment.
This revision is now accepted and ready to land.

lgtm, remaining comments are (mostly) cosmetic.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2283
+         (Call.call().is<AllocInfo *>()
+              ? Call.call().dyn_cast<AllocInfo *>()->Versions.size()
+              : Call.call().dyn_cast<CallsiteInfo *>()->Clones.size()));
----------------
Could this and the one below just be a static cast since we've already checked the type with "is"? 

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2294
+    assert(Inst.cloneNo() == 0);
+    if (auto *AI = Inst.call().dyn_cast<AllocInfo *>()) {
+      assert(AI->Versions.size() == CloneNo);
----------------
I thought that the usage of dyn_cast required the addition of enum kind to the struct [1]?

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ModuleSummaryIndex.h#L380

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2329
+//              Continue
+//           Assign any calling the cloned version to new clone
+//
----------------
Missing a word here?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2400
+      // find additional cloning is required.
+      std::list<ContextNode *> ClonesWorklist;
+      // Ignore original Node if we moved all of its contexts to clones.
----------------
Prefer std::deque which has the same interface with better performance due to spatial locality of elements (I believe it uses small fixed size preallocated arrays internally).

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2484
+                AllocationCallToContextNodeMap.count(Call));
+            for (auto CE : Clone->CallerEdges) {
+              // Ignore any caller that does not have a recorded callsite Call.
----------------
nit: auto* here and a few uses below where the type is a pointer.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2751
+    auto CalleeFunc = CallsiteToCalleeFuncCloneMap[Node];
+    updateCall(Node->Call, CalleeFunc);
+  };
----------------
Should we enforce a tail call to avoid accidental memory usage increase? Is this a concern for the size of inputs we are working with?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2755
+  DenseSet<const ContextNode *> Visited;
+  for (auto &Entry : AllocationCallToContextNodeMap)
+    UpdateCalls(Entry.second, Visited, UpdateCalls);
----------------
Add a comment for the work done in this DFS?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2309
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+bool CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::assignFunctions() {
+  bool Changed = false;
----------------
tejohnson wrote:
> snehasish wrote:
> > Thanks for the detailed inline comments. Can we summarize the heuristic as pseudo-code in a function comment above the implementation too? 
> I wasn't sure how much detail you wanted here. PTAL and let me know what I added is helpful.
> I wasn't sure how much detail you wanted here. PTAL and let me know what I added is helpful.

Looks great!

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2385
+          // If this is the first callsite copy, assign to original function.
+          if (I == 0) {
+            // Since FuncClonesToCallMap is empty in this case, no clones have
----------------
tejohnson wrote:
> snehasish wrote:
> > Can we hoist the contents of this if condition outside the loop? Then we can start the iteration at index 1 and remove this check.
> No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).
> No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).

I assumed we could add `&& FuncClonesToCallMap.empty()` to the (new) condition outside the loop. If I'm still missing something then I think we don't have enough test coverage since I was able to move the contents outside (in the prior version of the patch) and all tests pass. 

Not a strong opinion since the worklist changes make it a easier to follow.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141077/new/

https://reviews.llvm.org/D141077