[PATCH] D141077: [MemProf] Context disambiguation cloning pass [patch 3/4]

Wed May 3 11:45:28 PDT 2023

tejohnson marked 3 inline comments as done.
tejohnson added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2283
+         (Call.call().is<AllocInfo *>()
+              ? Call.call().dyn_cast<AllocInfo *>()->Versions.size()
+              : Call.call().dyn_cast<CallsiteInfo *>()->Clones.size()));
----------------
snehasish wrote:
> Could this and the one below just be a static cast since we've already checked the type with "is"? 
CallTy is an IndexCall struct which derived from PointerUnion, and PointerUnion supports a dyn_cast() method but not a static_cast one.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2294
+    assert(Inst.cloneNo() == 0);
+    if (auto *AI = Inst.call().dyn_cast<AllocInfo *>()) {
+      assert(AI->Versions.size() == CloneNo);
----------------
snehasish wrote:
> I thought that the usage of dyn_cast required the addition of enum kind to the struct [1]?
> 
> https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ModuleSummaryIndex.h#L380
This is the dyn_cast() implemented by PointerUnion (see above comment about CallTy here).

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2484
+                AllocationCallToContextNodeMap.count(Call));
+            for (auto CE : Clone->CallerEdges) {
+              // Ignore any caller that does not have a recorded callsite Call.
----------------
snehasish wrote:
> nit: auto* here and a few uses below where the type is a pointer.
These are actually shared_ptr not raw pointer. Some can be auto & however, I will add that.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2751
+    auto CalleeFunc = CallsiteToCalleeFuncCloneMap[Node];
+    updateCall(Node->Call, CalleeFunc);
+  };
----------------
snehasish wrote:
> Should we enforce a tail call to avoid accidental memory usage increase? Is this a concern for the size of inputs we are working with?
I don't see any uses of the musttail attribute currently in LLVM, so I'm not sure if that is a good idea. I looked at the optimized code, and this is not being marked as a tail call (however interestingly the earlier call to updateAllocationCall is). But this isn't a recursive call, so I'm not sure it matters too much.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:2385
+          // If this is the first callsite copy, assign to original function.
+          if (I == 0) {
+            // Since FuncClonesToCallMap is empty in this case, no clones have
----------------
snehasish wrote:
> tejohnson wrote:
> > snehasish wrote:
> > > Can we hoist the contents of this if condition outside the loop? Then we can start the iteration at index 1 and remove this check.
> > No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).
> > No, because we don't always get here for I==0. We would only get here for the I==0 case if the FuncClonesToCallMap is empty, i.e. this callsite's function has not been cloned previously. The function might have already been cloned if we previously processed a different callsite from the same function, in which case we will handle this clone like any other clone in the later code that assigns it to an existing function version (which will be the original copy in practice, but we utilize the same code to do this as for all other clones).
> 
> I assumed we could add `&& FuncClonesToCallMap.empty()` to the (new) condition outside the loop. If I'm still missing something then I think we don't have enough test coverage since I was able to move the contents outside (in the prior version of the patch) and all tests pass. 
> 
> Not a strong opinion since the worklist changes make it a easier to follow.
> I assumed we could add && FuncClonesToCallMap.empty() to the (new) condition outside the loop. If I'm still missing something then I think we don't have enough test coverage since I was able to move the contents outside (in the prior version of the patch) and all tests pass.
> 
> Not a strong opinion since the worklist changes make it a easier to follow.

Oh I see what you meant. I think since the worklist approach simplified the logic, I'd prefer to keep all of the handling inside the worklist loop.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141077/new/

https://reviews.llvm.org/D141077