[llvm] [MemProf] Merge callee clones as needed before assigning functions (PR #135702)

via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 14 16:56:41 PDT 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

<details>
<summary>Changes</summary>

We perform cloning for each allocation node separately. However, this
sometimes results in a situation where the same node calls multiple
clones of the same callee, created for different allocations. This
causes issues when assigning functions to these clones, as each node can
in reality only call a single callee clone.

To address this, before assigning functions, merge callee clone nodes as
needed using a post order traversal from the allocations. We attempt to
use existing clones as the merge node when legal, and to share them
among callers with the same properties (callers calling the same set of
callee clone nodes for the same allocations).

Without this fix, incorrect function assignment can in some cases lead
to calling the wrong allocation clone. In fact, this showed up in an
existing test but went unnoticed, as that test was written to exercise
earlier parts of the cloning process.
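
For reference, the new mergenodes.ll test below corresponds roughly to the
following C++ shape (a hand-written illustration, not the actual source the
test IR was generated from):

```cpp
// Hypothetical source mirroring mergenodes.ll (the test IR below is the
// authoritative version).
char *bar() {
  char *a = new char[10]; // notcold via main's first call, cold via second
  char *b = new char[10]; // cold via main's first call, notcold via second
  delete[] b;
  return a;
}
char *baz() { return bar(); }
char *foo() { return baz(); }

int main() {
  char *p = foo(); // context 1
  char *q = foo(); // context 2
  delete[] p;
  delete[] q;
  return 0;
}
```

Cloning for the two allocations separately produces two clones of `_Z3foov`,
and without the merge step function assignment can end up wiring both of
`main`'s calls to the same `foo` clone, which is exactly what the NOMERGE
checks in the test guard against.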


---

Patch is 58.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/135702.diff


4 Files Affected:

- (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+326) 
- (added) llvm/test/Transforms/MemProfContextDisambiguation/mergenodes.ll (+404) 
- (added) llvm/test/Transforms/MemProfContextDisambiguation/mergenodes2.ll (+474) 
- (modified) llvm/test/Transforms/MemProfContextDisambiguation/overlapping-contexts.ll (+17-12) 


``````````diff
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index f5ae204426170..e3dfe6f6a202b 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -90,6 +90,10 @@ STATISTIC(FoundProfiledCalleeMaxDepth,
 STATISTIC(FoundProfiledCalleeNonUniquelyCount,
           "Number of profiled callees found via multiple tail call chains");
 STATISTIC(DeferredBackedges, "Number of backedges with deferred cloning");
+STATISTIC(NewMergedNodes, "Number of new nodes created during merging");
+STATISTIC(NonNewMergedNodes, "Number of existing (not new) nodes used during merging");
+STATISTIC(MissingAllocForContextId,
+          "Number of missing alloc nodes for context ids");
 
 static cl::opt<std::string> DotFilePathPrefix(
     "memprof-dot-file-path-prefix", cl::init(""), cl::Hidden,
@@ -160,6 +164,13 @@ static cl::opt<bool> CloneRecursiveContexts(
     "memprof-clone-recursive-contexts", cl::init(true), cl::Hidden,
     cl::desc("Allow cloning of contexts through recursive cycles"));
 
+// Generally this is needed for correct assignment of allocation clones to
+// function clones; however, we allow it to be disabled for debugging while
+// the functionality is new and being tested more widely.
+static cl::opt<bool>
+    MergeClones("memprof-merge-clones", cl::init(true), cl::Hidden,
+                cl::desc("Merge clones before assigning functions"));
+
 // When disabled, try to detect and prevent cloning of recursive contexts.
 // This is only necessary until we support cloning through recursive cycles.
 // Leave on by default for now, as disabling requires a little bit of compile
@@ -560,6 +571,10 @@ class CallsiteContextGraph {
   /// Mark backedges via the standard DFS based backedge algorithm.
   void markBackedges();
 
+  /// Merge clones generated during cloning for different allocations but that
+  /// are called by the same caller node, to ensure proper function assignment.
+  void mergeClones();
+
   // Try to partition calls on the given node (already placed into the AllCalls
   // array) by callee function, creating new copies of Node as needed to hold
   // calls with different callees, and moving the callee edges appropriately.
@@ -778,6 +793,15 @@ class CallsiteContextGraph {
   void markBackedges(ContextNode *Node, DenseSet<const ContextNode *> &Visited,
                      DenseSet<const ContextNode *> &CurrentStack);
 
+  /// Recursive helper for merging clones.
+  void
+  mergeClones(ContextNode *Node, DenseSet<const ContextNode *> &Visited,
+              DenseMap<uint32_t, ContextNode *> &ContextIdToAllocationNode);
+  /// Main worker for merging callee clones for a given node.
+  void mergeNodeCalleeClones(
+      ContextNode *Node, DenseSet<const ContextNode *> &Visited,
+      DenseMap<uint32_t, ContextNode *> &ContextIdToAllocationNode);
+
   /// Recursively perform cloning on the graph for the given Node and its
   /// callers, in order to uniquely identify the allocation behavior of an
   /// allocation given its context. The context ids of the allocation being
@@ -4016,6 +4040,306 @@ IndexCallsiteContextGraph::cloneFunctionForCallsite(
   return {Func.func(), CloneNo};
 }
 
+// We perform cloning for each allocation node separately. However, this
+// sometimes results in a situation where the same node calls multiple
+// clones of the same callee, created for different allocations. This
+// causes issues when assigning functions to these clones, as each node can
+// in reality only call a single callee clone.
+//
+// To address this, before assigning functions, merge callee clone nodes as
+// needed using a post order traversal from the allocations. We attempt to
+// use existing clones as the merge node when legal, and to share them
+// among callers with the same properties (callers calling the same set of
+// callee clone nodes for the same allocations).
+//
+// Without this fix, in some cases incorrect function assignment will lead
+// to calling the wrong allocation clone.
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::mergeClones() {
+  if (!MergeClones)
+    return;
+
+  // Generate a map from context id to the associated allocation node for use
+  // when merging clones.
+  DenseMap<uint32_t, ContextNode *> ContextIdToAllocationNode;
+  for (auto &Entry : AllocationCallToContextNodeMap) {
+    auto *Node = Entry.second;
+    for (auto Id : Node->getContextIds())
+      ContextIdToAllocationNode[Id] = Node->getOrigNode();
+    for (auto *Clone : Node->Clones) {
+      for (auto Id : Clone->getContextIds())
+        ContextIdToAllocationNode[Id] = Clone->getOrigNode();
+    }
+  }
+
+  // Post order traversal starting from allocations to ensure each callsite
+  // calls a single clone of its callee. Callee nodes that are clones of each
+  // other are merged (via new merge nodes if needed) to achieve this.
+  DenseSet<const ContextNode *> Visited;
+  for (auto &Entry : AllocationCallToContextNodeMap) {
+    auto *Node = Entry.second;
+
+    mergeClones(Node, Visited, ContextIdToAllocationNode);
+
+    // Make a copy so the recursive post order traversal that may create new
+    // clones doesn't mess up iteration.
+    auto Clones = Node->Clones;
+    for (auto *Clone : Clones)
+      mergeClones(Clone, Visited, ContextIdToAllocationNode);
+  }
+
+  if (DumpCCG) {
+    dbgs() << "CCG after merging:\n";
+    dbgs() << *this;
+  }
+  if (ExportToDot)
+    exportToDot("aftermerge");
+
+  if (VerifyCCG) {
+    check();
+  }
+}
+
+// Recursive helper for above mergeClones method.
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::mergeClones(
+    ContextNode *Node, DenseSet<const ContextNode *> &Visited,
+    DenseMap<uint32_t, ContextNode *> &ContextIdToAllocationNode) {
+  auto Inserted = Visited.insert(Node);
+  if (!Inserted.second)
+    return;
+
+  // Make a copy since the recursive call may move a caller edge to a new
+  // callee, messing up the iterator.
+  auto CallerEdges = Node->CallerEdges;
+  for (auto CallerEdge : CallerEdges) {
+    // Skip any caller edge moved onto a different callee during recursion.
+    if (CallerEdge->Callee != Node)
+      continue;
+    mergeClones(CallerEdge->Caller, Visited, ContextIdToAllocationNode);
+  }
+
+  // Merge for this node after we handle its callers.
+  mergeNodeCalleeClones(Node, Visited, ContextIdToAllocationNode);
+}
+
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::mergeNodeCalleeClones(
+    ContextNode *Node, DenseSet<const ContextNode *> &Visited,
+    DenseMap<uint32_t, ContextNode *> &ContextIdToAllocationNode) {
+  // Ignore Node if we moved all of its contexts to clones.
+  if (Node->emptyContextIds())
+    return;
+
+  // First identify groups of clones among Node's callee edges, by building
+  // a map from each callee base node to the associated callee edges from Node.
+  MapVector<ContextNode *, std::vector<std::shared_ptr<ContextEdge>>>
+      OrigNodeToCloneEdges;
+  for (const auto &E : Node->CalleeEdges) {
+    auto *Callee = E->Callee;
+    if (!Callee->CloneOf && Callee->Clones.empty())
+      continue;
+    ContextNode *Base = Callee->getOrigNode();
+    OrigNodeToCloneEdges[Base].push_back(E);
+  }
+
+  // Process each set of callee clones called by Node, performing the needed
+  // merging.
+  for (auto Entry : OrigNodeToCloneEdges) {
+    // CalleeEdges is the set of edges from Node reaching callees that are
+    // mutual clones of each other.
+    auto CalleeEdges = Entry.second;
+    auto NumCalleeClones = CalleeEdges.size();
+    // A single edge means there is no merging needed.
+    if (NumCalleeClones == 1)
+      continue;
+    // Sort the CalleeEdges calling this group of clones in ascending order of
+    // their caller edge counts, putting the original non-clone node first in
+    // cases of a tie. This simplifies finding an existing node to use as the
+    // merge node.
+    std::stable_sort(CalleeEdges.begin(), CalleeEdges.end(),
+                     [&](const std::shared_ptr<ContextEdge> &A,
+                         const std::shared_ptr<ContextEdge> &B) {
+                       if (A->Callee->CallerEdges.size() !=
+                           B->Callee->CallerEdges.size())
+                         return A->Callee->CallerEdges.size() <
+                                B->Callee->CallerEdges.size();
+                       if (A->Callee->CloneOf && !B->Callee->CloneOf)
+                         return true;
+                       else if (!A->Callee->CloneOf && B->Callee->CloneOf)
+                         return false;
+                       // Use the first context id for each edge as a
+                       // tie-breaker.
+                       return *A->ContextIds.begin() < *B->ContextIds.begin();
+                     });
+
+    // Next look for other nodes that have edges to the same set of callee
+    // clones as the current Node. Those can share the eventual merge node
+    // (reducing cloning and binary size overhead) iff:
+    // - they have edges to the same set of callee clones
+    // - each callee edge reaches a subset of the same allocations as Node's
+    //   corresponding edge to the same callee clone.
+    // The second requirement is to ensure that we don't undo any of the
+    // necessary cloning to distinguish contexts with different allocation
+    // behavior.
+    // FIXME: This is somewhat conservative, as we really just need to ensure
+    // that they don't reach the same allocations as contexts on edges from Node
+    // going to any of the *other* callee clones being merged. However, that
+    // requires more tracking and checking to get right.
+    //
+    // For each other caller node of these callee clones, this map counts how
+    // many edges it has to the same set of callee clones.
+    DenseMap<ContextNode *, unsigned> OtherCallersToSharedCalleeEdgeCount;
+    // Counts the number of other caller nodes that have edges to all callee
+    // clones that don't violate the allocation context checking.
+    unsigned PossibleOtherCallerNodes = 0;
+    // We only need to look at other Caller nodes if the first callee edge has
+    // multiple callers (recall they are sorted in ascending order above).
+    if (CalleeEdges[0]->Callee->CallerEdges.size() > 1) {
+      // For each callee edge:
+      // - Collect the count of other caller nodes calling the same callees.
+      // - Collect the alloc nodes reached by contexts on each callee edge.
+      DenseMap<ContextEdge *, DenseSet<ContextNode *>> CalleeEdgeToAllocNodes;
+      for (auto CalleeEdge : CalleeEdges) {
+        assert(CalleeEdge->Callee->CallerEdges.size() > 1);
+        // For each other caller of the same callee, increment the count of
+        // edges reaching the same callee clone.
+        for (auto CalleeCallerEdges : CalleeEdge->Callee->CallerEdges) {
+          if (CalleeCallerEdges->Caller == Node) {
+            assert(CalleeCallerEdges == CalleeEdge);
+            continue;
+          }
+          OtherCallersToSharedCalleeEdgeCount[CalleeCallerEdges->Caller]++;
+          // If this caller edge now reaches all of the same callee clones,
+          // increment the count of candidate other caller nodes.
+          if (OtherCallersToSharedCalleeEdgeCount[CalleeCallerEdges->Caller] ==
+              NumCalleeClones)
+            PossibleOtherCallerNodes++;
+        }
+        // Collect the alloc nodes reached by contexts on each callee edge, for
+        // later analysis.
+        for (auto Id : CalleeEdge->getContextIds()) {
+          auto *Alloc = ContextIdToAllocationNode.lookup(Id);
+          if (!Alloc) {
+            // FIXME: unclear why this happens occasionally; presumably due to
+            // imperfect graph updates, possibly involving recursion.
+            MissingAllocForContextId++;
+            continue;
+          }
+          CalleeEdgeToAllocNodes[CalleeEdge.get()].insert(Alloc);
+        }
+      }
+      // Now walk the callee edges again, and make sure that for each candidate
+      // caller node all of its edges to the callees reach the same allocs (or
+      // a subset) as those along the corresponding callee edge from Node.
+      for (auto CalleeEdge : CalleeEdges) {
+        assert(CalleeEdge->Callee->CallerEdges.size() > 1);
+        // Stop if we do not have any (more) candidate other caller nodes.
+        if (!PossibleOtherCallerNodes)
+          break;
+        auto &CurCalleeAllocNodes = CalleeEdgeToAllocNodes[CalleeEdge.get()];
+        // Check each other caller of this callee clone.
+        for (auto CalleeCallerE : CalleeEdge->Callee->CallerEdges) {
+          // Not interested in the callee edge from Node itself.
+          if (CalleeCallerE == CalleeEdge)
+            continue;
+          // Skip any callers that didn't have callee edges to all the same
+          // callee clones.
+          if (OtherCallersToSharedCalleeEdgeCount[CalleeCallerE->Caller] !=
+              NumCalleeClones)
+            continue;
+          // Make sure that each context along the edge from the candidate
+          // caller node reaches an allocation also reached by this callee
+          // edge from Node.
+          for (auto Id : CalleeCallerE->getContextIds()) {
+            auto *Alloc = ContextIdToAllocationNode.lookup(Id);
+            if (!Alloc)
+              continue;
+            // If not, simply reset the map entry to 0 so caller is ignored, and
+            // reduce the count of candidate other caller nodes.
+            if (!CurCalleeAllocNodes.contains(Alloc)) {
+              OtherCallersToSharedCalleeEdgeCount[CalleeCallerE->Caller] = 0;
+              PossibleOtherCallerNodes--;
+              break;
+            }
+          }
+        }
+      }
+    }
+
+    // Now do the actual merging. Identify an existing MergeNode, or create a
+    // new one, during the first iteration. Move each callee over, along with
+    // edges from other callers that we determined above can share the same
+    // merge node.
+    ContextNode *MergeNode = nullptr;
+    DenseMap<ContextNode *, unsigned> CallerToMoveCount;
+    for (auto CalleeEdge : CalleeEdges) {
+      auto *OrigCallee = CalleeEdge->Callee;
+      // If we don't have a MergeNode yet (only happens on the first iteration,
+      // as a new one will be created when we go to move the first callee edge
+      // over as needed), see if we can use this callee.
+      if (!MergeNode) {
+        // If there are no other callers, simply use this callee.
+        if (CalleeEdge->Callee->CallerEdges.size() == 1) {
+          MergeNode = OrigCallee;
+          NonNewMergedNodes++;
+          continue;
+        }
+        // Otherwise, if we have identified other caller nodes that can share
+        // the merge node with Node, see if all of OrigCallee's callers are
+        // going to share the same merge node. In that case we can use callee
+        // (since all of its callers would move to the new merge node).
+        if (PossibleOtherCallerNodes) {
+          bool MoveAllCallerEdges = true;
+          for (auto CalleeCallerE : OrigCallee->CallerEdges) {
+            if (CalleeCallerE == CalleeEdge)
+              continue;
+            if (OtherCallersToSharedCalleeEdgeCount[CalleeCallerE->Caller] !=
+                NumCalleeClones) {
+              MoveAllCallerEdges = false;
+              break;
+            }
+          }
+          // If we are going to move all callers over, we can use this callee as
+          // the MergeNode.
+          if (MoveAllCallerEdges) {
+            MergeNode = OrigCallee;
+            NonNewMergedNodes++;
+            continue;
+          }
+        }
+      }
+      // Move this callee edge, creating a new merge node if necessary.
+      if (MergeNode) {
+        assert(MergeNode != OrigCallee);
+        moveEdgeToExistingCalleeClone(CalleeEdge, MergeNode,
+                                      /*NewClone*/ false);
+      } else {
+        MergeNode = moveEdgeToNewCalleeClone(CalleeEdge);
+        NewMergedNodes++;
+      }
+      // Now move all identified edges from other callers over to the merge node
+      // as well.
+      if (PossibleOtherCallerNodes) {
+        // Make and iterate over a copy of OrigCallee's caller edges because
+        // some of these will be moved off of the OrigCallee and that would mess
+        // up the iteration from OrigCallee.
+        auto OrigCalleeCallerEdges = OrigCallee->CallerEdges;
+        for (auto CalleeCallerE : OrigCalleeCallerEdges) {
+          if (CalleeCallerE == CalleeEdge)
+            continue;
+          if (OtherCallersToSharedCalleeEdgeCount[CalleeCallerE->Caller] !=
+              NumCalleeClones)
+            continue;
+          CallerToMoveCount[CalleeCallerE->Caller]++;
+          moveEdgeToExistingCalleeClone(CalleeCallerE, MergeNode,
+                                        /*NewClone*/ false);
+        }
+      }
+      removeNoneTypeCalleeEdges(OrigCallee);
+      removeNoneTypeCalleeEdges(MergeNode);
+    }
+  }
+}
+
 // This method assigns cloned callsites to functions, cloning the functions as
 // needed. The assignment is greedy and proceeds roughly as follows:
 //
@@ -4051,6 +4375,8 @@ template <typename DerivedCCG, typename FuncTy, typename CallTy>
 bool CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::assignFunctions() {
   bool Changed = false;
 
+  mergeClones();
+
   // Keep track of the assignment of nodes (callsites) to function clones they
   // call.
   DenseMap<ContextNode *, FuncInfo> CallsiteToCalleeFuncCloneMap;
diff --git a/llvm/test/Transforms/MemProfContextDisambiguation/mergenodes.ll b/llvm/test/Transforms/MemProfContextDisambiguation/mergenodes.ll
new file mode 100644
index 0000000000000..990a4a4e4d064
--- /dev/null
+++ b/llvm/test/Transforms/MemProfContextDisambiguation/mergenodes.ll
@@ -0,0 +1,404 @@
+;; Test that correct clones are generated and reached when we need to
+;; re-merge clone nodes before function assignment.
+;;
+;; The code is similar to that of basic.ll, but with a second allocation.
+
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new \
; RUN:  -memprof-verify-ccg -memprof-dump-ccg %s -S 2>&1 | FileCheck %s \
+; RUN:  --check-prefix=IR --check-prefix=DUMP
+
+;; Make sure the option to disable merging causes the expected regression.
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new \
+; RUN:  -memprof-merge-clones=false %s -S 2>&1 | FileCheck %s --check-prefix=NOMERGE
+;; main should incorrectly call the same clone of foo.
+; NOMERGE: define {{.*}} @main
+; NOMERGE-NEXT: entry:
+; NOMERGE-NEXT: call {{.*}} @_Z3foov.memprof.1()
+; NOMERGE-NEXT: call {{.*}} @_Z3foov.memprof.1()
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @main() {
+entry:
+  ;; Ultimately calls bar and allocates notcold memory from first call to new
+  ;; and cold memory from second call to new.
+  %call = call noundef ptr @_Z3foov(), !callsite !0
+  ;; Ultimately calls bar and allocates cold memory from first call to new
+  ;; and notcold memory from second call to new.
+  %call1 = call noundef ptr @_Z3foov(), !callsite !1
+  ret i32 0
+}
+
+define internal ptr @_Z3barv() {
+entry:
+  ;; notcold when called from first call to foo from main, cold when called from second.
+  %call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #0, !memprof !2, !callsite !7
+  ;; cold when called from first call to foo from main, notcold when called from second.
+  %call2 = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #0, !memprof !13, !callsite !18
+  ret ptr null
+}
+
+declare ptr @_Znam(i64)
+
+define internal ptr @_Z3bazv() {
+entry:
+  %call = call noundef ptr @_Z3barv(), !callsite !8
+  ret ptr null
+}
+
+; Function Attrs: noinline
+define internal ptr @_Z3foov() {
+entry:
+  %call = call noundef ptr @_Z3bazv(), !callsite !9
+  ret ptr null
+}
+
+attributes #0 = { builtin }
+
+!0 = !{i64 8632435727821051414}
+!1 = !{i64 -3421689549917153178}
+!2 = !{!3, !5}
+!3 = !{!4, !"notcold"}
+!4 = !{i64 9086428284934609951, i64 -5964873800580613432, i64 2732490490862098848, i64 8632435727821051414}
+!5 = !{!6, !"cold...
[truncated]

``````````
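
For readers skimming the truncated diff, the following is a drastically
simplified standalone model of the two routines at the heart of the patch.
The toy `Node` type is an assumption for illustration only; the real pass
walks shared `ContextEdge`s carrying context ids, runs the allocation-context
analysis described above before letting other callers share a merge node, and
creates new clone nodes via `moveEdgeToNewCalleeClone` when no existing clone
can be reused.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

struct Node {
  Node *CloneOf = nullptr;    // non-null if this node is a clone
  std::vector<Node *> Clones; // clones of this original node
  std::vector<Node *> Callees;
  std::vector<Node *> Callers;
  Node *getOrigNode() { return CloneOf ? CloneOf : this; }
};

// If N has edges to more than one clone of the same callee, collapse those
// edges onto a single merge node. The real pass prefers an existing clone
// whose callers can all share the merge node and otherwise creates a new
// clone; both the sharing analysis and context-id bookkeeping are elided.
void mergeNodeCalleeClones(Node *N) {
  // Group N's callee edges by the callee's original (base) node.
  std::map<Node *, std::vector<Node *>> OrigToClones;
  for (Node *Callee : N->Callees)
    if (Callee->CloneOf || !Callee->Clones.empty())
      OrigToClones[Callee->getOrigNode()].push_back(Callee);
  for (auto &Entry : OrigToClones) {
    std::vector<Node *> &Group = Entry.second;
    if (Group.size() < 2)
      continue; // a single edge needs no merging
    Node *Merge = Group.front();
    for (Node *C : Group) {
      if (C == Merge)
        continue;
      // Fold C's callee edges into the merge node and retarget N's edge.
      Merge->Callees.insert(Merge->Callees.end(), C->Callees.begin(),
                            C->Callees.end());
      C->Callees.clear();
      std::replace(N->Callees.begin(), N->Callees.end(), C, Merge);
    }
    // Deduplicate retargeted edges; real edges are merged with their
    // context id sets unioned.
    std::sort(N->Callees.begin(), N->Callees.end());
    N->Callees.erase(std::unique(N->Callees.begin(), N->Callees.end()),
                     N->Callees.end());
  }
}

// Post-order traversal from an allocation node: finish all callers first,
// then ensure this node calls at most one clone of each callee.
void mergeClones(Node *N, std::set<Node *> &Visited) {
  if (!Visited.insert(N).second)
    return;
  auto Callers = N->Callers; // copy: recursion may reshape edge lists
  for (Node *Caller : Callers)
    mergeClones(Caller, Visited);
  mergeNodeCalleeClones(N);
}
```

The post order from the allocations matters here: a node's callers are fixed
up before the node itself, so by the time `mergeNodeCalleeClones` runs on a
node, any edges moved during the recursion have already settled.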

</details>


https://github.com/llvm/llvm-project/pull/135702

