[llvm] [MemProf] Fix clone edge comparison (PR #113753)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 25 21:53:38 PDT 2024


https://github.com/teresajohnson created https://github.com/llvm/llvm-project/pull/113753

The issue fixed in PR113337 exposed a bug in the comparisons done in
allocTypesMatch, which compares a vector of alloc types to those in the
given vector of Edges. The form of std::equal used, which didn't provide
the end iterator for the Edges vector, will iterate through as many
entries in the Edges vector as in the InAllocTypes vector, which can
fail if there are fewer entries in the Edges vector, because we may
dereference a bogus Edge pointer. This function is called twice, once
for the Node, with its callee edges, in which case the number of edges
should always match the number of entries in allocTypesMatch, which is
computed from the Node's callee edges. It was also called for Node's
clones, and it turns out that after cloning and edge modifications done
for other allocations, the number of callee edges in Node and its clones
may no longer match. In some cases, more common with memprof ICP before
the PR113337, the number of clone edges can be smaller leading to a bad
dereference. I found for a large application even before adding memprof
ICP support we sometimes call this with fewer entries in the clone's
callee edges, but were getting lucky as they had allocation type None,
and we didn't end up attempting to dereference the bad edge pointer.

Fix this by passing Edges.end() to std::equal, which means std::equal
will fail if the number of entries in the 2 vectors are not equal.
However, this is too conservative, as clone edges may have been added or
removed since it was initially cloned, and in fact can be wrong as we
may not be comparing allocation types corresponding to the same callee.

Therefore, a couple of enhancements are made to avoid regressing and
improve the checking and cloning:
- Don't bother calling the alloc type comparison when the clone and the
  Node's alloc type for the current allocation are precise (have a
  single allocation type) and are the same (which is guaranteed by an
  earlier check, and an assert is added to confirm that). In that case
  we can trivially determine that the clone can be used.
- Split the alloc type matching handling into a separate function for
  the clone case. In that case, for each of the InAllocType entries,
  attempt to find and compare to the clone callee edge with the same
  callee as the corresponding original node callee.

To create a test case I needed to take a spec application (xalancbmk),
and repeatedly apply random hot/cold-ness to the memprof contexts
when building, until I hit the problematic case. I then reduced that
full LTO IR using llvm-reduce and then manually.


>From 373a29043be36d4249b9d95a3ae816b9477b58f2 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Fri, 25 Oct 2024 10:30:57 -0700
Subject: [PATCH] [MemProf] Fix clone edge comparison

The issue fixed in PR113337 exposed a bug in the comparisons done in
allocTypesMatch, which compares a vector of alloc types to those in the
given vector of Edges. The form of std::equal used, which didn't provide
the end iterator for the Edges vector, will iterate through as many
entries in the Edges vector as in the InAllocTypes vector, which can
fail if there are fewer entries in the Edges vector, because we may
dereference a bogus Edge pointer. This function is called twice, once
for the Node, with its callee edges, in which case the number of edges
should always match the number of entries in allocTypesMatch, which is
computed from the Node's callee edges. It was also called for Node's
clones, and it turns out that after cloning and edge modifications done
for other allocations, the number of callee edges in Node and its clones
may no longer match. In some cases, more common with memprof ICP before
the PR113337, the number of clone edges can be smaller leading to a bad
dereference. I found for a large application even before adding memprof
ICP support we sometimes call this with fewer entries in the clone's
callee edges, but were getting lucky as they had allocation type None,
and we didn't end up attempting to dereference the bad edge pointer.

Fix this by passing Edges.end() to std::equal, which means std::equal
will fail if the number of entries in the 2 vectors are not equal.
However, this is too conservative, as clone edges may have been added or
removed since it was initially cloned, and in fact can be wrong as we
may not be comparing allocation types corresponding to the same callee.

Therefore, a couple of enhancements are made to avoid regressing and
improve the checking and cloning:
- Don't bother calling the alloc type comparison when the clone and the
  Node's alloc type for the current allocation are precise (have a
  single allocation type) and are the same (which is guaranteed by an
  earlier check, and an assert is added to confirm that). In that case
  we can trivially determine that the clone can be used.
- Split the alloc type matching handling into a separate function for
  the clone case. In that case, for each of the InAllocType entries,
  attempt to find and compare to the clone callee edge with the same
  callee as the corresponding original node callee.

To create a test case I needed to take a spec application (xalancbmk),
and repeatedly apply random hot/cold-ness to the memprof contexts
when building, until I hit the problematic case. I then reduced that
full LTO IR using llvm-reduce and then manually.
---
 .../IPO/MemProfContextDisambiguation.cpp      | 66 +++++++++++--
 .../fix_clone_checking.ll                     | 99 +++++++++++++++++++
 2 files changed, 159 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/Transforms/MemProfContextDisambiguation/fix_clone_checking.ll

diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 905186edcbecc4..cb8fb5eac60861 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -928,8 +928,11 @@ bool allocTypesMatch(
     const std::vector<uint8_t> &InAllocTypes,
     const std::vector<std::shared_ptr<ContextEdge<DerivedCCG, FuncTy, CallTy>>>
         &Edges) {
+  // This should be called only when the InAllocTypes vector was computed for
+  // this set of Edges. Make sure the sizes are the same.
+  assert(InAllocTypes.size() == Edges.size());
   return std::equal(
-      InAllocTypes.begin(), InAllocTypes.end(), Edges.begin(),
+      InAllocTypes.begin(), InAllocTypes.end(), Edges.begin(), Edges.end(),
       [](const uint8_t &l,
          const std::shared_ptr<ContextEdge<DerivedCCG, FuncTy, CallTy>> &r) {
         // Can share if one of the edges is None type - don't
@@ -942,6 +945,46 @@ bool allocTypesMatch(
       });
 }
 
+// Helper to check if the alloc types for all edges recorded in the
+// InAllocTypes vector match the alloc types for callee edges in the given
+// Clone. Because the InAllocTypes were computed from the original node's callee
+// edges, and other cloning could have happened after this clone was created, we
+// need to find the matching clone callee edge, which may or may not exist.
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+bool allocTypesMatchClone(
+    const std::vector<uint8_t> &InAllocTypes,
+    const ContextNode<DerivedCCG, FuncTy, CallTy> *Clone) {
+  const ContextNode<DerivedCCG, FuncTy, CallTy> *Node = Clone->CloneOf;
+  assert(Node);
+  // InAllocTypes should have been computed for the original node's callee
+  // edges.
+  assert(InAllocTypes.size() == Node->CalleeEdges.size());
+  // First create a map of the clone callee edge callees to the edge alloc type.
+  DenseMap<const ContextNode<DerivedCCG, FuncTy, CallTy> *, uint8_t>
+      EdgeCalleeMap;
+  for (const auto &E : Clone->CalleeEdges) {
+    assert(!EdgeCalleeMap.contains(E->Callee));
+    EdgeCalleeMap[E->Callee] = E->AllocTypes;
+  }
+  // Next, walk the original node's callees, and look for the corresponding
+  // clone edge to that callee.
+  for (unsigned I = 0; I < Node->CalleeEdges.size(); I++) {
+    auto Iter = EdgeCalleeMap.find(Node->CalleeEdges[I]->Callee);
+    // Not found is ok, we will simply add an edge if we use this clone.
+    if (Iter == EdgeCalleeMap.end())
+      continue;
+    // Can share if one of the edges is None type - don't
+    // care about the type along that edge as it doesn't
+    // exist for those context ids.
+    if (InAllocTypes[I] == (uint8_t)AllocationType::None ||
+        Iter->second == (uint8_t)AllocationType::None)
+      continue;
+    if (allocTypeToUse(Iter->second) != allocTypeToUse(InAllocTypes[I]))
+      return false;
+  }
+  return true;
+}
+
 } // end anonymous namespace
 
 template <typename DerivedCCG, typename FuncTy, typename CallTy>
@@ -3364,11 +3407,22 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::identifyClones(
           allocTypeToUse(CallerAllocTypeForAlloc))
         continue;
 
-      if (!allocTypesMatch<DerivedCCG, FuncTy, CallTy>(
-              CalleeEdgeAllocTypesForCallerEdge, CurClone->CalleeEdges))
-        continue;
-      Clone = CurClone;
-      break;
+      bool BothSingleAlloc = hasSingleAllocType(CurClone->AllocTypes) &&
+                             hasSingleAllocType(CallerAllocTypeForAlloc);
+      // The above check should mean that if both have single alloc types that
+      // they should be equal.
+      assert(!BothSingleAlloc ||
+             CurClone->AllocTypes == CallerAllocTypeForAlloc);
+
+      // If either both have a single alloc type (which are the same), or if the
+      // clone's callee edges have the same alloc types as those for the current
+      // allocation on Node's callee edges (CalleeEdgeAllocTypesForCallerEdge),
+      // then we can reuse this clone.
+      if (BothSingleAlloc || allocTypesMatchClone<DerivedCCG, FuncTy, CallTy>(
+                                 CalleeEdgeAllocTypesForCallerEdge, CurClone)) {
+        Clone = CurClone;
+        break;
+      }
     }
 
     // The edge iterator is adjusted when we move the CallerEdge to the clone.
diff --git a/llvm/test/Transforms/MemProfContextDisambiguation/fix_clone_checking.ll b/llvm/test/Transforms/MemProfContextDisambiguation/fix_clone_checking.ll
new file mode 100644
index 00000000000000..a1cb68138537b1
--- /dev/null
+++ b/llvm/test/Transforms/MemProfContextDisambiguation/fix_clone_checking.ll
@@ -0,0 +1,99 @@
+;; Test to make sure we don't fail when cloning in a case where we end up with
+;; a clone that has fewer edges than the node it was initially cloned from.
+;; This test was reduced and simplified from xalancbmk with some random hotness
+;; applied to the profile that reproduced the issue.
+
+; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new \
+; RUNL		-memprof-verify-ccg -memprof-verify-nodes \
+; RUN: 		-pass-remarks=memprof-context-disambiguation %s -S | FileCheck %s
+
+;; Make sure we created some clones
+; CHECK: created clone A.memprof.1
+; CHECK: created clone C.memprof.1
+; CHECK: created clone D.memprof.1
+; CHECK: created clone E.memprof.1
+; CHECK: created clone B.memprof.1
+; CHECK: created clone F.memprof.1
+; CHECK: created clone G.memprof.1
+
+; ModuleID = '<stdin>'
+source_filename = "reduced.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-grtev4-linux-gnu"
+
+define void @A() {
+  call void @B(), !callsite !0
+  ret void
+}
+
+define void @C() {
+  call void @D(), !callsite !1
+  ret void
+}
+
+define void @D() {
+  call void @A(), !callsite !2
+  ret void
+}
+
+define void @E() {
+  %1 = call ptr @_Znwm(i64 0), !memprof !3, !callsite !20
+  ret void
+}
+
+define void @B() {
+  call void @F(), !callsite !21
+  ret void
+}
+
+define void @F() {
+  call void @E(), !callsite !22
+  call void @G(), !callsite !23
+  ret void
+}
+
+define void @G() {
+  %1 = call ptr @_Znwm(i64 0), !memprof !24, !callsite !37
+  ret void
+}
+
+declare ptr @_Znwm(i64)
+
+!0 = !{i64 1995602625719775354}
+!1 = !{i64 4312698517630782220}
+!2 = !{i64 5516454029445989383}
+!3 = !{!4, !6, !8, !10, !12, !14, !16, !18}
+!4 = !{!5, !"notcold"}
+!5 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 1995602625719775354}
+!6 = !{!7, !"cold"}
+!7 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 5379466077518675850}
+!8 = !{!9, !"cold"}
+!9 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 -7632894069000375689}
+!10 = !{!11, !"cold"}
+!11 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 2939944783060497247}
+!12 = !{!13, !"notcold"}
+!13 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 5642549674080861567, i64 5516454029445989383, i64 4312698517630782220, i64 -7632894069000375689}
+!14 = !{!15, !"cold"}
+!15 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 5642549674080861567, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
+!16 = !{!17, !"notcold"}
+!17 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
+!18 = !{!19, !"notcold"}
+!19 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862, i64 7147584705143805656, i64 -6456074186195384663, i64 -4637272929643682959}
+!20 = !{i64 -2282665745786859978, i64 -6758300505622211768, i64 2938340307832638862}
+!21 = !{i64 -6456074186195384663}
+!22 = !{i64 7147584705143805656}
+!23 = !{i64 3938822378769440754}
+!24 = !{!25, !27, !29, !31, !33, !35}
+!25 = !{!26, !"cold"}
+!26 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 1995602625719775354, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
+!27 = !{!28, !"notcold"}
+!28 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 2077908580042347045, i64 4312698517630782220, i64 -7632894069000375689}
+!29 = !{!30, !"cold"}
+!30 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -7632894069000375689}
+!31 = !{!32, !"notcold"}
+!32 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4746997736434041076, i64 5516454029445989383, i64 4312698517630782220, i64 -1805555115991223293}
+!33 = !{!34, !"cold"}
+!34 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4637272929643682959}
+!35 = !{!36, !"notcold"}
+!36 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196, i64 3938822378769440754, i64 -6456074186195384663, i64 -4409412896859835674}
+!37 = !{i64 -2282665745786859978, i64 -3548918226713766361, i64 4077289288013931196}



More information about the llvm-commits mailing list