[llvm] [MemProf] Prune unneeded non-cold contexts (PR #124823)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 29 07:25:05 PST 2025


https://github.com/teresajohnson updated https://github.com/llvm/llvm-project/pull/124823

>From 7e566125b21d9a8c290ed60b4cfac32afb7ebbb4 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Mon, 27 Jan 2025 18:25:47 -0800
Subject: [PATCH 1/4] [MemProf] Prune unneeded non-cold contexts

We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since non-cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unneeded
non-cold contexts when building metadata from the trie.

Specifically, we only need to keep the non-cold contexts that overlap
the longest with cold contexts, so that we know how deeply the cold
contexts must be cloned to expose the cold allocation behavior.

For a large target, this reduced ThinLTO bitcode object sizes by about
35%, the ThinLTO indexing time by about half, and the peak ThinLTO
indexing memory by about 20%.
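
As an illustration, here is a minimal, self-contained C++ sketch of the
pruning rule (not the LLVM implementation; the names Node, addCallStack
and build are made up, and only the Cold/NotCold types are modeled). For
the example contexts used in the new CallStackTrie header comment below,
it keeps the cold context 1,2,4 plus the single overlapping notcold
context 1,2,5, and prunes 1,2,6 and 1,3:

#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <vector>

enum class AllocType : uint8_t { NotCold = 1, Cold = 2 };

struct Node {
  uint8_t Types = 0;            // bitwise OR of AllocType values seen here
  bool DeepestAmbiguous = true; // deepest point with both Cold and NotCold?
  std::map<uint64_t, std::unique_ptr<Node>> Callers;
};

static bool hasSingleType(uint8_t T) { return T == 1 || T == 2; }

// Add one profiled context (stack ids listed from the allocation outwards).
static void addCallStack(Node &Alloc, AllocType T,
                         const std::vector<uint64_t> &Stack) {
  Node *Curr = &Alloc;
  Curr->Types |= static_cast<uint8_t>(T);
  for (uint64_t Id : Stack) {
    Node *Prev = Curr;
    std::unique_ptr<Node> &Next = Curr->Callers[Id];
    if (!Next)
      Next = std::make_unique<Node>();
    Curr = Next.get();
    Curr->Types |= static_cast<uint8_t>(T);
    // Once a caller becomes ambiguous, its callee is no longer the deepest
    // ambiguous point.
    if (!hasSingleType(Curr->Types))
      Prev->DeepestAmbiguous = false;
  }
}

// Print the contexts that would get MIB metadata.
static void build(Node *N, std::vector<uint64_t> &Stack,
                  bool &CalleeDeepestAmbiguous) {
  if (hasSingleType(N->Types)) {
    bool IsCold = N->Types == static_cast<uint8_t>(AllocType::Cold);
    // Keep all cold contexts, but only the first notcold context reaching
    // the deepest ambiguous point; prune every other notcold context.
    if (IsCold || CalleeDeepestAmbiguous) {
      std::cout << (IsCold ? "cold   " : "notcold");
      for (uint64_t Id : Stack)
        std::cout << ' ' << Id;
      std::cout << '\n';
      if (!IsCold)
        CalleeDeepestAmbiguous = false;
    }
    return;
  }
  for (auto &[Id, Caller] : N->Callers) {
    Stack.push_back(Id);
    build(Caller.get(), Stack, N->DeepestAmbiguous);
    Stack.pop_back();
  }
}

int main() {
  Node Alloc;
  addCallStack(Alloc, AllocType::NotCold, {1, 3});
  addCallStack(Alloc, AllocType::Cold, {1, 2, 4});
  addCallStack(Alloc, AllocType::NotCold, {1, 2, 5});
  addCallStack(Alloc, AllocType::NotCold, {1, 2, 6});
  std::vector<uint64_t> Stack;
  bool Root = true;
  build(&Alloc, Stack, Root); // prints: cold 1 2 4, notcold 1 2 5
}

The real code also handles Hot contexts (converting them to NotCold) and
hinted-size reporting, which this sketch omits.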
---
 .../include/llvm/Analysis/MemoryProfileInfo.h |  34 ++++-
 llvm/lib/Analysis/MemoryProfileInfo.cpp       |  58 ++++++--
 llvm/test/Transforms/PGOProfile/memprof.ll    |  19 +--
 .../memprof_match_hot_cold_new_calls.ll       |   4 +-
 .../Analysis/MemoryProfileInfoTest.cpp        | 127 +++++++++++++++---
 5 files changed, 196 insertions(+), 46 deletions(-)

diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index deb7ab134c1617..f75783a4fef50e 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -56,6 +56,37 @@ class CallStackTrie {
     // Allocation types for call context sharing the context prefix at this
     // node.
     uint8_t AllocTypes;
+    // Updated as we add allocations to note if this is the deepest point in the
+    // trie that has an ambiguous allocation type (both Cold and NotCold). It is
+    // used to prune unneeded NotCold contexts, taking advantage of the fact
+    // that we later will only clone Cold contexts, as NotCold is the allocation
+    // default. We only need to keep as metadata the NotCold contexts that
+    // overlap the longest with Cold allocations, so that we know how deeply we
+    // need to clone. For example, assume we add the following contexts to the
+    // trie:
+    //    1 3 (notcold)
+    //    1 2 4 (cold)
+    //    1 2 5 (notcold)
+    //    1 2 6 (notcold)
+    // the trie looks like:
+    //         1
+    //        / \
+    //       2   3
+    //      /|\
+    //     4 5 6
+    //
+    // It is sufficient to prune all but one of the not cold contexts (either
+    // 1,2,5 or 1,2,6; we arbitrarily keep the first one we encounter, which is
+    // 1,2,5). We'll initially have DeepestAmbiguousAllocType set false for trie
+    // node 1 after the trie is built, and true for node 2. This indicates that
+    // the not cold context ending in 3 is not needed (its immediate callee has
+    // this value set false). The first notcold context we encounter when
+    // iterating the callers of node 2 will be the context ending in 5 (since
+    // std::map iteration is in sorted order of key), which will see that this
+    // field is true for its callee, so we will keep it. But at that point we
+    // set the callee's flag to false which prevents adding the not cold context
+    // ending in 6 unnecessarily.
+    bool DeepestAmbiguousAllocType = true;
     // If the user has requested reporting of hinted sizes, keep track of the
     // associated full stack id and profiled sizes. Can have more than one
     // after trimming (e.g. when building from metadata). This is only placed on
@@ -103,7 +134,8 @@ class CallStackTrie {
   bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                      std::vector<uint64_t> &MIBCallStack,
                      std::vector<Metadata *> &MIBNodes,
-                     bool CalleeHasAmbiguousCallerContext);
+                     bool CalleeHasAmbiguousCallerContext,
+                     bool &CalleeDeepestAmbiguousAllocType);
 
 public:
   CallStackTrie() = default;
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 5553a2e2dd24ba..409748ec80fe79 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -51,6 +51,13 @@ cl::opt<bool> MemProfReportHintedSizes(
     "memprof-report-hinted-sizes", cl::init(false), cl::Hidden,
     cl::desc("Report total allocation sizes of hinted allocations"));
 
+// This is useful if we have enabled reporting of hinted sizes, and want to get
+// information from the indexing step for all contexts (especially for testing),
+// or have specified a value less than 100% for -memprof-cloning-cold-threshold.
+cl::opt<bool> MemProfKeepAllNotColdContexts(
+    "memprof-keep-all-not-cold-contexts", cl::init(false), cl::Hidden,
+    cl::desc("Disable pruning of non-cold contexts unneeded for cold cloning"));
+
 AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
                                            uint64_t AllocCount,
                                            uint64_t TotalLifetime) {
@@ -156,10 +163,16 @@ void CallStackTrie::addCallStack(
       continue;
     }
     // Update existing caller node if it exists.
+    CallStackTrieNode *Prev = nullptr;
     auto Next = Curr->Callers.find(StackId);
     if (Next != Curr->Callers.end()) {
+      Prev = Curr;
       Curr = Next->second;
       Curr->addAllocType(AllocType);
+      // If this node has an ambiguous alloc type, its callee is not the deepest
+      // point where we have an ambiguous allocation type.
+      if (!hasSingleAllocType(Curr->AllocTypes))
+        Prev->DeepestAmbiguousAllocType = false;
       continue;
     }
     // Otherwise add a new caller node.
@@ -243,14 +256,35 @@ void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
 bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                                   std::vector<uint64_t> &MIBCallStack,
                                   std::vector<Metadata *> &MIBNodes,
-                                  bool CalleeHasAmbiguousCallerContext) {
+                                  bool CalleeHasAmbiguousCallerContext,
+                                  bool &CalleeDeepestAmbiguousAllocType) {
   // Trim context below the first node in a prefix with a single alloc type.
   // Add an MIB record for the current call stack prefix.
   if (hasSingleAllocType(Node->AllocTypes)) {
-    std::vector<ContextTotalSize> ContextSizeInfo;
-    collectContextSizeInfo(Node, ContextSizeInfo);
-    MIBNodes.push_back(createMIBNode(
-        Ctx, MIBCallStack, (AllocationType)Node->AllocTypes, ContextSizeInfo));
+    // Because we only clone cold contexts (we don't clone for exposing NotCold
+    // contexts as that is the default allocation behavior), we create MIB
+    // metadata for this context iff:
+    // 1) It is cold.
+    // 2) The immediate callee is the deepest point where we have an ambiguous
+    //    allocation type (i.e. the other callers that are cold need to know
+    //    that we have a not cold context overlapping to this point so that we
+    //    know how deep to clone).
+    // 3) MemProfKeepAllNotColdContexts is enabled, which is useful if we are
+    //    reporting hinted sizes, and want to get information from the indexing
+    //    step for all contexts, or have specified a value less than 100% for
+    //    -memprof-cloning-cold-threshold.
+    if ((AllocationType)Node->AllocTypes == AllocationType::Cold ||
+        CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {
+      std::vector<ContextTotalSize> ContextSizeInfo;
+      collectContextSizeInfo(Node, ContextSizeInfo);
+      MIBNodes.push_back(createMIBNode(Ctx, MIBCallStack,
+                                       (AllocationType)Node->AllocTypes,
+                                       ContextSizeInfo));
+      // If we just emitted an MIB for a not cold caller, don't need to emit
+      // another one for the callee to correctly disambiguate its cold callers.
+      if ((AllocationType)Node->AllocTypes != AllocationType::Cold)
+        CalleeDeepestAmbiguousAllocType = false;
+    }
     return true;
   }
 
@@ -261,9 +295,9 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
     bool AddedMIBNodesForAllCallerContexts = true;
     for (auto &Caller : Node->Callers) {
       MIBCallStack.push_back(Caller.first);
-      AddedMIBNodesForAllCallerContexts &=
-          buildMIBNodes(Caller.second, Ctx, MIBCallStack, MIBNodes,
-                        NodeHasAmbiguousCallerContext);
+      AddedMIBNodesForAllCallerContexts &= buildMIBNodes(
+          Caller.second, Ctx, MIBCallStack, MIBNodes,
+          NodeHasAmbiguousCallerContext, Node->DeepestAmbiguousAllocType);
       // Remove Caller.
       MIBCallStack.pop_back();
     }
@@ -338,9 +372,11 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
   std::vector<Metadata *> MIBNodes;
   assert(!Alloc->Callers.empty() && "addCallStack has not been called yet");
   // The last parameter is meant to say whether the callee of the given node
-  // has more than one caller. Here the node being passed in is the alloc
-  // and it has no callees. So it's false.
-  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false)) {
+  // has is the deepest point where we have ambiguous alloc types. Here the
+  // node being passed in is the alloc and it has no callees. So it's false.
+  bool DeepestAmbiguousAllocType = true;
+  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false,
+                    DeepestAmbiguousAllocType)) {
     assert(MIBCallStack.size() == 1 &&
            "Should only be left with Alloc's location in stack");
     CI->setMetadata(LLVMContext::MD_memprof, MDNode::get(Ctx, MIBNodes));
diff --git a/llvm/test/Transforms/PGOProfile/memprof.ll b/llvm/test/Transforms/PGOProfile/memprof.ll
index acf70880becd19..f0421ba60cffca 100644
--- a/llvm/test/Transforms/PGOProfile/memprof.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof.ll
@@ -64,14 +64,14 @@
 ; RUN: opt < %s -passes='pgo-instr-use,memprof-use<profile-filename=%t.pgomemprofdata>' -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,PGO
 
 ;; Check that the total sizes are reported if requested.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES
 
 ;; Check that we hint additional allocations with a threshold < 100%
 ; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -memprof-matching-cold-threshold=60 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZESTHRESH60
 
 ;; Make sure that the -memprof-cloning-cold-threshold flag is enough to cause
 ;; the size metadata to be generated for the LTO link.
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-cloning-cold-threshold=80 -memprof-keep-all-not-cold-contexts 2>&1 | FileCheck %s --check-prefixes=TOTALSIZES
 
 ;; Make sure we emit a random hotness seed if requested.
 ; RUN: llvm-profdata merge -memprof-random-hotness %S/Inputs/memprof.memprofraw --profiled-binary %S/Inputs/memprof.exe -o %t.memprofdatarand 2>&1 | FileCheck %s --check-prefix=RAND
@@ -339,7 +339,7 @@ for.end:                                          ; preds = %for.cond
 
 ; MEMPROF: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
 ; MEMPROF: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
-; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
 ; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
@@ -348,8 +348,6 @@ for.end:                                          ; preds = %for.cond
 ; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
 ; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
-; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
-; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
 ; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
 ; MEMPROF: ![[C2]] = !{i64 8467819354083268568}
 ; MEMPROF: ![[C3]] = !{i64 9086428284934609951}
@@ -390,17 +388,15 @@ for.end:                                          ; preds = %for.cond
 
 ; MEMPROFNOCOLINFO: #[[A1]] = { builtin allocsize(0) "memprof"="notcold" }
 ; MEMPROFNOCOLINFO: #[[A2]] = { builtin allocsize(0) "memprof"="cold" }
-; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROFNOCOLINFO: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROFNOCOLINFO: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROFNOCOLINFO: ![[STACK1]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 3577763375057267810}
 ; MEMPROFNOCOLINFO: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"notcold"}
 ; MEMPROFNOCOLINFO: ![[STACK2]] = !{i64 5281664982037379640, i64 6362220161075421157, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790, i64 -5772587307814069790}
-; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"notcold"}
-; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6896091699916449732}
+; MEMPROFNOCOLINFO: ![[MIB3]] = !{![[STACK3:[0-9]+]], !"cold"}
+; MEMPROFNOCOLINFO: ![[STACK3]] = !{i64 5281664982037379640, i64 -6871734214936418908}
 ; MEMPROFNOCOLINFO: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
-; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6871734214936418908}
-; MEMPROFNOCOLINFO: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"cold"}
-; MEMPROFNOCOLINFO: ![[STACK5]] = !{i64 5281664982037379640, i64 -6201180255894224618}
+; MEMPROFNOCOLINFO: ![[STACK4]] = !{i64 5281664982037379640, i64 -6201180255894224618}
 ; MEMPROFNOCOLINFO: ![[C1]] = !{i64 5281664982037379640}
 ; MEMPROFNOCOLINFO: ![[C2]] = !{i64 -6871734214936418908}
 ; MEMPROFNOCOLINFO: ![[C3]] = !{i64 -5588766871448036195}
@@ -420,7 +416,6 @@ for.end:                                          ; preds = %for.cond
 ; MEMPROFRAND2: !"cold"
 ; MEMPROFRAND2: !"cold"
 ; MEMPROFRAND2: !"notcold"
-; MEMPROFRAND2: !"notcold"
 
 ; MEMPROFSTATS:  8 memprof - Number of alloc contexts in memory profile.
 ; MEMPROFSTATS: 10 memprof - Number of callsites in memory profile.
diff --git a/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll b/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
index 83753ed6376d3f..4aa05116226623 100644
--- a/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof_match_hot_cold_new_calls.ll
@@ -105,7 +105,7 @@ entry:
 
 declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 noundef, i64 noundef, ptr noundef nonnull align 1 dereferenceable(1), i8 noundef zeroext)
 
-; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]], ![[MIB5:[0-9]+]]}
+; MEMPROF: ![[M1]] = !{![[MIB1:[0-9]+]], ![[MIB2:[0-9]+]], ![[MIB3:[0-9]+]], ![[MIB4:[0-9]+]]}
 ; MEMPROF: ![[MIB1]] = !{![[STACK1:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK1]] = !{i64 2732490490862098848, i64 748269490701775343}
 ; MEMPROF: ![[MIB2]] = !{![[STACK2:[0-9]+]], !"cold"}
@@ -114,8 +114,6 @@ declare noundef ptr @_ZnwmSt11align_val_tRKSt9nothrow_t12__hot_cold_t(i64 nounde
 ; MEMPROF: ![[STACK3]] = !{i64 2732490490862098848, i64 2104812325165620841, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934, i64 6281715513834610934}
 ; MEMPROF: ![[MIB4]] = !{![[STACK4:[0-9]+]], !"cold"}
 ; MEMPROF: ![[STACK4]] = !{i64 2732490490862098848, i64 8467819354083268568}
-; MEMPROF: ![[MIB5]] = !{![[STACK5:[0-9]+]], !"notcold"}
-; MEMPROF: ![[STACK5]] = !{i64 2732490490862098848, i64 8690657650969109624}
 ; MEMPROF: ![[C1]] = !{i64 2732490490862098848}
 
 !llvm.dbg.cu = !{!0}
diff --git a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
index b4e81e69116e8a..17dbafff8ee940 100644
--- a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
+++ b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
@@ -15,6 +15,7 @@
 #include "llvm/IR/ModuleSummaryIndex.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/SourceMgr.h"
+#include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <cstring>
 #include <sys/types.h>
@@ -220,6 +221,48 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_EQ(Call4->getFnAttr("memprof").getValueAsString(), "notcold");
 }
 
+// TODO: Use this matcher in existing tests.
+// ExpectedVals should be a vector of expected MIBs and their allocation type
+// and stack id contents in order, of type:
+//  std::vector<std::pair<AllocationType, std::vector<unsigned>>>
+MATCHER_P(MemprofMetadataEquals, ExpectedVals, "Matching !memprof contents") {
+  auto PrintAndFail = [&]() {
+    std::string Buffer;
+    llvm::raw_string_ostream OS(Buffer);
+    OS << "Expected:\n";
+    for (auto &[ExpectedAllocType, ExpectedStackIds] : ExpectedVals) {
+      OS << "\t" << getAllocTypeAttributeString(ExpectedAllocType) << " { ";
+      for (auto Id : ExpectedStackIds)
+        OS << Id << " ";
+      OS << "}\n";
+    }
+    OS << "Got:\n";
+    arg->printTree(OS);
+    *result_listener << "!memprof metadata differs!\n" << Buffer;
+    return false;
+  };
+
+  if (ExpectedVals.size() != arg->getNumOperands())
+    return PrintAndFail();
+
+  for (size_t I = 0; I < ExpectedVals.size(); I++) {
+    const auto &[ExpectedAllocType, ExpectedStackIds] = ExpectedVals[I];
+    MDNode *MIB = dyn_cast<MDNode>(arg->getOperand(I));
+    if (getMIBAllocType(MIB) != ExpectedAllocType)
+      return PrintAndFail();
+    MDNode *StackMD = getMIBStackNode(MIB);
+    EXPECT_NE(StackMD, nullptr);
+    if (StackMD->getNumOperands() != ExpectedStackIds.size())
+      return PrintAndFail();
+    for (size_t J = 0; J < ExpectedStackIds.size(); J++) {
+      auto *StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(J));
+      if (StackId->getZExtValue() != ExpectedStackIds[J])
+        return PrintAndFail();
+    }
+  }
+  return true;
+}
+
 // Test CallStackTrie::addCallStack interface taking allocation type and list of
 // call stack ids.
 // Test that an allocation call reached by both cold and non cold call stacks
@@ -344,6 +387,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   CallStackTrie Trie;
   Trie.addCallStack(AllocationType::Cold, {1, 2});
   Trie.addCallStack(AllocationType::NotCold, {1, 3});
+  // This will be pruned as it is unnecessary to determine how to clone the
+  // cold allocation.
   Trie.addCallStack(AllocationType::Hot, {1, 4});
 
   CallBase *Call = findCall(*Func, "call");
@@ -352,7 +397,7 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_FALSE(Call->hasFnAttr("memprof"));
   EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
   MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
-  ASSERT_EQ(MemProfMD->getNumOperands(), 3u);
+  ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
   for (auto &MIBOp : MemProfMD->operands()) {
     MDNode *MIB = dyn_cast<MDNode>(MIBOp);
     MDNode *StackMD = getMIBStackNode(MIB);
@@ -365,10 +410,6 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
     } else if (StackId->getZExtValue() == 3u) {
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    } else {
-      ASSERT_EQ(StackId->getZExtValue(), 4u);
-      // Hot contexts are converted to NotCold when building the metadata.
-      EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
     }
   }
 }
@@ -404,8 +445,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   // with the non-cold context {1, 5}.
   Trie.addCallStack(AllocationType::NotCold, {1, 5, 6});
   Trie.addCallStack(AllocationType::NotCold, {1, 5, 7});
-  // We should be able to trim the following two and combine into a single MIB
-  // with the hot context {1, 8}.
+  // These will be pruned as they are unnecessary to determine how to clone the
+  // cold allocation.
   Trie.addCallStack(AllocationType::Hot, {1, 8, 9});
   Trie.addCallStack(AllocationType::Hot, {1, 8, 10});
 
@@ -415,7 +456,7 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   EXPECT_FALSE(Call->hasFnAttr("memprof"));
   EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
   MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
-  ASSERT_EQ(MemProfMD->getNumOperands(), 3u);
+  ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
   for (auto &MIBOp : MemProfMD->operands()) {
     MDNode *MIB = dyn_cast<MDNode>(MIBOp);
     MDNode *StackMD = getMIBStackNode(MIB);
@@ -428,14 +469,65 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
     else if (StackId->getZExtValue() == 5u)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    else {
-      ASSERT_EQ(StackId->getZExtValue(), 8u);
-      // Hot contexts are converted to NotCold when building the metadata.
-      EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    }
   }
 }
 
+// Test to ensure that we prune NotCold contexts that are unneeded for
+// determining where Cold contexts need to be cloned to enable correct hinting.
+TEST_F(MemoryProfileInfoTest, PruneUnneededNotColdContexts) {
+  LLVMContext C;
+  std::unique_ptr<Module> M = makeLLVMModule(C,
+                                             R"IR(
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+define i32* @test() {
+entry:
+  %call = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
+  %0 = bitcast i8* %call to i32*
+  ret i32* %0
+}
+declare dso_local noalias noundef i8* @malloc(i64 noundef)
+)IR");
+
+  Function *Func = M->getFunction("test");
+
+  CallStackTrie Trie;
+
+  Trie.addCallStack(AllocationType::Cold, {1, 2, 3, 4});
+  Trie.addCallStack(AllocationType::Cold, {1, 2, 3, 5, 6, 7});
+  // This NotCold context is needed to know where the above two Cold contexts
+  // must be cloned from:
+  Trie.addCallStack(AllocationType::NotCold, {1, 2, 3, 5, 6, 13});
+
+  Trie.addCallStack(AllocationType::Cold, {1, 2, 3, 8, 9, 10});
+  // This NotCold context is needed to know where the above Cold context must be
+  // cloned from:
+  Trie.addCallStack(AllocationType::NotCold, {1, 2, 3, 8, 9, 14});
+  // This NotCold context is not needed since the above is sufficient (we pick
+  // the first in sorted order).
+  Trie.addCallStack(AllocationType::NotCold, {1, 2, 3, 8, 9, 15});
+
+  // Neither of these two NotCold contexts are needed as the Cold contexts they
+  // overlap with are covered by longer overlapping NotCold contexts.
+  Trie.addCallStack(AllocationType::NotCold, {1, 2, 3, 12});
+  Trie.addCallStack(AllocationType::NotCold, {1, 2, 11});
+
+  std::vector<std::pair<AllocationType, std::vector<unsigned>>> ExpectedVals = {
+      {AllocationType::Cold, {1, 2, 3, 4}},
+      {AllocationType::Cold, {1, 2, 3, 5, 6, 7}},
+      {AllocationType::NotCold, {1, 2, 3, 5, 6, 13}},
+      {AllocationType::Cold, {1, 2, 3, 8, 9, 10}},
+      {AllocationType::NotCold, {1, 2, 3, 8, 9, 14}}};
+
+  CallBase *Call = findCall(*Func, "call");
+  Trie.buildAndAttachMIBMetadata(Call);
+
+  EXPECT_FALSE(Call->hasFnAttr("memprof"));
+  EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
+  MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
+  EXPECT_THAT(MemProfMD, MemprofMetadataEquals(ExpectedVals));
+}
+
 // Test CallStackTrie::addCallStack interface taking memprof MIB metadata.
 // Check that allocations annotated with memprof metadata with a single
 // allocation type get simplified to an attribute.
@@ -555,11 +647,13 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
   // with the cold context {1, 2}.
   // We should be able to trim the second two and combine into a single MIB
   // with the non-cold context {1, 5}.
+  // The hot allocations will be converted to NotCold and pruned as they
+  // are unnecessary to determine how to clone the cold allocation.
 
   EXPECT_FALSE(Call->hasFnAttr("memprof"));
   EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
   MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
-  ASSERT_EQ(MemProfMD->getNumOperands(), 3u);
+  ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
   for (auto &MIBOp : MemProfMD->operands()) {
     MDNode *MIB = dyn_cast<MDNode>(MIBOp);
     MDNode *StackMD = getMIBStackNode(MIB);
@@ -572,11 +666,6 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
     else if (StackId->getZExtValue() == 5u)
       EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    else {
-      ASSERT_EQ(StackId->getZExtValue(), 8u);
-      // Hot contexts are converted to NotCold when building the new metadata.
-      EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
-    }
   }
 }
 

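For reference, the new flag is exercised exactly as in the updated RUN
lines above; an illustrative invocation (file names are placeholders):

  opt < foo.ll -S -passes='memprof-use<profile-filename=foo.memprofdata>' \
    -memprof-report-hinted-sizes -memprof-keep-all-not-cold-contexts
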
>From 8ea9e2ed496d1be285f22e1c555de239383fced5 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Tue, 28 Jan 2025 11:25:08 -0800
Subject: [PATCH 2/4] Fix some comments

---
 llvm/lib/Analysis/MemoryProfileInfo.cpp | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 409748ec80fe79..8456af8d373e87 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -263,7 +263,7 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
   if (hasSingleAllocType(Node->AllocTypes)) {
     // Because we only clone cold contexts (we don't clone for exposing NotCold
     // contexts as that is the default allocation behavior), we create MIB
-    // metadata for this context iff:
+    // metadata for this context if any of the following are true:
     // 1) It is cold.
     // 2) The immediate callee is the deepest point where we have an ambiguous
     //    allocation type (i.e. the other callers that are cold need to know
@@ -371,11 +371,15 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
   MIBCallStack.push_back(AllocStackId);
   std::vector<Metadata *> MIBNodes;
   assert(!Alloc->Callers.empty() && "addCallStack has not been called yet");
-  // The last parameter is meant to say whether the callee of the given node
-  // has is the deepest point where we have ambiguous alloc types. Here the
-  // node being passed in is the alloc and it has no callees. So it's false.
+  // The CalleeHasAmbiguousCallerContext flag is meant to say whether the
+  // callee of the given node has more than one caller. Here the node being
+  // passed in is the alloc and it has no callees. So it's false.
+  // Similarly, the last parameter is meant to say whether the callee of the
+  // given node is the deepest point where we have ambiguous alloc types, which
+  // is also false as the alloc has no callees.
   bool DeepestAmbiguousAllocType = true;
-  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes, false,
+  if (buildMIBNodes(Alloc, Ctx, MIBCallStack, MIBNodes,
+                    /*CalleeHasAmbiguousCallerContext=*/false,
                     DeepestAmbiguousAllocType)) {
     assert(MIBCallStack.size() == 1 &&
            "Should only be left with Alloc's location in stack");

>From ed2c505a230ee34feaf4c70199e93b72cc659d04 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Wed, 29 Jan 2025 07:20:01 -0800
Subject: [PATCH 3/4] Simplify flag description per review

---
 llvm/lib/Analysis/MemoryProfileInfo.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 8456af8d373e87..b932ef3a14c87b 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -56,7 +56,7 @@ cl::opt<bool> MemProfReportHintedSizes(
 // or have specified a value less than 100% for -memprof-cloning-cold-threshold.
 cl::opt<bool> MemProfKeepAllNotColdContexts(
     "memprof-keep-all-not-cold-contexts", cl::init(false), cl::Hidden,
-    cl::desc("Disable pruning of non-cold contexts unneeded for cold cloning"));
+    cl::desc("Keep all non-cold contexts (increases cloning overheads)"));
 
 AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
                                            uint64_t AllocCount,

>From f7aba99cc3fe11954618916462cd5c23d8ae9f41 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Wed, 29 Jan 2025 07:24:32 -0800
Subject: [PATCH 4/4] Use new hasAllocType helper

---
 llvm/lib/Analysis/MemoryProfileInfo.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index b932ef3a14c87b..a22344e19d0450 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -273,7 +273,7 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
     //    reporting hinted sizes, and want to get information from the indexing
     //    step for all contexts, or have specified a value less than 100% for
     //    -memprof-cloning-cold-threshold.
-    if ((AllocationType)Node->AllocTypes == AllocationType::Cold ||
+    if (Node->hasAllocType(AllocationType::Cold) ||
         CalleeDeepestAmbiguousAllocType || MemProfKeepAllNotColdContexts) {
       std::vector<ContextTotalSize> ContextSizeInfo;
       collectContextSizeInfo(Node, ContextSizeInfo);
@@ -282,7 +282,7 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
                                        ContextSizeInfo));
       // If we just emitted an MIB for a not cold caller, don't need to emit
       // another one for the callee to correctly disambiguate its cold callers.
-      if ((AllocationType)Node->AllocTypes != AllocationType::Cold)
+      if (!Node->hasAllocType(AllocationType::Cold))
         CalleeDeepestAmbiguousAllocType = false;
     }
     return true;



More information about the llvm-commits mailing list