[llvm] [MemProf] Disable hot hints by default, otherwise convert to NotCold (PR #124219)
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 23 18:38:57 PST 2025
https://github.com/teresajohnson created https://github.com/llvm/llvm-project/pull/124219
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.
Additionally the generation of any hot hints has been disabled by
default, as few allocations were unambiguously hot and this avoids any
overhead from converting the hot contexts to notcold.
This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
Note the same effect occurs from either disabling hot hints by default
or the early conversion to notcold.
>From 21bdbcf14f526139eaf83e6624a8b0814a69b66d Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Thu, 23 Jan 2025 18:23:21 -0800
Subject: [PATCH] [MemProf] Disable hot hints by default, otherwise convert to
NotCold
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.
Additionally the generation of any hot hints has been disabled by
default, as few allocations were unambiguously hot and this avoids any
overhead from converting the hot contexts to notcold.
This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
Note the same effect occurs from either disabling hot hints by default
or the early conversion to notcold.
---
.../include/llvm/Analysis/MemoryProfileInfo.h | 5 ++
llvm/lib/Analysis/MemoryProfileInfo.cpp | 35 +++++++-
llvm/test/Transforms/PGOProfile/memprof.ll | 4 +-
.../PGOProfile/memprof_loop_unroll.ll | 4 +-
.../Analysis/MemoryProfileInfoTest.cpp | 90 +++++++------------
5 files changed, 77 insertions(+), 61 deletions(-)
diff --git a/llvm/include/llvm/Analysis/MemoryProfileInfo.h b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
index 215139caef696d..11325da3b86461 100644
--- a/llvm/include/llvm/Analysis/MemoryProfileInfo.h
+++ b/llvm/include/llvm/Analysis/MemoryProfileInfo.h
@@ -85,6 +85,11 @@ class CallStackTrie {
void collectContextSizeInfo(CallStackTrieNode *Node,
std::vector<ContextTotalSize> &ContextSizeInfo);
+ // Recursively convert hot allocation types to notcold, since we don't
+ // actually do any cloning for hot contexts, to facilitate more aggressive
+ // pruning of contexts.
+ void convertHotToNotCold(CallStackTrieNode *Node);
+
// Recursive helper to trim contexts and create metadata nodes.
bool buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,
std::vector<uint64_t> &MIBCallStack,
diff --git a/llvm/lib/Analysis/MemoryProfileInfo.cpp b/llvm/lib/Analysis/MemoryProfileInfo.cpp
index 2f3c87a89f9f98..d140ac08fa49e2 100644
--- a/llvm/lib/Analysis/MemoryProfileInfo.cpp
+++ b/llvm/lib/Analysis/MemoryProfileInfo.cpp
@@ -42,6 +42,11 @@ cl::opt<unsigned> MemProfMinAveLifetimeAccessDensityHotThreshold(
cl::desc("The minimum TotalLifetimeAccessDensity / AllocCount for an "
"allocation to be considered hot"));
+cl::opt<bool>
+ MemProfUseHotHints("memprof-use-hot-hints", cl::init(false), cl::Hidden,
+ cl::desc("Enable use of hot hints (only supported for "
+ "unambigously hot allocations)"));
+
cl::opt<bool> MemProfReportHintedSizes(
"memprof-report-hinted-sizes", cl::init(false), cl::Hidden,
cl::desc("Report total allocation sizes of hinted allocations"));
@@ -60,8 +65,9 @@ AllocationType llvm::memprof::getAllocType(uint64_t TotalLifetimeAccessDensity,
// The access densities are multiplied by 100 to hold 2 decimal places of
// precision, so need to divide by 100.
- if (((float)TotalLifetimeAccessDensity) / AllocCount / 100 >
- MemProfMinAveLifetimeAccessDensityHotThreshold)
+ if (MemProfUseHotHints &&
+ ((float)TotalLifetimeAccessDensity) / AllocCount / 100 >
+ MemProfMinAveLifetimeAccessDensityHotThreshold)
return AllocationType::Hot;
return AllocationType::NotCold;
@@ -222,6 +228,15 @@ void CallStackTrie::collectContextSizeInfo(
collectContextSizeInfo(Caller.second, ContextSizeInfo);
}
+void CallStackTrie::convertHotToNotCold(CallStackTrieNode *Node) {
+ if (Node->AllocTypes & static_cast<uint8_t>(AllocationType::Hot)) {
+ Node->AllocTypes &= ~static_cast<uint8_t>(AllocationType::Hot);
+ Node->AllocTypes |= static_cast<uint8_t>(AllocationType::NotCold);
+ }
+ for (auto &Caller : Node->Callers)
+ convertHotToNotCold(Caller.second);
+}
+
// Recursive helper to trim contexts and create metadata nodes.
// Caller should have pushed Node's loc to MIBCallStack. Doing this in the
// caller makes it simpler to handle the many early returns in this method.
@@ -301,6 +316,22 @@ bool CallStackTrie::buildAndAttachMIBMetadata(CallBase *CI) {
"single");
return false;
}
+ // If there were any hot allocation contexts, the Alloc trie node would have
+ // the Hot type set. If so, because we don't currently support cloning for hot
+ // contexts, they should be converted to NotCold. This happens in the cloning
+ // support anyway, however, doing this now enables more aggressive context
+ // trimming when building the MIB metadata (and possibly may make the
+ // allocation have a single NotCold allocation type), greatly reducing
+ // overheads in bitcode, cloning memory and cloning time.
+ if (Alloc->AllocTypes & static_cast<uint8_t>(AllocationType::Hot)) {
+ convertHotToNotCold(Alloc);
+ // Check whether we now have a single alloc type.
+ if (hasSingleAllocType(Alloc->AllocTypes)) {
+ addSingleAllocTypeAttribute(CI, (AllocationType)Alloc->AllocTypes,
+ "single");
+ return false;
+ }
+ }
auto &Ctx = CI->getContext();
std::vector<uint64_t> MIBCallStack;
MIBCallStack.push_back(AllocStackId);
diff --git a/llvm/test/Transforms/PGOProfile/memprof.ll b/llvm/test/Transforms/PGOProfile/memprof.ll
index c0e44cccbf16ff..6aa2d307a1dc88 100644
--- a/llvm/test/Transforms/PGOProfile/memprof.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof.ll
@@ -408,8 +408,8 @@ for.end: ; preds = %for.cond
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"cold"
; MEMPROFRAND2: !"cold"
-; MEMPROFRAND2: !"hot"
-; MEMPROFRAND2: !"hot"
+; MEMPROFRAND2: !"notcold"
+; MEMPROFRAND2: !"notcold"
; MEMPROFSTATS: 8 memprof - Number of alloc contexts in memory profile.
; MEMPROFSTATS: 10 memprof - Number of callsites in memory profile.
diff --git a/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll b/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
index 9bc1282ab4529e..2461ca32e98212 100644
--- a/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
+++ b/llvm/test/Transforms/PGOProfile/memprof_loop_unroll.ll
@@ -10,7 +10,9 @@
;; $ clang++ -gmlt -fdebug-info-for-profiling -S %S/Inputs/memprof_loop_unroll_b.cc -emit-llvm
; RUN: llvm-profdata merge %S/Inputs/memprof_loop_unroll.memprofraw --profiled-binary %S/Inputs/memprof_loop_unroll.exe -o %t.memprofdata
-; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s
+;; Set the minimum lifetime threshold to 0 to ensure that one context is
+;; considered cold (the other will be notcold).
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S -memprof-report-hinted-sizes -memprof-ave-lifetime-cold-threshold=0 2>&1 | FileCheck %s
;; Conservatively annotate as not cold. We get two messages as there are two
;; unrolled copies of the allocation.
diff --git a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
index 4c177ae8446905..4161be0fadca73 100644
--- a/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
+++ b/llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
@@ -25,6 +25,7 @@ using namespace llvm::memprof;
extern cl::opt<float> MemProfLifetimeAccessDensityColdThreshold;
extern cl::opt<unsigned> MemProfAveLifetimeColdThreshold;
extern cl::opt<unsigned> MemProfMinAveLifetimeAccessDensityHotThreshold;
+extern cl::opt<bool> MemProfUseHotHints;
namespace {
@@ -81,14 +82,22 @@ TEST_F(MemoryProfileInfoTest, GetAllocType) {
// MemProfMinAveLifetimeAccessDensityHotThreshold
// so compute the HotTotalLifetimeAccessDensityThreshold at the threshold.
const uint64_t HotTotalLifetimeAccessDensityThreshold =
- (uint64_t)(MemProfMinAveLifetimeAccessDensityHotThreshold * AllocCount * 100);
-
-
+ (uint64_t)(MemProfMinAveLifetimeAccessDensityHotThreshold * AllocCount * 100);
+
+ // Make sure the option for detecting hot allocations is set.
+ MemProfUseHotHints = true;
// Test Hot
// More accesses per byte per sec than hot threshold is hot.
EXPECT_EQ(getAllocType(HotTotalLifetimeAccessDensityThreshold + 1, AllocCount,
ColdTotalLifetimeThreshold + 1),
- AllocationType::Hot);
+ AllocationType::Hot);
+ // Undo the manual set of the option above.
+ cl::ResetAllOptionOccurrences();
+
+ // Without MemProfUseHotHints (default) we should treat simply as NotCold.
+ EXPECT_EQ(getAllocType(HotTotalLifetimeAccessDensityThreshold + 1, AllocCount,
+ ColdTotalLifetimeThreshold + 1),
+ AllocationType::NotCold);
// Test Cold
// Long lived with less accesses per byte per sec than cold threshold is cold.
@@ -155,6 +164,8 @@ define i32* @test() {
%1 = bitcast i8* %call2 to i32*
%call3 = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
%2 = bitcast i8* %call3 to i32*
+ %call4 = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
+ %3 = bitcast i8* %call4 to i32*
ret i32* %1
}
declare dso_local noalias noundef i8* @malloc(i64 noundef)
@@ -194,6 +205,18 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_FALSE(Call3->hasMetadata(LLVMContext::MD_memprof));
EXPECT_TRUE(Call3->hasFnAttr("memprof"));
EXPECT_EQ(Call3->getFnAttr("memprof").getValueAsString(), "hot");
+
+ // Fourth call has hot and non-cold contexts. These should be treated as
+ // notcold and given a notcold attribute.
+ CallStackTrie Trie4;
+ Trie4.addCallStack(AllocationType::Hot, {5, 6});
+ Trie4.addCallStack(AllocationType::NotCold, {5, 7, 8});
+ CallBase *Call4 = findCall(*Func, "call4");
+ Trie4.buildAndAttachMIBMetadata(Call4);
+
+ EXPECT_FALSE(Call4->hasMetadata(LLVMContext::MD_memprof));
+ EXPECT_TRUE(Call4->hasFnAttr("memprof"));
+ EXPECT_EQ(Call4->getFnAttr("memprof").getValueAsString(), "notcold");
}
// Test CallStackTrie::addCallStack interface taking allocation type and list of
@@ -289,56 +312,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Cold);
else {
ASSERT_EQ(StackId->getZExtValue(), 3u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
- }
- }
-}
-
-// Test CallStackTrie::addCallStack interface taking allocation type and list of
-// call stack ids.
-// Test that an allocation call reached by both non cold and hot call stacks
-// gets memprof metadata representing the different allocation type contexts.
-TEST_F(MemoryProfileInfoTest, NotColdAndHotMIB) {
- LLVMContext C;
- std::unique_ptr<Module> M = makeLLVMModule(C,
- R"IR(
-target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
-target triple = "x86_64-pc-linux-gnu"
-define i32* @test() {
-entry:
- %call = call noalias dereferenceable_or_null(40) i8* @malloc(i64 noundef 40)
- %0 = bitcast i8* %call to i32*
- ret i32* %0
-}
-declare dso_local noalias noundef i8* @malloc(i64 noundef)
-)IR");
-
- Function *Func = M->getFunction("test");
-
- CallStackTrie Trie;
- Trie.addCallStack(AllocationType::NotCold, {1, 2});
- Trie.addCallStack(AllocationType::Hot, {1, 3});
-
- CallBase *Call = findCall(*Func, "call");
- Trie.buildAndAttachMIBMetadata(Call);
-
- EXPECT_FALSE(Call->hasFnAttr("memprof"));
- EXPECT_TRUE(Call->hasMetadata(LLVMContext::MD_memprof));
- MDNode *MemProfMD = Call->getMetadata(LLVMContext::MD_memprof);
- ASSERT_EQ(MemProfMD->getNumOperands(), 2u);
- for (auto &MIBOp : MemProfMD->operands()) {
- MDNode *MIB = dyn_cast<MDNode>(MIBOp);
- MDNode *StackMD = getMIBStackNode(MIB);
- ASSERT_NE(StackMD, nullptr);
- ASSERT_EQ(StackMD->getNumOperands(), 2u);
- auto *StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(0));
- ASSERT_EQ(StackId->getZExtValue(), 1u);
- StackId = mdconst::dyn_extract<ConstantInt>(StackMD->getOperand(1));
- if (StackId->getZExtValue() == 2u)
+ // Hot contexts are converted to NotCold when building the metadata.
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
- else {
- ASSERT_EQ(StackId->getZExtValue(), 3u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
}
}
}
@@ -391,7 +366,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
} else {
ASSERT_EQ(StackId->getZExtValue(), 4u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
@@ -453,7 +429,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
else {
ASSERT_EQ(StackId->getZExtValue(), 8u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
@@ -596,7 +573,8 @@ declare dso_local noalias noundef i8* @malloc(i64 noundef)
EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
else {
ASSERT_EQ(StackId->getZExtValue(), 8u);
- EXPECT_EQ(getMIBAllocType(MIB), AllocationType::Hot);
+ // Hot contexts are converted to NotCold when building the new metadata.
+ EXPECT_EQ(getMIBAllocType(MIB), AllocationType::NotCold);
}
}
}
More information about the llvm-commits
mailing list