[llvm] [SamplePGO] Add a cutoff for number of profile matching anchors (PR #95542)
Krzysztof Pszeniczny via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 14 06:25:37 PDT 2024
https://github.com/amharc updated https://github.com/llvm/llvm-project/pull/95542
>From af8188cfb1c1d96e47b3a15cb41e2c66cc197850 Mon Sep 17 00:00:00 2001
From: Krzysztof Pszeniczny <kpszeniczny at google.com>
Date: Fri, 14 Jun 2024 15:19:33 +0200
Subject: [PATCH] [SamplePGO] Add a cutoff for number of profile matching anchors
The algorithm added by PR #87375 can be quadratic in the number of
anchors. This is almost never a problem because functions normally
contain a reasonable number of function calls.
However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Add a knob that sets the
maximum number of callsites in a function; above that limit, stale
profile matching is skipped.
---
.../Transforms/IPO/SampleProfileMatcher.cpp | 18 ++++++++++++++++++
.../pseudo-probe-stale-profile-matching-LCS.ll | 3 +++
2 files changed, 21 insertions(+)
diff --git a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
index d7613bce4c52e..d484b53904801 100644
--- a/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp
@@ -14,6 +14,7 @@
#include "llvm/Transforms/IPO/SampleProfileMatcher.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/MDBuilder.h"
+#include "llvm/Support/CommandLine.h"
using namespace llvm;
using namespace sampleprof;
@@ -24,6 +25,11 @@ extern cl::opt<bool> SalvageStaleProfile;
extern cl::opt<bool> PersistProfileStaleness;
extern cl::opt<bool> ReportProfileStaleness;
+static cl::opt<int> SalvageStaleProfileMaxCallsites(
+ "salvage-stale-profile-max-callsites", cl::Hidden, cl::init(INT_MAX),
+ cl::desc("The maximum number of callsites in a function, above which stale "
+ "profile matching will be skipped."));
+
void SampleProfileMatcher::findIRAnchors(const Function &F,
AnchorMap &IRAnchors) {
// For inlined code, recover the original callsite and callee by finding the
@@ -300,6 +306,18 @@ void SampleProfileMatcher::runStaleProfileMatching(
if (FilteredIRAnchorsList.empty() || FilteredProfileAnchorList.empty())
return;
+
+ if (FilteredIRAnchorsList.size() > SalvageStaleProfileMaxCallsites ||
+ FilteredProfileAnchorList.size() > SalvageStaleProfileMaxCallsites) {
+ LLVM_DEBUG(dbgs() << "Skip stale profile matching for " << F.getName()
+ << " because the number of callsites in the IR is "
+ << FilteredIRAnchorsList.size()
+ << " and in the profile is "
+ << FilteredProfileAnchorList.size() << "\n");
+ return;
+ }
+
+
// Match the callsite anchors by finding the longest common subsequence
// between IR and profile. Note that we need to use IR anchor as base(A side)
// to align with the order of IRToProfileLocationMap.
diff --git a/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching-LCS.ll b/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching-LCS.ll
index ecf8484d98e59..4b8cd853301ed 100644
--- a/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching-LCS.ll
+++ b/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching-LCS.ll
@@ -1,6 +1,7 @@
; REQUIRES: x86_64-linux
; REQUIRES: asserts
; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-stale-profile-matching-LCS.prof --salvage-stale-profile -S --debug-only=sample-profile,sample-profile-matcher,sample-profile-impl 2>&1 | FileCheck %s
+; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-stale-profile-matching-LCS.prof --salvage-stale-profile -S --debug-only=sample-profile,sample-profile-matcher,sample-profile-impl --salvage-stale-profile-max-callsites=6 2>&1 | FileCheck %s -check-prefix=CHECK-MAX-CALLSITES
; CHECK: Run stale profile matching for test_direct_call
; CHECK: Location is matched from 1 to 1
@@ -27,6 +28,8 @@
; CHECK: Callsite with callee:unknown.indirect.callee is matched from 9 to 6
; CHECK: Callsite with callee:C is matched from 10 to 7
+; CHECK-MAX-CALLSITES: Skip stale profile matching for test_direct_call
+; CHECK-MAX-CALLSITES-NOT: Skip stale profile matching for test_indirect_call
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
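
To illustrate why the cutoff in the patch above helps, here is a minimal standalone sketch. It assumes a textbook dynamic-programming LCS rather than the matcher's actual algorithm, and the names `Anchor`, `lcsLength` and `matchWithCutoff` are hypothetical, not part of LLVM. The point is only that the work grows with the product of the two anchor counts, so checking both sizes before starting avoids the blow-up.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a callsite anchor: just the callee name.
using Anchor = std::string;

// Textbook O(N*M) LCS length. Illustration only; this is an assumption and
// not the algorithm used by SampleProfileMatcher.
std::size_t lcsLength(const std::vector<Anchor> &A,
                      const std::vector<Anchor> &B) {
  // The table has (|A|+1) * (|B|+1) cells, which is the quadratic cost.
  std::vector<std::vector<std::size_t>> T(
      A.size() + 1, std::vector<std::size_t>(B.size() + 1, 0));
  for (std::size_t I = 1; I <= A.size(); ++I)
    for (std::size_t J = 1; J <= B.size(); ++J)
      T[I][J] = (A[I - 1] == B[J - 1])
                    ? T[I - 1][J - 1] + 1
                    : std::max(T[I - 1][J], T[I][J - 1]);
  return T[A.size()][B.size()];
}

// Mirrors the shape of the guard in the patch: return std::nullopt (matching
// skipped) when either list is empty or larger than MaxCallsites.
std::optional<std::size_t> matchWithCutoff(const std::vector<Anchor> &IR,
                                           const std::vector<Anchor> &Profile,
                                           std::size_t MaxCallsites) {
  if (IR.empty() || Profile.empty())
    return std::nullopt;
  if (IR.size() > MaxCallsites || Profile.size() > MaxCallsites)
    return std::nullopt;
  return lcsLength(IR, Profile);
}
```

As in the patch, the comparison is strict, so a function with exactly `MaxCallsites` anchors is still matched; only larger ones are skipped.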