[llvm] 5c2ae37 - [CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.
Hongtao Yu via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 28 16:25:08 PDT 2023
Author: Hongtao Yu
Date: 2023-03-28T16:24:38-07:00
New Revision: 5c2ae37bbeff948540712bf02ac3eb5aa0b0d43a
URL: https://github.com/llvm/llvm-project/commit/5c2ae37bbeff948540712bf02ac3eb5aa0b0d43a
DIFF: https://github.com/llvm/llvm-project/commit/5c2ae37bbeff948540712bf02ac3eb5aa0b0d43a.diff
LOG: [CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.
I've noticed that for some services the CSSPGO profile is less stable than the non-CS AutoFDO profile from one profiling run to the next, even without source changes. This shows up when comparing profile similarity. For example, in my experiments AutoFDO profiles are always 99+% similar for the same binary under different inputs (very similar dynamic traffic), while CSSPGO profile similarity is around 90%.
The main source of the profile instability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order guides the CS preinliner in pre-computing an inline decision that the compiler later carries out. A subtle change in the top-down order from run to run can lead to a different inline decision. A closer look at how the top-down order diverges revealed that:
- The topological sorting inside one SCC isn't quite right. This is fixed by {D130717}.
- The profiled call graphs of the two sides of the A/B run aren't 100% the same. The call edges in the two runs do not subsume each other, and edges that appear in both graphs may not have exactly the same weight. This is inherent in the graphs being dynamic. However, I saw that the graphs can be made closer by removing the cold edges from them, and this raised CSSPGO profile stability to the same level as the AutoFDO profile.
Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues, since the CS preinliner mainly targets hot callsites and cold inlining can always be done by the compiler's CGSCC inliner.
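As a rough illustration of the trimming idea (not the code in this patch), the sketch below drops every edge whose weight is at or below a threshold from a toy call graph. The `Edge` struct, the `CallGraph` alias, and the free function `trimColdEdges` are hypothetical stand-ins for the LLVM types; ordering edges by callee name is likewise an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a profiled call edge; not the LLVM type.
struct Edge {
  std::string Callee;
  uint64_t Weight;
  // Assumption: edges are ordered by callee only, so a caller holds at most
  // one edge per callee.
  bool operator<(const Edge &Other) const { return Callee < Other.Callee; }
};

// Caller name -> outgoing call edges.
using CallGraph = std::map<std::string, std::set<Edge>>;

// Remove every edge whose weight is at most Threshold and return how many
// edges were removed. A zero threshold disables trimming.
std::size_t trimColdEdges(CallGraph &Graph, uint64_t Threshold) {
  if (Threshold == 0)
    return 0;

  std::size_t Removed = 0;
  for (auto &Node : Graph) {
    auto &Edges = Node.second;
    for (auto I = Edges.begin(); I != Edges.end();) {
      if (I->Weight <= Threshold) {
        // erase() returns the iterator following the removed element.
        I = Edges.erase(I);
        ++Removed;
      } else {
        ++I;
      }
    }
  }

  // Post-condition: no surviving edge is at or below the threshold.
  for (const auto &Node : Graph)
    for (const Edge &E : Node.second)
      assert(E.Weight > Threshold);

  return Removed;
}
```

With two runs that agree on a hot edge but each contain a different low-weight edge, trimming with a threshold above those low weights leaves both graphs with the same edge set, which is the effect described above.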
Also fix an issue where the largest weight, rather than the accumulated weight, of a call edge was used in the profiled call graph.
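The difference between keeping the largest weight and accumulating weights can be sketched as follows. This is a simplified model under stated assumptions: `Edge` and `addEdge` are hypothetical names, and edges are assumed to compare equal when they share a callee. Because elements of a `std::set` are immutable, the existing edge is erased and a merged one is inserted, similar in spirit to the erase-then-insert sequence in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <string>

// Hypothetical stand-in for a profiled call edge; not the LLVM type.
struct Edge {
  std::string Callee;
  uint64_t Weight;
  // Assumption: two edges are equivalent when they target the same callee.
  bool operator<(const Edge &Other) const { return Callee < Other.Callee; }
};

// Insert E, or fold its weight into an existing edge to the same callee.
void addEdge(std::set<Edge> &Edges, Edge E) {
  auto It = Edges.find(E);
  if (It != Edges.end()) {
    // Guard against unsigned wrap-around in this toy model.
    assert(E.Weight <= std::numeric_limits<uint64_t>::max() - It->Weight &&
           "edge weight overflow");
    // Accumulate instead of keeping only the larger of the two weights.
    E.Weight += It->Weight;
    Edges.erase(It);
  }
  Edges.insert(E);
}
```

For instance, if the same callee is reached with weights 40 and 60, the accumulated edge weight is 100, whereas a keep-the-largest policy would record 60 and could push a warm edge below the cold threshold.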
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D147013
Added:
Modified:
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
llvm/tools/llvm-profgen/CSPreInliner.cpp
Removed:
################################################################################
diff --git a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
index 5e12fcfeae1b4..bc8360a80bc02 100644
--- a/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
+++ b/llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
@@ -64,16 +64,22 @@ class ProfiledCallGraph {
using iterator = ProfiledCallGraphNode::iterator;
// Constructor for non-CS profile.
- ProfiledCallGraph(SampleProfileMap &ProfileMap) {
+ ProfiledCallGraph(SampleProfileMap &ProfileMap,
+ uint64_t IgnoreColdCallThreshold = 0) {
assert(!FunctionSamples::ProfileIsCS &&
"CS flat profile is not handled here");
for (const auto &Samples : ProfileMap) {
addProfiledCalls(Samples.second);
}
+
+ // Trim edges with weight up to `IgnoreColdCallThreshold`. This aims
+ // for a more stable call graph with "deterministic" edges from run to run.
+ trimColdEges(IgnoreColdCallThreshold);
}
// Constructor for CS profile.
- ProfiledCallGraph(SampleContextTracker &ContextTracker) {
+ ProfiledCallGraph(SampleContextTracker &ContextTracker,
+ uint64_t IgnoreColdCallThreshold = 0) {
// BFS traverse the context profile trie to add call edges for calls shown
// in context.
std::queue<ContextTrieNode *> Queue;
@@ -121,11 +127,16 @@ class ProfiledCallGraph {
ContextTracker.getFuncNameFor(Callee), Weight);
}
}
+
+ // Trim edges with weight up to `IgnoreColdCallThreshold`. This aims
+ // for a more stable call graph with "deterministic" edges from run to run.
+ trimColdEges(IgnoreColdCallThreshold);
}
iterator begin() { return Root.Edges.begin(); }
iterator end() { return Root.Edges.end(); }
ProfiledCallGraphNode *getEntryNode() { return &Root; }
+
void addProfiledFunction(StringRef Name) {
if (!ProfiledFunctions.count(Name)) {
// Link to synthetic root to make sure every node is reachable
@@ -148,8 +159,9 @@ class ProfiledCallGraph {
auto EdgeIt = Edges.find(Edge);
if (EdgeIt == Edges.end()) {
Edges.insert(Edge);
- } else if (EdgeIt->Weight < Edge.Weight) {
- // Replace existing call edges with same target but smaller weight.
+ } else {
+ // Accumulate weight to the existing edge.
+ Edge.Weight += EdgeIt->Weight;
Edges.erase(EdgeIt);
Edges.insert(Edge);
}
@@ -175,6 +187,24 @@ class ProfiledCallGraph {
}
}
+ // Trim edges with weight up to `Threshold`. Do not trim anything if
+ // `Threshold` is zero.
+ void trimColdEges(uint64_t Threshold = 0) {
+ if (!Threshold)
+ return;
+
+ for (auto &Node : ProfiledFunctions) {
+ auto &Edges = Node.second.Edges;
+ auto I = Edges.begin();
+ while (I != Edges.end()) {
+ if (I->Weight <= Threshold)
+ I = Edges.erase(I);
+ else
+ I++;
+ }
+ }
+ }
+
ProfiledCallGraphNode Root;
StringMap<ProfiledCallGraphNode> ProfiledFunctions;
};
diff --git a/llvm/tools/llvm-profgen/CSPreInliner.cpp b/llvm/tools/llvm-profgen/CSPreInliner.cpp
index 6551eb8dd82aa..f0433865da449 100644
--- a/llvm/tools/llvm-profgen/CSPreInliner.cpp
+++ b/llvm/tools/llvm-profgen/CSPreInliner.cpp
@@ -76,7 +76,12 @@ CSPreInliner::CSPreInliner(SampleContextTracker &Tracker,
std::vector<StringRef> CSPreInliner::buildTopDownOrder() {
std::vector<StringRef> Order;
- ProfiledCallGraph ProfiledCG(ContextTracker);
+ // Trim cold edges to get a more stable call graph. This allows for a more
+ // stable top-down order which in turn helps the stability of the generated
+ // profile from run to run.
+ uint64_t ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
+ (Summary->getDetailedSummary()));
+ ProfiledCallGraph ProfiledCG(ContextTracker, ColdCountThreshold);
// Now that we have a profiled call graph, construct top-down order
// by building up SCC and reversing SCC order.