[llvm] [MemProf] Reduce unnecessary context id computation (NFC) (PR #109857)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 24 13:17:02 PDT 2024


https://github.com/teresajohnson updated https://github.com/llvm/llvm-project/pull/109857

>From 7a2c61c5994d9e9f1f5041b0ad1d5f8ae53023a8 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Tue, 24 Sep 2024 12:56:06 -0700
Subject: [PATCH 1/2] [MemProf] Reduce unnecessary context id computation (NFC)

One of the memory reduction techniques was to compute node context ids
on the fly. This reduced memory at the expense of some compile time
increase.

For a large binary we were spending a lot of time invoking getContextIds
on the node during assignStackNodesPostOrder, because we were iterating
through the stack ids for a call from leaf to root (first to last node
in the parlance used in that code). However, all calls for a given entry
in the StackIdToMatchingCalls map share the same last node, so we can
borrow the approach used by similar code in updateStackNodes and compute
the context ids on the last node once, then iterate each call's stack
ids in reverse order while reusing the last node's context ids.

This reduced the thin link time by 43% for a large target. It isn't
clear why there wasn't a similar increase measured when introducing the
node context id recomputation, but the compile time was longer to start
with then.
---
 .../IPO/MemProfContextDisambiguation.cpp      | 41 +++++++++++++------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 6927fe538e367b..9f324a6739c9ba 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -1362,12 +1362,22 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     }
   }
 
+#ifndef NDEBUG
   // Find the node for the last stack id, which should be the same
   // across all calls recorded for this id, and is this node's id.
   uint64_t LastId = Node->OrigStackOrAllocId;
   ContextNode *LastNode = getNodeForStackId(LastId);
   // We should only have kept stack ids that had nodes.
   assert(LastNode);
+  assert(LastNode == Node);
+#else
+  ContextNode *LastNode = Node;
+#endif
+
+  // Compute the last node's context ids once, as it is shared by all calls in
+  // this entry.
+  DenseSet<uint32_t> LastNodeContextIds = LastNode->getContextIds();
+  assert(!LastNodeContextIds.empty());
 
   for (unsigned I = 0; I < Calls.size(); I++) {
     auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
@@ -1398,31 +1408,36 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     // duplicated context ids. We have to recompute as we might have overlap
     // overlap between the saved context ids for different last nodes, and
     // removed them already during the post order traversal.
-    set_intersect(SavedContextIds, FirstNode->getContextIds());
-    ContextNode *PrevNode = nullptr;
-    for (auto Id : Ids) {
+    set_intersect(SavedContextIds, LastNodeContextIds);
+    ContextNode *PrevNode = LastNode;
+    bool Skip = false;
+    // Iterate backwards through the stack Ids, starting after the last Id
+    // in the list, which was handled once outside for all Calls.
+    for (auto IdIter = Ids.rbegin() + 1; IdIter != Ids.rend(); IdIter++) {
+      auto Id = *IdIter;
       ContextNode *CurNode = getNodeForStackId(Id);
-      // We should only have kept stack ids that had nodes and weren't
-      // recursive.
+      // We should only have kept stack ids that had nodes.
       assert(CurNode);
       assert(!CurNode->Recursive);
-      if (!PrevNode) {
-        PrevNode = CurNode;
-        continue;
-      }
-      auto *Edge = CurNode->findEdgeFromCallee(PrevNode);
+
+      auto *Edge = CurNode->findEdgeFromCaller(PrevNode);
       if (!Edge) {
-        SavedContextIds.clear();
+        Skip = true;
         break;
       }
       PrevNode = CurNode;
+
+      // Update the context ids, which is the intersection of the ids along
+      // all edges in the sequence.
       set_intersect(SavedContextIds, Edge->getContextIds());
 
       // If we now have no context ids for clone, skip this call.
-      if (SavedContextIds.empty())
+      if (SavedContextIds.empty()) {
+        Skip = true;
         break;
+      }
     }
-    if (SavedContextIds.empty())
+    if (Skip)
       continue;
 
     // Create new context node.

>From 862301b47109b82bf7f5288fbc31da2b7249d531 Mon Sep 17 00:00:00 2001
From: Teresa Johnson <tejohnson at google.com>
Date: Tue, 24 Sep 2024 13:16:25 -0700
Subject: [PATCH 2/2] Replace comment that was truncated

---
 llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index 9f324a6739c9ba..49a180150da6bc 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -1416,7 +1416,8 @@ void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::
     for (auto IdIter = Ids.rbegin() + 1; IdIter != Ids.rend(); IdIter++) {
       auto Id = *IdIter;
       ContextNode *CurNode = getNodeForStackId(Id);
-      // We should only have kept stack ids that had nodes.
+      // We should only have kept stack ids that had nodes and weren't
+      // recursive.
       assert(CurNode);
       assert(!CurNode->Recursive);
 



More information about the llvm-commits mailing list