[llvm] [memprof] Speed up caller-callee pair extraction (PR #116184)

Kazu Hirata via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 14 12:59:52 PST 2024


https://github.com/kazutakahirata updated https://github.com/llvm/llvm-project/pull/116184

>From 97e78026bfbccf125b2b558827f18b8aa55ac709 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Wed, 13 Nov 2024 22:54:18 -0800
Subject: [PATCH 1/4] [memprof] Speed up caller-callee pair extraction

We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
wasted effort.

This patch makes the extraction more efficient by first building a
work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extracting caller-callee pairs from
each call stack in the work list.

We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize).  Also, we want the
set insertion to be cheap.

Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile.  This patch shortens that to 4
seconds.
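
As a rough illustration of the work list idea described above (not part
of the patch itself), the sketch below deduplicates a stream of IDs with
a bit vector before processing them.  It uses std::vector<bool> as a
stand-in for llvm::BitVector; the LinearCallStackId alias and the
function name collectUniqueCallStackIds are hypothetical and chosen only
for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a linear call stack ID, i.e. an index into the
// radix tree array.
using LinearCallStackId = std::uint32_t;

// Returns the distinct IDs in ascending order.  Each insertion is a single
// bit set, so duplicates cost almost nothing, and the final scan visits each
// position in [0, RadixTreeSize) exactly once.
std::vector<LinearCallStackId>
collectUniqueCallStackIds(const std::vector<LinearCallStackId> &Ids,
                          std::size_t RadixTreeSize) {
  std::vector<bool> Worklist(RadixTreeSize, false);
  for (LinearCallStackId Id : Ids) {
    assert(Id < RadixTreeSize && "ID must index into the radix tree array");
    Worklist[Id] = true;
  }

  std::vector<LinearCallStackId> Unique;
  for (std::size_t I = 0; I < Worklist.size(); ++I)
    if (Worklist[I])
      Unique.push_back(static_cast<LinearCallStackId>(I));
  return Unique;
}
```

With this shape, the expensive per-call-stack work runs once per distinct
ID rather than once per allocation site that refers to it.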
---
 llvm/include/llvm/ProfileData/InstrProfReader.h |  2 ++
 llvm/lib/ProfileData/InstrProfReader.cpp        | 17 ++++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index 42414bc193bc84..1930cc3f5c2c30 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -683,6 +683,8 @@ class IndexedMemProfReader {
   const unsigned char *FrameBase = nullptr;
   /// The starting address of the call stack array.
   const unsigned char *CallStackBase = nullptr;
+  // The number of elements in the radix tree array.
+  unsigned RadixTreeSize = 0;
 
   Error deserializeV012(const unsigned char *Start, const unsigned char *Ptr,
                         uint64_t FirstWord);
diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index cae6ce5b824e62..88148962dbc390 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1303,6 +1303,10 @@ Error IndexedMemProfReader::deserializeV3(const unsigned char *Start,
   FrameBase = Ptr;
   CallStackBase = Start + CallStackPayloadOffset;
 
+  // Compute the number of elements in the radix tree array.
+  RadixTreeSize = (RecordPayloadOffset - CallStackPayloadOffset) /
+                  sizeof(memprof::LinearFrameId);
+
   // Now initialize the table reader with a pointer into data buffer.
   MemProfRecordTable.reset(MemProfRecordHashTable::Create(
       /*Buckets=*/Start + RecordTableOffset,
@@ -1674,11 +1678,22 @@ IndexedMemProfReader::getMemProfCallerCalleePairs() const {
   memprof::LinearFrameIdConverter FrameIdConv(FrameBase);
   memprof::CallerCalleePairExtractor Extractor(CallStackBase, FrameIdConv);
 
+  // The set of linear call stack IDs that we need to traverse from.  We expect
+  // the set to be dense, so we use a BitVector.
+  BitVector Worklist(RadixTreeSize);
+
+  // Collect the set of linear call stack IDs.  Since we expect a lot of
+  // duplicates, we first collect them in the form of a bit vector before
+  // processing them.
   for (const memprof::IndexedMemProfRecord &IndexedRecord :
        MemProfRecordTable->data())
     for (const memprof::IndexedAllocationInfo &IndexedAI :
          IndexedRecord.AllocSites)
-      Extractor(IndexedAI.CSId);
+      Worklist.set(IndexedAI.CSId);
+
+  // Collect caller-callee pairs for each linear call stack ID in Worklist.
+  for (unsigned CS : Worklist.set_bits())
+    Extractor(CS);
 
   DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> Pairs =
       std::move(Extractor.CallerCalleePairs);

>From 384d08205808350d8a470f710491d415f616a135 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Thu, 14 Nov 2024 09:00:35 -0800
Subject: [PATCH 2/4] Revise a comment.

---
 llvm/lib/ProfileData/InstrProfReader.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index 88148962dbc390..54a7dea59b1aea 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1303,7 +1303,9 @@ Error IndexedMemProfReader::deserializeV3(const unsigned char *Start,
   FrameBase = Ptr;
   CallStackBase = Start + CallStackPayloadOffset;
 
-  // Compute the number of elements in the radix tree array.
+  // Compute the number of elements in the radix tree array.  Since we use this
+  // to reserve enough bits in a BitVector, it's totally OK if we overestimate
+  // this number a little bit because of padding just before the next section.
   RadixTreeSize = (RecordPayloadOffset - CallStackPayloadOffset) /
                   sizeof(memprof::LinearFrameId);
 

>From 425d2a85bf4ede58d497202411492ffd7bde9c03 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Thu, 14 Nov 2024 12:37:50 -0800
Subject: [PATCH 3/4] Add an assert on the generator side.

---
 llvm/lib/ProfileData/InstrProfWriter.cpp | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index 0ab9f942a08589..9f531c72acdabf 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -601,7 +601,8 @@ writeMemProfCallStackArray(
         &MemProfCallStackData,
     llvm::DenseMap<memprof::FrameId, memprof::LinearFrameId>
         &MemProfFrameIndexes,
-    llvm::DenseMap<memprof::FrameId, memprof::FrameStat> &FrameHistogram) {
+    llvm::DenseMap<memprof::FrameId, memprof::FrameStat> &FrameHistogram,
+    unsigned &NumElements) {
   llvm::DenseMap<memprof::CallStackId, memprof::LinearCallStackId>
       MemProfCallStackIndexes;
 
@@ -610,6 +611,7 @@ writeMemProfCallStackArray(
                 FrameHistogram);
   for (auto I : Builder.getRadixArray())
     OS.write32(I);
+  NumElements = Builder.getRadixArray().size();
   MemProfCallStackIndexes = Builder.takeCallStackPos();
 
   // Release the memory of this vector as it is no longer needed.
@@ -771,15 +773,26 @@ static Error writeMemProfV3(ProfOStream &OS,
       writeMemProfFrameArray(OS, MemProfData.Frames, FrameHistogram);
 
   uint64_t CallStackPayloadOffset = OS.tell();
+  // The number of elements in the call stack array.
+  unsigned NumElements;
   llvm::DenseMap<memprof::CallStackId, memprof::LinearCallStackId>
-      MemProfCallStackIndexes = writeMemProfCallStackArray(
-          OS, MemProfData.CallStacks, MemProfFrameIndexes, FrameHistogram);
+      MemProfCallStackIndexes =
+          writeMemProfCallStackArray(OS, MemProfData.CallStacks,
+                                     MemProfFrameIndexes, FrameHistogram,
+                                     NumElements);
 
   uint64_t RecordPayloadOffset = OS.tell();
   uint64_t RecordTableOffset =
       writeMemProfRecords(OS, MemProfData.Records, &Schema, memprof::Version3,
                           &MemProfCallStackIndexes);
 
+  // IndexedMemProfReader::deserializeV3 computes the number of elements in the
+  // call stack array from the difference between CallStackPayloadOffset and
+  // RecordPayloadOffset.  Verify that the computation works.
+  assert(CallStackPayloadOffset +
+             NumElements * sizeof(memprof::LinearFrameId) ==
+         RecordPayloadOffset);
+
   uint64_t Header[] = {
       CallStackPayloadOffset,
       RecordPayloadOffset,

>From b3ba21c6e123a0c0f016516d4b8bac8a0414ec23 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Thu, 14 Nov 2024 12:57:28 -0800
Subject: [PATCH 4/4] Initialize NumElements to 0.

---
 llvm/lib/ProfileData/InstrProfWriter.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index 9f531c72acdabf..456014741d93f7 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -774,7 +774,7 @@ static Error writeMemProfV3(ProfOStream &OS,
 
   uint64_t CallStackPayloadOffset = OS.tell();
   // The number of elements in the call stack array.
-  unsigned NumElements;
+  unsigned NumElements = 0;
   llvm::DenseMap<memprof::CallStackId, memprof::LinearCallStackId>
       MemProfCallStackIndexes =
           writeMemProfCallStackArray(OS, MemProfData.CallStacks,


