[llvm] 9a730d8 - [memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)

via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 13 23:40:15 PST 2024


Author: Kazu Hirata
Date: 2024-11-13T23:40:12-08:00
New Revision: 9a730d878e96e2a992f337acc94f897d47c920e3

URL: https://github.com/llvm/llvm-project/commit/9a730d878e96e2a992f337acc94f897d47c920e3
DIFF: https://github.com/llvm/llvm-project/commit/9a730d878e96e2a992f337acc94f897d47c920e3.diff

LOG: [memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)

Undrifting the MemProf profile requires two sets of information:

- caller-callee pairs from the profile
- caller-callee pairs from the IR

This patch adds a function to do the former.  The latter has been
addressed by extractCallsFromIR.

Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile.  "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling.  To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.

Conceptually, we would extract caller-callee pairs with:

  for each MemProfRecord in the profile:
    for each call stack in AllocSites:
      extract caller-callee pairs from adjacent pairs of Frames
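
As an illustration only, the innermost step above can be sketched in
standalone C++.  SimpleFrame, SimpleCallEdge, SimplePairMap,
addCallerCalleePairs, and buildExamplePairs are hypothetical names
invented for this sketch; they use standard containers rather than the
LLVM types in the patch.  The sketch assumes each call stack is stored
leaf frame first, with the leaf calling an allocation function that is
given GUID 0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for memprof::Frame: only the fields
// that describe a call site.
struct SimpleFrame {
  uint64_t Function;   // GUID of the function containing the call site
  uint32_t LineOffset; // line of the call site
  uint32_t Column;     // column of the call site
};

// {{line offset, column}, callee GUID}
using SimpleCallEdge = std::pair<std::pair<uint32_t, uint32_t>, uint64_t>;
// caller GUID -> list of call edges found in that caller
using SimplePairMap = std::map<uint64_t, std::vector<SimpleCallEdge>>;

// Walk one call stack, leaf frame first, and record one edge per frame.
// Each frame is the caller of the frame visited just before it.
void addCallerCalleePairs(const std::vector<SimpleFrame> &CallStack,
                          SimplePairMap &Pairs) {
  uint64_t CalleeGUID = 0; // assumption: the leaf calls the allocator, GUID 0
  for (const SimpleFrame &F : CallStack) {
    Pairs[F.Function].push_back({{F.LineOffset, F.Column}, CalleeGUID});
    CalleeGUID = F.Function;
  }
}

// Usage: the two call stacks from the unit test in this patch.
SimplePairMap buildExamplePairs() {
  SimplePairMap Pairs;
  addCallerCalleePairs({{0x234, 3, 4}, {0x123, 1, 2}}, Pairs);
  addCallerCalleePairs({{0x345, 7, 8}, {0x123, 5, 6}}, Pairs);
  assert(Pairs.size() == 3 && "three distinct callers: 0x123, 0x234, 0x345");
  return Pairs;
}
```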

However, this is highly inefficient.  Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.

This patch adds an efficient way of doing the above.  Specifically, we

- go through all IndexedMemProfRecords,
- look at each linear call stack ID, and
- extract caller-callee pairs from each call stack.

The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array.  For our purposes, we skip the
reconstruction and immediately populate the data structure for
caller-callee pairs.
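
A minimal standalone sketch of that walk follows, modeled on the
operator() in the diff below but not identical to it.  It assumes a
simplified in-memory layout: the array is a std::vector<int32_t>, a
call stack starts with its frame count, non-negative elements are
frame IDs, and a negative element -K means "continue K elements
further on".  SimpleFrame, SimplePairMap, extractFromRadixArray, and
the frame table indexed by frame ID are hypothetical names for this
sketch, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for memprof::Frame.
struct SimpleFrame {
  uint64_t Function; // GUID of the function containing the call site
  uint32_t LineOffset;
  uint32_t Column;
};

// {{line offset, column}, callee GUID}
using SimpleCallEdge = std::pair<std::pair<uint32_t, uint32_t>, uint64_t>;
using SimplePairMap = std::map<uint64_t, std::vector<SimpleCallEdge>>;

// Walk the call stack that starts at Array[Start] and add one edge per
// frame, without materializing the call stack itself.
void extractFromRadixArray(const std::vector<int32_t> &Array,
                           std::size_t Start,
                           const std::vector<SimpleFrame> &FrameTable,
                           SimplePairMap &Pairs) {
  std::size_t Pos = Start;
  uint32_t NumFrames = static_cast<uint32_t>(Array[Pos++]);
  uint64_t CalleeGUID = 0; // assumption: the leaf calls the allocator, GUID 0
  for (; NumFrames; --NumFrames) {
    int32_t Elem = Array[Pos];
    if (Elem < 0) {
      // Follow the relative jump to the shared part of the call stack.
      Pos += static_cast<std::size_t>(-static_cast<int64_t>(Elem));
      Elem = Array[Pos];
    }
    assert(Elem >= 0 && "a jump must land on a frame ID");

    const SimpleFrame &F = FrameTable[static_cast<std::size_t>(Elem)];
    Pairs[F.Function].push_back({{F.LineOffset, F.Column}, CalleeGUID});

    ++Pos;
    CalleeGUID = F.Function;
  }
}
```

For example, with the array {2, 2, -3, 2, 1, 0}, the call stack at
index 0 visits frame 2 and then jumps to index 5 to share frame 0 with
the call stack at index 3, which visits frames 1 and 0.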

The resulting caller-callee pairs have the type:

  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;

which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.
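
Before returning, getMemProfCallerCalleePairs in the diff below sorts
each call list and drops duplicate edges, since the same call site can
appear in many call stacks.  A rough standalone equivalent of that
step, using standard containers in place of DenseMap and SmallVector
(SimpleCallEdge, SimplePairMap, and sortAndDedup are hypothetical
names for this sketch), could look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// {{line offset, column}, callee GUID}
using SimpleCallEdge = std::pair<std::pair<uint32_t, uint32_t>, uint64_t>;
using SimplePairMap = std::map<uint64_t, std::vector<SimpleCallEdge>>;

// Sort each caller's edges by source location (then callee GUID) and
// remove exact duplicates.
void sortAndDedup(SimplePairMap &Pairs) {
  for (auto &Entry : Pairs) {
    std::vector<SimpleCallEdge> &CallList = Entry.second;
    std::sort(CallList.begin(), CallList.end());
    CallList.erase(std::unique(CallList.begin(), CallList.end()),
                   CallList.end());
    assert(std::is_sorted(CallList.begin(), CallList.end()));
  }
}
```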

Further performance optimizations are possible for the new functions
in this patch.  I'll address those in follow-up patches.

Added: 
    

Modified: 
    llvm/include/llvm/ProfileData/InstrProfReader.h
    llvm/include/llvm/ProfileData/MemProf.h
    llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
    llvm/lib/ProfileData/InstrProfReader.cpp
    llvm/unittests/ProfileData/InstrProfTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index 6be3fad41824a9..42414bc193bc84 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -695,6 +695,9 @@ class IndexedMemProfReader {
 
   Expected<memprof::MemProfRecord>
   getMemProfRecord(const uint64_t FuncNameHash) const;
+
+  DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+  getMemProfCallerCalleePairs() const;
 };
 
 /// Reader for the indexed binary instrprof format.
@@ -793,6 +796,11 @@ class IndexedInstrProfReader : public InstrProfReader {
     return MemProfReader.getMemProfRecord(FuncNameHash);
   }
 
+  DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+  getMemProfCallerCalleePairs() {
+    return MemProfReader.getMemProfCallerCalleePairs();
+  }
+
   /// Fill Counts with the profile data for the given function name.
   Error getFunctionCounts(StringRef FuncName, uint64_t FuncHash,
                           std::vector<uint64_t> &Counts);

diff  --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index da2cc807370095..ff05bb7da2f799 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -931,6 +931,83 @@ struct LinearCallStackIdConverter {
   }
 };
 
+struct LineLocation {
+  LineLocation(uint32_t L, uint32_t D) : LineOffset(L), Column(D) {}
+
+  bool operator<(const LineLocation &O) const {
+    return LineOffset < O.LineOffset ||
+           (LineOffset == O.LineOffset && Column < O.Column);
+  }
+
+  bool operator==(const LineLocation &O) const {
+    return LineOffset == O.LineOffset && Column == O.Column;
+  }
+
+  bool operator!=(const LineLocation &O) const {
+    return LineOffset != O.LineOffset || Column != O.Column;
+  }
+
+  uint64_t getHashCode() const { return ((uint64_t)Column << 32) | LineOffset; }
+
+  uint32_t LineOffset;
+  uint32_t Column;
+};
+
+// A pair of a call site location and its corresponding callee GUID.
+using CallEdgeTy = std::pair<LineLocation, uint64_t>;
+
+// Used to extract caller-callee pairs from the call stack array.  The leaf
+// frame is assumed to call a heap allocation function with GUID 0.  The
+// resulting pairs are accumulated in CallerCalleePairs.  Users can take it
+// with:
+//
+//   auto Pairs = std::move(Extractor.CallerCalleePairs);
+struct CallerCalleePairExtractor {
+  // The base address of the radix tree array.
+  const unsigned char *CallStackBase;
+  // A functor to convert a linear FrameId to a Frame.
+  std::function<Frame(LinearFrameId)> FrameIdToFrame;
+  // A map from caller GUIDs to lists of call sites in respective callers.
+  DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;
+
+  CallerCalleePairExtractor() = delete;
+  CallerCalleePairExtractor(const unsigned char *CallStackBase,
+                            std::function<Frame(LinearFrameId)> FrameIdToFrame)
+      : CallStackBase(CallStackBase), FrameIdToFrame(FrameIdToFrame) {}
+
+  void operator()(LinearCallStackId LinearCSId) {
+    const unsigned char *Ptr =
+        CallStackBase +
+        static_cast<uint64_t>(LinearCSId) * sizeof(LinearFrameId);
+    uint32_t NumFrames =
+        support::endian::readNext<uint32_t, llvm::endianness::little>(Ptr);
+    // The leaf frame calls a function with GUID 0.
+    uint64_t CalleeGUID = 0;
+    for (; NumFrames; --NumFrames) {
+      LinearFrameId Elem =
+          support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+      // Follow a pointer to the parent, if any.  See comments below on
+      // CallStackRadixTreeBuilder for the description of the radix tree format.
+      if (static_cast<std::make_signed_t<LinearFrameId>>(Elem) < 0) {
+        Ptr += (-Elem) * sizeof(LinearFrameId);
+        Elem =
+            support::endian::read<LinearFrameId, llvm::endianness::little>(Ptr);
+      }
+      // We shouldn't encounter another pointer.
+      assert(static_cast<std::make_signed_t<LinearFrameId>>(Elem) >= 0);
+
+      // Add a new caller-callee pair.
+      Frame F = FrameIdToFrame(Elem);
+      uint64_t CallerGUID = F.Function;
+      LineLocation Loc(F.LineOffset, F.Column);
+      CallerCalleePairs[CallerGUID].emplace_back(Loc, CalleeGUID);
+
+      Ptr += sizeof(LinearFrameId);
+      CalleeGUID = CallerGUID;
+    }
+  }
+};
+
 struct IndexedMemProfData {
   // A map to hold memprof data per function. The lower 64 bits obtained from
   // the md5 hash of the function name is used to index into the map.

diff  --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
index 5177ac97cdfe37..a197a2687ed029 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfiler.h
@@ -14,6 +14,7 @@
 
 #include "llvm/ADT/IntrusiveRefCntPtr.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/ProfileData/MemProf.h"
 
 namespace llvm {
 class Function;
@@ -60,31 +61,6 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
 
 namespace memprof {
 
-struct LineLocation {
-  LineLocation(uint32_t L, uint32_t D) : LineOffset(L), Column(D) {}
-
-  bool operator<(const LineLocation &O) const {
-    return LineOffset < O.LineOffset ||
-           (LineOffset == O.LineOffset && Column < O.Column);
-  }
-
-  bool operator==(const LineLocation &O) const {
-    return LineOffset == O.LineOffset && Column == O.Column;
-  }
-
-  bool operator!=(const LineLocation &O) const {
-    return LineOffset != O.LineOffset || Column != O.Column;
-  }
-
-  uint64_t getHashCode() const { return ((uint64_t)Column << 32) | LineOffset; }
-
-  uint32_t LineOffset;
-  uint32_t Column;
-};
-
-// A pair of a call site location and its corresponding callee GUID.
-using CallEdgeTy = std::pair<LineLocation, uint64_t>;
-
 // Extract all calls from the IR.  Arrange them in a map from caller GUIDs to a
 // list of call sites, each of the form {LineLocation, CalleeGUID}.
 DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>>

diff  --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index b90617c74f6d13..cae6ce5b824e62 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1666,6 +1666,32 @@ IndexedMemProfReader::getMemProfRecord(const uint64_t FuncNameHash) const {
               memprof::MaximumSupportedVersion));
 }
 
+DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
+IndexedMemProfReader::getMemProfCallerCalleePairs() const {
+  assert(MemProfRecordTable);
+  assert(Version == memprof::Version3);
+
+  memprof::LinearFrameIdConverter FrameIdConv(FrameBase);
+  memprof::CallerCalleePairExtractor Extractor(CallStackBase, FrameIdConv);
+
+  for (const memprof::IndexedMemProfRecord &IndexedRecord :
+       MemProfRecordTable->data())
+    for (const memprof::IndexedAllocationInfo &IndexedAI :
+         IndexedRecord.AllocSites)
+      Extractor(IndexedAI.CSId);
+
+  DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> Pairs =
+      std::move(Extractor.CallerCalleePairs);
+
+  // Sort each call list by the source location.
+  for (auto &[CallerGUID, CallList] : Pairs) {
+    llvm::sort(CallList);
+    CallList.erase(llvm::unique(CallList), CallList.end());
+  }
+
+  return Pairs;
+}
+
 Error IndexedInstrProfReader::getFunctionCounts(StringRef FuncName,
                                                 uint64_t FuncHash,
                                                 std::vector<uint64_t> &Counts) {

diff  --git a/llvm/unittests/ProfileData/InstrProfTest.cpp b/llvm/unittests/ProfileData/InstrProfTest.cpp
index 7fdfd15e7bc990..cf3cf7fb952738 100644
--- a/llvm/unittests/ProfileData/InstrProfTest.cpp
+++ b/llvm/unittests/ProfileData/InstrProfTest.cpp
@@ -580,6 +580,68 @@ TEST_F(InstrProfTest, test_memprof_v2_partial_schema) {
   EXPECT_THAT(WantRecord, EqualsRecord(Record));
 }
 
+TEST_F(InstrProfTest, test_caller_callee_pairs) {
+  const MemInfoBlock MIB = makePartialMIB();
+
+  Writer.setMemProfVersionRequested(memprof::Version3);
+  Writer.setMemProfFullSchema(false);
+
+  ASSERT_THAT_ERROR(Writer.mergeProfileKind(InstrProfKind::MemProf),
+                    Succeeded());
+
+  // Call Hierarchy
+  //
+  // Function GUID:0x123
+  //   Line: 1, Column: 2
+  //     Function GUID: 0x234
+  //       Line: 3, Column: 4
+  //         new(...)
+  //   Line: 5, Column: 6
+  //     Function GUID: 0x345
+  //       Line: 7, Column: 8
+  //         new(...)
+
+  const std::pair<memprof::FrameId, memprof::Frame> Frames[] = {
+      {0, {0x123, 1, 2, false}},
+      {1, {0x234, 3, 4, true}},
+      {2, {0x123, 5, 6, false}},
+      {3, {0x345, 7, 8, true}}};
+  for (const auto &[FrameId, Frame] : Frames)
+    Writer.addMemProfFrame(FrameId, Frame, Err);
+
+  const std::pair<memprof::CallStackId, SmallVector<memprof::FrameId>>
+      CallStacks[] = {{0x111, {1, 0}}, {0x222, {3, 2}}};
+  for (const auto &[CSId, CallStack] : CallStacks)
+    Writer.addMemProfCallStack(CSId, CallStack, Err);
+
+  const IndexedMemProfRecord IndexedMR = makeRecordV2(
+      /*AllocFrames=*/{0x111, 0x222},
+      /*CallSiteFrames=*/{}, MIB, memprof::getHotColdSchema());
+  Writer.addMemProfRecord(/*Id=*/0x9999, IndexedMR);
+
+  auto Profile = Writer.writeBuffer();
+  readProfile(std::move(Profile));
+
+  auto Pairs = Reader->getMemProfCallerCalleePairs();
+  ASSERT_THAT(Pairs, SizeIs(3));
+
+  auto It = Pairs.find(0x123);
+  ASSERT_NE(It, Pairs.end());
+  ASSERT_THAT(It->second, SizeIs(2));
+  EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(1U, 2U), 0x234U));
+  EXPECT_THAT(It->second[1], testing::Pair(testing::FieldsAre(5U, 6U), 0x345U));
+
+  It = Pairs.find(0x234);
+  ASSERT_NE(It, Pairs.end());
+  ASSERT_THAT(It->second, SizeIs(1));
+  EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(3U, 4U), 0U));
+
+  It = Pairs.find(0x345);
+  ASSERT_NE(It, Pairs.end());
+  ASSERT_THAT(It->second, SizeIs(1));
+  EXPECT_THAT(It->second[0], testing::Pair(testing::FieldsAre(7U, 8U), 0U));
+}
+
 TEST_F(InstrProfTest, test_memprof_getrecord_error) {
   ASSERT_THAT_ERROR(Writer.mergeProfileKind(InstrProfKind::MemProf),
                     Succeeded());


        

