[llvm] [llvm][AsmPrinter] Call graph section format. (PR #159866)
Prabhu Rajasekaran via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 22 14:49:09 PDT 2025
https://github.com/Prabhuk updated https://github.com/llvm/llvm-project/pull/159866
From 3c805fd795c14c829012e922a3d3ac40462b67ac Mon Sep 17 00:00:00 2001
From: prabhukr <prabhukr at google.com>
Date: Fri, 19 Sep 2025 15:01:59 -0700
Subject: [PATCH] Callgraph section format changes.
---
llvm/docs/CallGraphSection.md | 29 +++++++
llvm/include/llvm/CodeGen/AsmPrinter.h | 6 +-
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 90 +++++++++++-----------
3 files changed, 76 insertions(+), 49 deletions(-)
create mode 100644 llvm/docs/CallGraphSection.md
diff --git a/llvm/docs/CallGraphSection.md b/llvm/docs/CallGraphSection.md
new file mode 100644
index 0000000000000..3e4f107572716
--- /dev/null
+++ b/llvm/docs/CallGraphSection.md
@@ -0,0 +1,29 @@
+# .callgraph Section Layout
+
+The `.callgraph` section is used to store call graph information for each function, which can be used for post-link analyses and optimizations. The section contains a series of records, with each record corresponding to a single function.
+
+For efficiency, direct and indirect call data are recorded differently. For direct calls, we record only the unique callees, not the location of each individual call. For indirect calls, we record the location of each call site and the type ID of the callee. Post-link analysis tools that use this information to reconstruct the program call graph can accept additional information about indirect call sites from the user to improve the precision of the call graph.
+
+## Per Function Record Layout
+
+Each record in the `.callgraph` section has the following binary layout:
+
+| Field | Type | Size (bits) | Description |
+| ---------------------------- | ------------- | ----------- | ------------------------------------------------------------------------------------------------------- |
+| Format Version | `uint32_t` | 32 | The version of the record format. The current version is 0. |
+| Function Entry PC | `uintptr_t` | 32/64 | The address of the function's entry point. |
+| Function Kind | `uint8_t` | 8 | An enum indicating the function's properties (e.g., if it's an indirect call target). |
+| Function Type ID | `uint64_t` | 64 | The type ID of the function. This field is **only** present if `Function Kind` is `INDIRECT_TARGET_KNOWN_TID`. |
+| Number of Indirect Callsites | `uint32_t` | 32 | The number of indirect call sites within the function. |
+| Indirect Callsites Array | `Callsite[]` | Variable | An array of `Callsite` records, with a length of `Number of Indirect Callsites`. |
+| Number of Unique Direct Callees | `uint32_t` | 32 | The number of unique direct call destinations from this function. |
+| Direct Callees Array | `uintptr_t[]` | Variable | An array of unique direct callee entry point addresses, with a length of `Number of Unique Direct Callees`. |
+
+### Indirect Callsite Record Layout
+
+Each record in the `Indirect Callsites Array` has the following layout:
+
+| Field | Type | Size (bits) | Description |
+| ----------------- | ----------- | ----------- | ----------------------------------------- |
+| Type ID | `uint64_t` | 64 | The type ID of the indirect call target. |
+| Callsite PC | `uintptr_t` | 32/64 | The address of the indirect call site. |
diff --git a/llvm/include/llvm/CodeGen/AsmPrinter.h b/llvm/include/llvm/CodeGen/AsmPrinter.h
index 4c744a2c0a4d2..8266c91997f37 100644
--- a/llvm/include/llvm/CodeGen/AsmPrinter.h
+++ b/llvm/include/llvm/CodeGen/AsmPrinter.h
@@ -200,13 +200,13 @@ class LLVM_ABI AsmPrinter : public MachineFunctionPass {
/// Map type identifiers to callsite labels. Labels are generated for each
/// indirect callsite in the function.
- SmallVector<std::pair<CGTypeId, MCSymbol *>> CallSiteLabels;
+ SmallVector<std::pair<CGTypeId, MCSymbol *>> IndirectCallsites;
SmallSet<MCSymbol *, 4> DirectCallees;
};
/// Enumeration of function kinds, and their mapping to function kind values
/// stored in callgraph section entries.
- enum class FunctionKind : uint64_t {
+ enum class FunctionKind : uint8_t {
/// Function cannot be target to indirect calls.
NOT_INDIRECT_TARGET = 0,
@@ -217,7 +217,7 @@ class LLVM_ABI AsmPrinter : public MachineFunctionPass {
INDIRECT_TARGET_KNOWN_TID = 2,
};
- enum CallGraphSectionFormatVersion : uint64_t {
+ enum CallGraphSectionFormatVersion : uint32_t {
V_0 = 0,
};
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 701a6a2f0f7a0..a78c9829b40d6 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1685,58 +1685,56 @@ void AsmPrinter::emitCallGraphSection(const MachineFunction &MF,
OutStreamer->pushSection();
OutStreamer->switchSection(FuncCGSection);
- // Emit format version number.
- OutStreamer->emitInt64(CallGraphSectionFormatVersion::V_0);
-
- // Emit function's self information, which is composed of:
- // 1) FunctionEntryPc
- // 2) FunctionKind: Whether the function is indirect target, and if so,
- // whether its type id is known.
- // 3) FunctionTypeId: Emit only when the function is an indirect target
- // and its type id is known.
-
- // Emit function entry pc.
- const MCSymbol *FunctionSymbol = getFunctionBegin();
- OutStreamer->emitSymbolValue(FunctionSymbol, TM.getProgramPointerSize());
-
- // If this function has external linkage or has its address taken and
- // it is not a callback, then anything could call it.
- const Function &F = MF.getFunction();
- bool IsIndirectTarget =
- !F.hasLocalLinkage() || F.hasAddressTaken(nullptr,
- /*IgnoreCallbackUses=*/true,
- /*IgnoreAssumeLikeCalls=*/true,
- /*IgnoreLLVMUsed=*/false);
-
- // FIXME: FunctionKind takes a few values but emitted as a 64-bit value.
- // Can be optimized to occupy 2 bits instead.
- // Emit function kind, and type id if available.
- if (!IsIndirectTarget) {
- OutStreamer->emitInt64(
- static_cast<uint64_t>(FunctionKind::NOT_INDIRECT_TARGET));
- } else {
+ auto EmitFunctionKindAndTypeId = [&]() {
+ const Function &F = MF.getFunction();
+ // If this function has external linkage or has its address taken and
+ // it is not a callback, then anything could call it.
+ bool IsIndirectTarget = !F.hasLocalLinkage() ||
+ F.hasAddressTaken(nullptr,
+ /*IgnoreCallbackUses=*/true,
+ /*IgnoreAssumeLikeCalls=*/true,
+ /*IgnoreLLVMUsed=*/false);
+ if (!IsIndirectTarget) {
+ OutStreamer->emitInt8(
+ static_cast<uint8_t>(FunctionKind::NOT_INDIRECT_TARGET));
+ return;
+ }
if (const auto *TypeId = extractNumericCGTypeId(F)) {
- OutStreamer->emitInt64(
- static_cast<uint64_t>(FunctionKind::INDIRECT_TARGET_KNOWN_TID));
+ OutStreamer->emitInt8(
+ static_cast<uint8_t>(FunctionKind::INDIRECT_TARGET_KNOWN_TID));
OutStreamer->emitInt64(TypeId->getZExtValue());
- } else {
- OutStreamer->emitInt64(
- static_cast<uint64_t>(FunctionKind::INDIRECT_TARGET_UNKNOWN_TID));
+ return;
}
- }
+ OutStreamer->emitInt8(
+ static_cast<uint8_t>(FunctionKind::INDIRECT_TARGET_UNKNOWN_TID));
+ };
- // Emit callsite labels, where each element is a pair of type id and
- // indirect callsite pc.
- const auto &CallSiteLabels = FuncCGInfo.CallSiteLabels;
- OutStreamer->emitInt64(CallSiteLabels.size());
- for (const auto &[TypeId, Label] : CallSiteLabels) {
+ // Emit function's call graph information.
+ // 1) CallGraphSectionFormatVersion
+ // 2) Function entry PC.
+ // 3) FunctionKind: Whether the function is an indirect target, and if so,
+ // whether its type id is known.
+ // 4) FunctionTypeID if the function is an indirect target, and its type id
+ // is known.
+ // 5) Number of indirect callsites.
+ // 6) For each indirect callsite, the callee's expected type id followed by
+ // the callsite PC.
+ // 7) Number of unique direct callees.
+ // 8) For each unique direct callee, the callee's PC.
+
+ OutStreamer->emitInt32(CallGraphSectionFormatVersion::V_0);
+ const MCSymbol *FunctionSymbol = getFunctionBegin();
+ OutStreamer->emitSymbolValue(FunctionSymbol, TM.getProgramPointerSize());
+ EmitFunctionKindAndTypeId();
+ const auto &IndirectCallsites = FuncCGInfo.IndirectCallsites;
+ OutStreamer->emitInt32(IndirectCallsites.size());
+ const auto &DirectCallees = FuncCGInfo.DirectCallees;
+ for (const auto &[TypeId, Label] : IndirectCallsites) {
OutStreamer->emitInt64(TypeId);
OutStreamer->emitSymbolValue(Label, TM.getProgramPointerSize());
}
- FuncCGInfo.CallSiteLabels.clear();
-
- const auto &DirectCallees = FuncCGInfo.DirectCallees;
- OutStreamer->emitInt64(DirectCallees.size());
+ FuncCGInfo.IndirectCallsites.clear();
+ OutStreamer->emitInt32(DirectCallees.size());
for (const auto &CalleeSymbol : DirectCallees) {
OutStreamer->emitSymbolValue(CalleeSymbol, TM.getProgramPointerSize());
}
@@ -1908,7 +1906,7 @@ void AsmPrinter::handleCallsiteForCallgraph(
MCSymbol *S = MF->getContext().createTempSymbol();
OutStreamer->emitLabel(S);
uint64_t CalleeTypeIdVal = CalleeTypeId->getZExtValue();
- FuncCGInfo.CallSiteLabels.emplace_back(CalleeTypeIdVal, S);
+ FuncCGInfo.IndirectCallsites.emplace_back(CalleeTypeIdVal, S);
}
}
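
To sanity-check the record layout documented in CallGraphSection.md against what `emitCallGraphSection` now emits, here is a minimal reader sketch. It is not part of the patch. The names `CallGraphRecord`, `IndirectCallsite`, `readLE`, `appendLE`, and `parseRecord` are hypothetical and do not exist in LLVM. It also assumes a little-endian target and a fully linked image in which the symbol values have already been resolved to addresses; in a relocatable object those fields would still be covered by relocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical reader-side types; they only mirror the documented fields.
struct IndirectCallsite {
  uint64_t TypeId;     // Type ID of the indirect call target.
  uint64_t CallsitePC; // Address of the indirect call site.
};

struct CallGraphRecord {
  uint32_t FormatVersion = 0;
  uint64_t FunctionEntryPC = 0;
  uint8_t FunctionKind = 0;
  std::optional<uint64_t> FunctionTypeId; // Set only for kind 2.
  std::vector<IndirectCallsite> IndirectCallsites;
  std::vector<uint64_t> DirectCallees;
};

// Appends the low N bytes of V in little-endian order (handy for test data).
inline void appendLE(std::vector<uint8_t> &Buf, uint64_t V, unsigned N) {
  for (unsigned I = 0; I < N; ++I)
    Buf.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Reads an N-byte little-endian integer at Pos and advances Pos.
// Returns std::nullopt if fewer than N bytes remain.
inline std::optional<uint64_t> readLE(const std::vector<uint8_t> &Buf,
                                      size_t &Pos, unsigned N) {
  if (Pos > Buf.size() || Buf.size() - Pos < N)
    return std::nullopt;
  uint64_t V = 0;
  for (unsigned I = 0; I < N; ++I)
    V |= static_cast<uint64_t>(Buf[Pos + I]) << (8 * I);
  Pos += N;
  return V;
}

// Parses one per-function record starting at Offset. On success, Offset is
// advanced past the record; on truncated input, Offset is left unchanged.
inline std::optional<CallGraphRecord>
parseRecord(const std::vector<uint8_t> &Buf, size_t &Offset, unsigned PtrSize) {
  assert((PtrSize == 4 || PtrSize == 8) && "unsupported pointer size");
  constexpr uint8_t IndirectTargetKnownTid = 2; // INDIRECT_TARGET_KNOWN_TID
  size_t Pos = Offset;
  CallGraphRecord R;

  auto Version = readLE(Buf, Pos, 4);
  auto EntryPC = readLE(Buf, Pos, PtrSize);
  auto Kind = readLE(Buf, Pos, 1);
  if (!Version || !EntryPC || !Kind)
    return std::nullopt;
  R.FormatVersion = static_cast<uint32_t>(*Version);
  R.FunctionEntryPC = *EntryPC;
  R.FunctionKind = static_cast<uint8_t>(*Kind);

  // The function type ID is present only when the kind says it is known.
  if (R.FunctionKind == IndirectTargetKnownTid) {
    auto TypeId = readLE(Buf, Pos, 8);
    if (!TypeId)
      return std::nullopt;
    R.FunctionTypeId = *TypeId;
  }

  // Indirect callsites: a 32-bit count, then (type ID, callsite PC) pairs.
  auto NumIndirect = readLE(Buf, Pos, 4);
  if (!NumIndirect)
    return std::nullopt;
  for (uint64_t I = 0; I < *NumIndirect; ++I) {
    auto TypeId = readLE(Buf, Pos, 8);
    auto PC = readLE(Buf, Pos, PtrSize);
    if (!TypeId || !PC)
      return std::nullopt;
    R.IndirectCallsites.push_back({*TypeId, *PC});
  }

  // Unique direct callees: a 32-bit count, then one entry PC per callee.
  auto NumDirect = readLE(Buf, Pos, 4);
  if (!NumDirect)
    return std::nullopt;
  for (uint64_t I = 0; I < *NumDirect; ++I) {
    auto PC = readLE(Buf, Pos, PtrSize);
    if (!PC)
      return std::nullopt;
    R.DirectCallees.push_back(*PC);
  }

  Offset = Pos;
  return R;
}
```

Calling `parseRecord` repeatedly until `Offset` reaches the end of the section contents would walk every record, since each record begins with its own format version field. The sketch deliberately avoids pre-reserving vectors from the on-disk counts, so a corrupted count leads to a truncation failure rather than a large allocation.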