[llvm] Add documentation for MemProf. (PR #172238)
Snehasish Kumar via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 24 00:33:47 PST 2025
https://github.com/snehasish updated https://github.com/llvm/llvm-project/pull/172238
>From 6268f6525e11bbb705bcf0b28ece041b8ced6d43 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Sun, 14 Dec 2025 23:58:08 +0000
Subject: [PATCH 01/15] Add documentation for MemProf.
Generated with the help of Gemini CLI, commands validated with a local
build of LLVM from head and tcmalloc.
---
llvm/docs/MemProf.rst | 276 +++++++++++++++++++++++++++++++++++++++
llvm/docs/UserGuides.rst | 5 +
2 files changed, 281 insertions(+)
create mode 100644 llvm/docs/MemProf.rst
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
new file mode 100644
index 0000000000000..c33163d9d8fc3
--- /dev/null
+++ b/llvm/docs/MemProf.rst
@@ -0,0 +1,276 @@
+====================================
+MemProf: Memory Profiling for LLVM
+====================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Introduction
+============
+
+MemProf is a profile-guided optimization (PGO) feature for memory. It enables the compiler to optimize memory allocations and static data layout based on runtime profiling information. By understanding the "hotness", lifetime, and access frequency of memory, the compiler can make better decisions about where to place data (both heap and static), reducing fragmentation and improving cache locality.
+
+Motivation
+----------
+
+Traditional PGO focuses on control flow (hot vs. cold code). MemProf extends this concept to data. It answers questions like:
+
+* Which allocation sites are "hot" (frequently accessed)?
+* Which allocation sites are "cold" (rarely accessed)?
+* What is the lifetime of an allocation?
+
+This information enables optimizations such as:
+
+* **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density.
+* **Static Data Partitioning:** Segregating frequently accessed (hot) global variables and constants from rarely accessed (cold) ones to improve data locality and TLB utilization.
+
+User Manual
+===========
+
+This section describes how to use MemProf to profile and optimize your application.
+
+Building with MemProf
+---------------------
+
+To enable MemProf instrumentation, compile your application with the ``-fmemory-profile`` flag. Make sure to include debug information (``-gmlt`` and ``-fdebug-info-for-profiling``) and frame pointers to ensure accurate stack traces and line number reporting.
+
+.. code-block:: bash
+
+ clang++ -fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -gmlt -O2 source.cpp -o app
+
+.. note::
+ Link with ``-fmemory-profile`` as well to link the necessary runtime libraries. If you use a separate link step, ensure the flag is passed to the linker.
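+
+A minimal sketch of a separate compile and link step (file names are illustrative):
+
+.. code-block:: bash
+
+   clang++ -c -fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -gmlt -O2 source.cpp -o source.o
+   clang++ -fmemory-profile source.o -o app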
+
+Running and Generating Profiles
+-------------------------------
+
+Run the instrumented application. By default, MemProf writes a raw profile file named ``memprof.profraw.<pid>`` to the current directory upon exit.
+
+.. code-block:: bash
+
+ ./app
+
+Control the runtime behavior using the ``MEMPROF_OPTIONS`` environment variable. Common options include:
+
+* ``log_path``: Redirects runtime logs (e.g., ``stdout``, ``stderr``, or a file path).
+* ``print_text``: If set to ``true``, prints a text-based summary of the profile to the log path.
+* ``verbosity``: Controls the level of debug output.
+
+**Example:**
+
+.. code-block:: bash
+
+ MEMPROF_OPTIONS=log_path=stdout:print_text=true ./app
+
+Processing Profiles
+-------------------
+
+Raw profiles must be indexed before the compiler can use them. Use ``llvm-profdata`` to merge and index the raw profiles.
+
+.. code-block:: bash
+
+ llvm-profdata merge memprof.profraw.* --profiled-binary ./app -o memprof.memprofdata
+
+To dump the profile in YAML format (useful for debugging or creating test cases):
+
+.. code-block:: bash
+
+ llvm-profdata show --memory memprof.memprofdata > memprof.yaml
+
+Merge MemProf profiles with standard PGO instrumentation profiles if you have both.
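+
+For example, a single indexed file containing both kinds of profile data can be produced in one merge step (a sketch; the instrumentation profile name ``pgo.profraw`` is illustrative):
+
+.. code-block:: bash
+
+   llvm-profdata merge pgo.profraw memprof.profraw.* --profiled-binary ./app -o combined.profdata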
+
+Using Profiles for Optimization
+-------------------------------
+
+Feed the indexed profile back into the compiler using the ``-fmemory-profile-use=`` option (or the equivalent ``opt`` pass option shown below).
+
+.. code-block:: bash
+
+ clang++ -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc
+
+If invoking the optimizer directly via ``opt``:
+
+.. code-block:: bash
+
+ opt -passes='memprof-use<profile-filename=memprof.memprofdata>' ...
+
+The compiler uses the profile data to annotate allocation instructions with metadata (e.g., ``!memprof``), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations.
+
+.. note::
+ For the optimized binary to utilize the hot/cold hinting, it must be linked with an allocator that supports this mechanism, such as `tcmalloc <https://github.com/google/tcmalloc>`_. TCMalloc extends operator new that accepts a hint (0 for cold, 255 for hot) to guide data placement and improve locality.
+
+Context Disambiguation (LTO/ThinLTO)
+------------------------------------
+
+To fully benefit from MemProf, especially for common allocation wrappers, enabling **ThinLTO** (preferred) or LTO is required. This allows the compiler to perform **context disambiguation**.
+
+Consider the following example:
+
+.. code-block:: cpp
+
+ void *allocate() { return new char[10]; }
+
+ void hot_path() {
+ // This path is executed frequently.
+ allocate();
+ }
+
+ void cold_path() {
+ // This path is executed rarely.
+ allocate();
+ }
+
+Without context disambiguation, the compiler sees a single ``allocate`` function called from both hot and cold contexts. It must conservatively assume the allocation is "not cold" or "ambiguous".
+
+With ThinLTO and MemProf:
+1. The compiler constructs a whole-program call graph.
+2. It identifies that ``allocate`` has distinct calling contexts with different behaviors.
+3. It **clones** ``allocate`` into two versions: one for the hot path and one for the cold path.
+4. The call in ``cold_path`` is updated to call the cloned "cold" version of ``allocate``, which can then be optimized (e.g., by passing a cold hint to the allocator).
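+
+A minimal sketch of a ThinLTO build that enables this cloning (assuming lld and the MemProf-related internal LLVM flags ``-enable-memprof-context-disambiguation``, ``-optimize-hot-cold-new``, and ``-supports-hot-cold-new``; exact spellings may vary by release):
+
+.. code-block:: bash
+
+   clang++ -flto=thin -fuse-ld=lld -O2 -fmemory-profile-use=memprof.memprofdata \
+     -Wl,-mllvm,-enable-memprof-context-disambiguation \
+     -Wl,-mllvm,-optimize-hot-cold-new -Wl,-mllvm,-supports-hot-cold-new \
+     source.cpp -o optimized_app -ltcmalloc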
+
+Static Data Partitioning
+------------------------
+
+MemProf profiles guide the layout of static data (e.g., global variables, constants). The goal is to separate "hot" data from "cold" data in the binary, placing hot data into specific sections (e.g., ``.rodata.hot``) to minimize the number of pages required for the working set.
+
+This feature uses a hybrid approach:
+
+1. **Symbolizable Data:** Data with external or local linkage (tracked by the symbol table) is partitioned based on data access profiles collected via instrumentation (`draft <https://github.com/llvm/llvm-project/pull/142884>`_) or hardware performance counters (e.g., Intel PEBS events such as ``MEM_INST_RETIRED.ALL_LOADS``).
+2. **Module-Internal Data:** Data not tracked by the symbol table (e.g., jump tables, constant pools, internal globals) has its hotness inferred from standard PGO code execution profiles.
+
+To enable this feature, pass the following flags to the compiler:
+
+* ``-mllvm -memprof-annotate-static-data-prefix``: Enables annotation of global variables in the IR.
+* ``-mllvm -split-static-data``: Enables partitioning of other data (like jump tables) in the backend.
+* ``-Wl,-z,keep-data-section-prefix``: Instructs the linker (LLD) to group hot and cold data sections together.
+
+.. code-block:: bash
+
+ clang++ -fmemory-profile-use=memprof.memprofdata -mllvm -memprof-annotate-static-data-prefix -mllvm -split-static-data -fuse-ld=lld -Wl,-z,keep-data-section-prefix -O2 source.cpp -o optimized_app
+
+The optimized layout clusters hot static data, improving dTLB and cache efficiency.
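+
+After linking, the resulting section layout can be inspected to confirm the prefixes were kept (a sketch; the exact section names, such as ``.rodata.hot`` or ``.data.unlikely``, depend on the linker and the data in the program):
+
+.. code-block:: bash
+
+   llvm-readelf -S optimized_app | grep -E '\.(rodata|data|bss)\.(hot|unlikely)'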
+
+Developer Manual
+================
+
+This section provides an overview of the MemProf architecture and implementation for contributors.
+
+Architecture Overview
+---------------------
+
+MemProf consists of three main components:
+
+1. **Instrumentation Pass (Compile-time):** Injects code to record memory allocations and accesses.
+2. **Runtime Library (Link-time/Run-time):** Manages shadow memory and tracks allocation contexts and access statistics.
+3. **Profile Analysis (Post-processing/Compile-time):** Tools and passes that read the profile, annotate the IR, and perform advanced optimizations like context disambiguation for ThinLTO.
+
+Detailed Workflow (ThinLTO)
+---------------------------
+
+The optimization process, particularly context disambiguation, involves several steps during the ThinLTO pipeline:
+
+1. **Metadata Serialization:** During the ThinLTO summary analysis step, MemProf metadata (including MIBs and CallStacks) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
+2. **Whole Program Graph Construction:** During the ThinLTO indexing step, the compiler constructs a whole-program ``CallingContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
+3. **Auxiliary Graph & Cloning Decisions:** An auxiliary graph is constructed to guide the cloning process. The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones.
+4. **ThinLTO Backend:** The actual cloning of functions and replacement of allocation calls (e.g., ``operator new``) happens in the ThinLTO backend passes. These transformations are guided by the decisions made during the indexing step.
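+
+For experimentation outside of a full link, the regular LTO form of the pass can also be run directly with ``opt`` on IR that already carries MemProf metadata (a sketch; see the tests under ``llvm/test/Transforms/MemProfContextDisambiguation`` for complete invocations):
+
+.. code-block:: bash
+
+   opt -passes=memprof-context-disambiguation annotated.ll -S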
+
+Source Structure
+----------------
+
+* **Runtime:** ``compiler-rt/lib/memprof``
+ * Contains the runtime implementation, including shadow memory mapping, interceptors (malloc, free, etc.), and the thread-local storage for recording stats.
+* **Instrumentation:** ``llvm/lib/Transforms/Instrumentation/MemProfInstrumentation.cpp``
+ * Implements the LLVM IR pass that adds instrumentation calls.
+* **Profile Data:** ``llvm/include/llvm/ProfileData/MemProf.h`` and ``MemProfData.inc``
+ * Defines the profile format, data structures (like ``MemInfoBlock``), and serialization logic.
+* **Use Pass:** ``llvm/lib/Transforms/Instrumentation/MemProfUse.cpp``
+ * Reads the profile and annotates the IR with metadata.
+* **Context Disambiguation:** ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``
+ * Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.
+
+Runtime Implementation
+----------------------
+
+The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
+* **Shadow Mapping:** Application memory is mapped to shadow memory.
+* **Granularity:** The default granularity is 64 bytes. One byte of shadow memory tracks the access state of 64 bytes of application memory.
+* **MemInfoBlock (MIB):** A key data structure that stores statistics for an allocation context, including:
+ * ``AllocCount``
+ * ``TotalAccessCount``
+ * ``TotalLifetime``
+ * ``Min/MaxAccessDensity``
+
+Profile Format
+--------------
+
+The MemProf profile is a schema-based binary format designed for extensibility. Key structures include:
+
+* **Frame:** Represents a function in the call stack (function GUID, line offset, column, and an inline-frame flag).
+* **CallStack:** A sequence of Frames identifying the context of an allocation.
+* **MemInfoBlock:** The statistics gathered for a specific CallStack.
+
+The format supports versioning to allow adding new fields to the MIB without breaking backward compatibility.
+
+Static Data Profile
+~~~~~~~~~~~~~~~~~~~
+
+To support static data partitioning, the profile format includes a payload for symbolized data access profiles. This maps data addresses to canonical symbol names (or module source location for internal data) and access counts. This enables the compiler to identify which global variables are hot.
+
+Testing
+-------
+
+When making changes to MemProf, verify your changes using the following test suites:
+
+1. **Runtime Tests:**
+ * Location: ``compiler-rt/test/memprof``
+ * Purpose: Verify the runtime instrumentation, shadow memory behavior, and profile generation.
+
+2. **Profile Manipulation Tests:**
+ * Location: ``llvm/test/tools/llvm-profdata``
+ * Purpose: Verify that ``llvm-profdata`` can correctly merge, show, and handle MemProf profiles.
+
+3. **Instrumentation & Optimization Tests:**
+ * Location: ``llvm/test/Transforms/PGOProfile``
+ * Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.
+
+4. **ThinLTO & Context Disambiguation Tests:**
+ * Location: ``llvm/test/ThinLTO/X86``
+ * Purpose: Verify context disambiguation, cloning, and summary analysis during ThinLTO.
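+
+These suites can be run with the usual check targets or ``llvm-lit`` from a build directory (a sketch; target names and paths assume a standard CMake build with compiler-rt enabled):
+
+.. code-block:: bash
+
+   ninja check-memprof                                    # compiler-rt runtime tests
+   ./bin/llvm-lit -sv ../llvm/test/Transforms/PGOProfile  # run a specific lit directory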
+
+Testing with YAML Profiles
+--------------------------
+
+You can create MemProf profiles in YAML format for testing purposes. This is useful for creating small, self-contained test cases without needing to run a binary.
+
+1. **Create a YAML Profile:** You can start by dumping a real profile to YAML (see :ref:`Processing Profiles` above) or writing one from scratch.
+2. **Convert to Indexed Format:** Use ``llvm-profdata`` to convert the YAML to the indexed MemProf format.
+
+ .. code-block:: bash
+
+ llvm-profdata merge --memprof-version=4 profile.yaml -o profile.memprofdata
+
+3. **Run the Compiler:** Use the indexed profile with ``opt`` or ``clang``.
+
+ .. code-block:: bash
+
+ opt -passes='memprof-use<profile-filename=profile.memprofdata>' test.ll -S
+
+**Example YAML Profile:**
+
+.. code-block:: yaml
+
+ ---
+ HeapProfileRecords:
+ - GUID: _Z3foov
+ AllocSites:
+ - Callstack:
+ - { Function: _Z3foov, LineOffset: 0, Column: 22, IsInlineFrame: false }
+ - { Function: main, LineOffset: 2, Column: 5, IsInlineFrame: false }
+ MemInfoBlock:
+ TotalSize: 400
+ AllocCount: 1
+ TotalLifetimeAccessDensity: 1
+ TotalLifetime: 1000000
+ CallSites: []
+ ...
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index d3ca2f69016c1..5a5b73836e2ec 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -54,6 +54,7 @@ intermediate LLVM representation.
LoopTerminology
MarkdownQuickstartTemplate
MemorySSA
+ MemProf
MergeFunctions
MCJITDesignAndImplementation
MisExpect
@@ -155,6 +156,10 @@ Optimizations
:doc:`MemorySSA`
Information about the MemorySSA utility in LLVM, as well as how to use it.
+:doc:`MemProf`
+ User guide and internals of MemProf, a profile guided optimization for
+ memory.
+
:doc:`LoopTerminology`
A document describing Loops and associated terms as used in LLVM.
>From 6d62afdbbeee3bbf664a2e1dc2f10f7e148779e3 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Mon, 15 Dec 2025 06:19:52 +0000
Subject: [PATCH 02/15] Fix indentation.
---
llvm/docs/MemProf.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index c33163d9d8fc3..142491d1861c6 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -196,10 +196,10 @@ The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) b
* **Shadow Mapping:** Application memory is mapped to shadow memory.
* **Granularity:** The default granularity is 64 bytes. One byte of shadow memory tracks the access state of 64 bytes of application memory.
* **MemInfoBlock (MIB):** A key data structure that stores statistics for an allocation context, including:
- * ``AllocCount``
- * ``TotalAccessCount``
- * ``TotalLifetime``
- * ``Min/MaxAccessDensity``
+ * ``AllocCount``
+ * ``TotalAccessCount``
+ * ``TotalLifetime``
+ * ``Min/MaxAccessDensity``
Profile Format
--------------
>From 1869b3fb0a8c4a8489b1127acee47d8d4afa0d34 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Mon, 15 Dec 2025 11:59:09 +0000
Subject: [PATCH 03/15] Actually fix formatting and warning.
---
llvm/docs/MemProf.rst | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 142491d1861c6..c3f016482f078 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -63,6 +63,8 @@ Control the runtime behavior using the ``MEMPROF_OPTIONS`` environment variable.
MEMPROF_OPTIONS=log_path=stdout:print_text=true ./app
+.. _Processing Profiles:
+
Processing Profiles
-------------------
@@ -195,11 +197,7 @@ Runtime Implementation
The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
* **Shadow Mapping:** Application memory is mapped to shadow memory.
* **Granularity:** The default granularity is 64 bytes. One byte of shadow memory tracks the access state of 64 bytes of application memory.
-* **MemInfoBlock (MIB):** A key data structure that stores statistics for an allocation context, including:
- * ``AllocCount``
- * ``TotalAccessCount``
- * ``TotalLifetime``
- * ``Min/MaxAccessDensity``
+* **MemInfoBlock (MIB):** A key data structure that stores statistics for an allocation context, including: ``AllocCount``, ``TotalAccessCount``, ``TotalLifetime``, and ``Min/MaxAccessDensity``.
Profile Format
--------------
>From 732027d578ff75bd64fb59e2bee85e2c5f1825ef Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Sat, 20 Dec 2025 06:38:49 +0000
Subject: [PATCH 04/15] Address comments.
---
llvm/docs/MemProf.rst | 67 +++++++++++++++++++++++++++++--------------
1 file changed, 46 insertions(+), 21 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index c3f016482f078..8a540f799ec6a 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -22,7 +22,7 @@ Traditional PGO focuses on control flow (hot vs. cold code). MemProf extends thi
This information enables optimizations such as:
-* **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density.
+* **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density. This currently requires an allocator that supports the necessary interfaces (e.g., tcmalloc).
* **Static Data Partitioning:** Segregating frequently accessed (hot) global variables and constants from rarely accessed (cold) ones to improve data locality and TLB utilization.
User Manual
@@ -30,14 +30,14 @@ User Manual
This section describes how to use MemProf to profile and optimize your application.
-Building with MemProf
----------------------
+Building with MemProf Instrumentation
+-------------------------------------
To enable MemProf instrumentation, compile your application with the ``-fmemory-profile`` flag. Make sure to include debug information (``-gmlt`` and ``-fdebug-info-for-profiling``) and frame pointers to ensure accurate stack traces and line number reporting.
.. code-block:: bash
- clang++ -fmemory-profile -fdebug-info-for-profiling -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -gmlt -O2 source.cpp -o app
+ clang++ -fmemory-profile -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fno-optimize-sibling-calls -fdebug-info-for-profiling -gmlt -O2 source.cpp -o app
.. note::
Link with ``-fmemory-profile`` as well to link the necessary runtime libraries. If you use a separate link step, ensure the flag is passed to the linker.
@@ -80,7 +80,7 @@ To dump the profile in YAML format (useful for debugging or creating test cases)
llvm-profdata show --memory memprof.memprofdata > memprof.yaml
-Merge MemProf profiles with standard PGO instrumentation profiles if you have both.
+Merge MemProf profiles with standard PGO instrumentation profiles if you have both (optional).
Using Profiles for Optimization
-------------------------------
@@ -89,7 +89,7 @@ Feed the indexed profile back into the compiler using the ``-fmemory-profile-use
.. code-block:: bash
- clang++ -fmemory-profile-use=memprof.memprofdata -O2 source.cpp -o optimized_app -ltcmalloc
+ clang++ -fmemory-profile-use=memprof.memprofdata -O2 -Wl,-mllvm,-enable-memprof-context-disambiguation -Wl,-mllvm,-optimize-hot-cold-new -Wl,-mllvm,-supports-hot-cold-new source.cpp -o optimized_app -ltcmalloc
If invoking the optimizer directly via ``opt``:
@@ -100,10 +100,13 @@ If invoking the optimizer directly via ``opt``:
The compiler uses the profile data to annotate allocation instructions with metadata (e.g., ``!memprof``), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations.
.. note::
- For the optimized binary to utilize the hot/cold hinting, it must be linked with an allocator that supports this mechanism, such as `tcmalloc <https://github.com/google/tcmalloc>`_. TCMalloc extends operator new that accepts a hint (0 for cold, 255 for hot) to guide data placement and improve locality.
+ Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
+
+.. note::
+ For the optimized binary to fully utilize the hot/cold hinting, it must be linked with an allocator that supports this mechanism, such as `tcmalloc <https://github.com/google/tcmalloc>`_. TCMalloc provides an API (``tcmalloc::hot_cold_t``) that accepts a hint (0 for cold, 255 for hot) to guide data placement and improve locality. To indicate that the library supports these interfaces, the ``-mllvm -supports-hot-cold-new`` flag is used during the LTO link.
-Context Disambiguation (LTO/ThinLTO)
-------------------------------------
+Context Disambiguation (LTO)
+----------------------------
To fully benefit from MemProf, especially for common allocation wrappers, enabling **ThinLTO** (preferred) or LTO is required. This allows the compiler to perform **context disambiguation**.
@@ -125,7 +128,7 @@ Consider the following example:
Without context disambiguation, the compiler sees a single ``allocate`` function called from both hot and cold contexts. It must conservatively assume the allocation is "not cold" or "ambiguous".
-With ThinLTO and MemProf:
+With LTO and MemProf:
1. The compiler constructs a whole-program call graph.
2. It identifies that ``allocate`` has distinct calling contexts with different behaviors.
3. It **clones** ``allocate`` into two versions: one for the hot path and one for the cold path.
@@ -153,6 +156,9 @@ To enable this feature, pass the following flags to the compiler:
The optimized layout clusters hot static data, improving dTLB and cache efficiency.
+.. note::
+ For an LTO build -split-static-data needs to be passed to the LTO backend via the linker using ``-Wl,-mllvm,-split-static-data``.
+
Developer Manual
================
@@ -163,40 +169,59 @@ Architecture Overview
MemProf consists of three main components:
-1. **Instrumentation Pass (Compile-time):** Injects code to record memory allocations and accesses.
+1. **Instrumentation Pass (Compile-time):** Memory accesses are instrumented to increment the access count held in a shadow memory location, or alternatively to call into the runtime.
2. **Runtime Library (Link-time/Run-time):** Manages shadow memory and tracks allocation contexts and access statistics.
3. **Profile Analysis (Post-processing/Compile-time):** Tools and passes that read the profile, annotate the IR, and perform advanced optimizations like context disambiguation for ThinLTO.
-Detailed Workflow (ThinLTO)
----------------------------
+Detailed Workflow (LTO)
+-----------------------
-The optimization process, particularly context disambiguation, involves several steps during the ThinLTO pipeline:
+The optimization process, using LTO, involves several steps during the LTO pipeline:
-1. **Metadata Serialization:** During the ThinLTO summary analysis step, MemProf metadata (including MIBs and CallStacks) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
-2. **Whole Program Graph Construction:** During the ThinLTO indexing step, the compiler constructs a whole-program ``CallingContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
-3. **Auxiliary Graph & Cloning Decisions:** An auxiliary graph is constructed to guide the cloning process. The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones.
-4. **ThinLTO Backend:** The actual cloning of functions and replacement of allocation calls (e.g., ``operator new``) happens in the ThinLTO backend passes. These transformations are guided by the decisions made during the indexing step.
+1. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
+2. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
+3. **Cloning Decisions:** The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones using the ``CallsiteContextGraph``.
+4. **LTO Backend:** The actual cloning of functions happens in the ``MemProfContextDisambiguation`` pass. The replacement of allocation calls (e.g., ``operator new``) happens in ``SimplifyLibCalls`` during the ``InstCombine`` pass. These transformations are guided by the decisions made during the indexing step.
Source Structure
----------------
* **Runtime:** ``compiler-rt/lib/memprof``
+
* Contains the runtime implementation, including shadow memory mapping, interceptors (malloc, free, etc.), and the thread-local storage for recording stats.
+
* **Instrumentation:** ``llvm/lib/Transforms/Instrumentation/MemProfInstrumentation.cpp``
+
* Implements the LLVM IR pass that adds instrumentation calls.
+
* **Profile Data:** ``llvm/include/llvm/ProfileData/MemProf.h`` and ``MemProfData.inc``
+
* Defines the profile format, data structures (like ``MemInfoBlock``), and serialization logic.
+
* **Use Pass:** ``llvm/lib/Transforms/Instrumentation/MemProfUse.cpp``
+
* Reads the profile and annotates the IR with metadata.
+
* **Context Disambiguation:** ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``
- * Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts, particularly during ThinLTO.
+
+ * Implements the analysis and transformations (e.g., cloning) for resolving ambiguous allocation contexts using LTO.
+
+* **Transformation:** ``llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp``
+
+ * Implements the rewriting of allocation calls based on the hot/cold hints.
+
+* **Static Data Partitioning:** ``llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp`` and ``llvm/lib/CodeGen/StaticDataSplitter.cpp``
+
+ * Implements the splitting of static data into hot and cold sections.
Runtime Implementation
----------------------
The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
* **Shadow Mapping:** Application memory is mapped to shadow memory.
+
* **Granularity:** The default granularity is 64 bytes. One byte of shadow memory tracks the access state of 64 bytes of application memory.
+
* **MemInfoBlock (MIB):** A key data structure that stores statistics for an allocation context, including: ``AllocCount``, ``TotalAccessCount``, ``TotalLifetime``, and ``Min/MaxAccessDensity``.
Profile Format
@@ -233,8 +258,8 @@ When making changes to MemProf, verify your changes using the following test sui
* Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.
4. **ThinLTO & Context Disambiguation Tests:**
- * Location: ``llvm/test/ThinLTO/X86``
- * Purpose: Verify context disambiguation, cloning, and summary analysis during ThinLTO.
+ * Location: ``llvm/test/ThinLTO/X86/memprof*`` and ``llvm/test/Transforms/MemProfContextDisambiguation``
+ * Purpose: Verify context disambiguation, cloning, and summary analysis during ThinLTO and LTO.
Testing with YAML Profiles
--------------------------
>From b0d9f1cb6ff101c66cb284221b75e2081118fd7f Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 01:02:23 +0000
Subject: [PATCH 05/15] Fix formatting issues
---
llvm/docs/MemProf.rst | 62 ++++++++++++++++++++++++++++++-------------
1 file changed, 44 insertions(+), 18 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 8a540f799ec6a..c736f487634ba 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -241,24 +241,51 @@ Static Data Profile
To support static data partitioning, the profile format includes a payload for symbolized data access profiles. This maps data addresses to canonical symbol names (or module source location for internal data) and access counts. This enables the compiler to identify which global variables are hot.
Testing
+
-------
+
+
When making changes to MemProf, verify your changes using the following test suites:
+
+
1. **Runtime Tests:**
+
+
+
* Location: ``compiler-rt/test/memprof``
+
* Purpose: Verify the runtime instrumentation, shadow memory behavior, and profile generation.
+
+
2. **Profile Manipulation Tests:**
+
+
+
* Location: ``llvm/test/tools/llvm-profdata``
+
* Purpose: Verify that ``llvm-profdata`` can correctly merge, show, and handle MemProf profiles.
+
+
3. **Instrumentation & Optimization Tests:**
+
+
+
* Location: ``llvm/test/Transforms/PGOProfile``
+
* Purpose: Verify the correctness of the ``MemProfUse`` pass, metadata annotation, and IR transformations.
+
+
4. **ThinLTO & Context Disambiguation Tests:**
+
+
+
* Location: ``llvm/test/ThinLTO/X86/memprof*`` and ``llvm/test/Transforms/MemProfContextDisambiguation``
+
* Purpose: Verify context disambiguation, cloning, and summary analysis during ThinLTO and LTO.
Testing with YAML Profiles
@@ -267,6 +294,7 @@ Testing with YAML Profiles
You can create MemProf profiles in YAML format for testing purposes. This is useful for creating small, self-contained test cases without needing to run a binary.
1. **Create a YAML Profile:** You can start by dumping a real profile to YAML (see :ref:`Processing Profiles` above) or writing one from scratch.
+
2. **Convert to Indexed Format:** Use ``llvm-profdata`` to convert the YAML to the indexed MemProf format.
.. code-block:: bash
@@ -279,21 +307,19 @@ You can create MemProf profiles in YAML format for testing purposes. This is use
opt -passes='memprof-use<profile-filename=profile.memprofdata>' test.ll -S
-**Example YAML Profile:**
-
-.. code-block:: yaml
-
- ---
- HeapProfileRecords:
- - GUID: _Z3foov
- AllocSites:
- - Callstack:
- - { Function: _Z3foov, LineOffset: 0, Column: 22, IsInlineFrame: false }
- - { Function: main, LineOffset: 2, Column: 5, IsInlineFrame: false }
- MemInfoBlock:
- TotalSize: 400
- AllocCount: 1
- TotalLifetimeAccessDensity: 1
- TotalLifetime: 1000000
- CallSites: []
- ...
+**Example YAML Profile:** ::
+
+ ---
+ HeapProfileRecords:
+ - GUID: _Z3foov
+ AllocSites:
+ - Callstack:
+ - { Function: _Z3foov, LineOffset: 0, Column: 22, IsInlineFrame: false }
+ - { Function: main, LineOffset: 2, Column: 5, IsInlineFrame: false }
+ MemInfoBlock:
+ TotalSize: 400
+ AllocCount: 1
+ TotalLifetimeAccessDensity: 1
+ TotalLifetime: 1000000
+ CallSites: []
+ ...
>From c44b1fc845612c95afba973366c20ce027e8725e Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 06:47:39 +0000
Subject: [PATCH 06/15] Fix note and list formatting.
---
llvm/docs/MemProf.rst | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index c736f487634ba..871600d051e92 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -101,8 +101,6 @@ The compiler uses the profile data to annotate allocation instructions with meta
.. note::
Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
-
-.. note::
For the optimized binary to fully utilize the hot/cold hinting, it must be linked with an allocator that supports this mechanism, such as `tcmalloc <https://github.com/google/tcmalloc>`_. TCMalloc provides an API (``tcmalloc::hot_cold_t``) that accepts a hint (0 for cold, 255 for hot) to guide data placement and improve locality. To indicate that the library supports these interfaces, the ``-mllvm -supports-hot-cold-new`` flag is used during the LTO link.
Context Disambiguation (LTO)
@@ -129,6 +127,7 @@ Consider the following example:
Without context disambiguation, the compiler sees a single ``allocate`` function called from both hot and cold contexts. It must conservatively assume the allocation is "not cold" or "ambiguous".
With LTO and MemProf:
+
1. The compiler constructs a whole-program call graph.
2. It identifies that ``allocate`` has distinct calling contexts with different behaviors.
3. It **clones** ``allocate`` into two versions: one for the hot path and one for the cold path.
>From d636ec6c74379deb051c40415d09f6ee0499a07e Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 06:51:39 +0000
Subject: [PATCH 07/15] Drop extra example run command.
---
llvm/docs/MemProf.rst | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 871600d051e92..219893d93d24f 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -45,13 +45,7 @@ To enable MemProf instrumentation, compile your application with the ``-fmemory-
Running and Generating Profiles
-------------------------------
-Run the instrumented application. By default, MemProf writes a raw profile file named ``memprof.profraw.<pid>`` to the current directory upon exit.
-
-.. code-block:: bash
-
- ./app
-
-Control the runtime behavior using the ``MEMPROF_OPTIONS`` environment variable. Common options include:
+Run the instrumented application. By default, MemProf writes a raw profile file named ``memprof.profraw.<pid>`` to the current directory upon exit. Control the runtime behavior using the ``MEMPROF_OPTIONS`` environment variable. Common options include:
* ``log_path``: Redirects runtime logs (e.g., ``stdout``, ``stderr``, or a file path).
* ``print_text``: If set to ``true``, prints a text-based summary of the profile to the log path.
>From efa3bd227eb2926ea530ff40df1fb001bb20024f Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 07:15:31 +0000
Subject: [PATCH 08/15] Fix list formatting
---
llvm/docs/MemProf.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 219893d93d24f..ba3b776fd3a8e 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -211,6 +211,7 @@ Runtime Implementation
----------------------
The runtime uses a **shadow memory** scheme similar to AddressSanitizer (ASan) but optimized for profiling.
+
* **Shadow Mapping:** Application memory is mapped to shadow memory.
* **Granularity:** The default granularity is 64 bytes. One byte of shadow memory tracks the access state of 64 bytes of application memory.
>From 1da51cd638a82323b22a5ee10a374b837084910c Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 07:28:55 +0000
Subject: [PATCH 09/15] Add metadata docs
---
llvm/docs/MemProf.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index ba3b776fd3a8e..d4bb227a7bb19 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -91,7 +91,7 @@ If invoking the optimizer directly via ``opt``:
opt -passes='memprof-use<profile-filename=memprof.memprofdata>' ...
-The compiler uses the profile data to annotate allocation instructions with metadata (e.g., ``!memprof``), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations.
+The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>`), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites which are part of allocation contexts are also annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>`).
.. note::
Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
@@ -134,7 +134,7 @@ MemProf profiles guide the layout of static data (e.g., global variables, consta
This feature uses a hybrid approach:
-1. **Symbolizable Data:** Data with external or local linkage (tracked by the symbol table) is partitioned based on data access profiles collected via instrumentation (`draft <https://github.com/llvm/llvm-project/pull/142884>`_) or hardware performance counters (e.g., Intel PEBS events such as ``MEM_INST_RETIRED.ALL_LOADS``).
+1. **Symbolizable Data:** Data with external or local linkage (tracked by the symbol table) is partitioned based on data access profiles collected via instrumentation (`PR <https://github.com/llvm/llvm-project/pull/142884>`_) or hardware performance counters (e.g., Intel PEBS events such as ``MEM_INST_RETIRED.ALL_LOADS``).
2. **Module-Internal Data:** Data not tracked by the symbol table (e.g., jump tables, constant pools, internal globals) has its hotness inferred from standard PGO code execution profiles.
To enable this feature, pass the following flags to the compiler:
@@ -164,12 +164,12 @@ MemProf consists of three main components:
1. **Instrumentation Pass (Compile-time):** Memory accesses are instrumented to increment the access count held in a shadow memory location, or alternatively to call into the runtime.
2. **Runtime Library (Link-time/Run-time):** Manages shadow memory and tracks allocation contexts and access statistics.
-3. **Profile Analysis (Post-processing/Compile-time):** Tools and passes that read the profile, annotate the IR, and perform advanced optimizations like context disambiguation for ThinLTO.
+3. **Profile Analysis (Post-processing/Compile-time):** Tools and passes that read the profile, annotate the IR using metadata, and perform context disambiguation if necessary when LTO is enabled.
Detailed Workflow (LTO)
-----------------------
-The optimization process, using LTO, involves several steps during the LTO pipeline:
+The optimization process, using LTO, involves several steps:
1. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
2. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
>From f6e75177e73ee519dd5752759a7cb62aa7e5348a Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 07:36:34 +0000
Subject: [PATCH 10/15] Update LTO section
---
llvm/docs/MemProf.rst | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index d4bb227a7bb19..9d49f4869fa6a 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -107,15 +107,15 @@ Consider the following example:
.. code-block:: cpp
void *allocate() { return new char[10]; }
-
+
void hot_path() {
// This path is executed frequently.
allocate();
- }
-
+ }
+
void cold_path() {
// This path is executed rarely.
- allocate();
+ allocate();
}
Without context disambiguation, the compiler sees a single ``allocate`` function called from both hot and cold contexts. It must conservatively assume the allocation is "not cold" or "ambiguous".
@@ -171,10 +171,11 @@ Detailed Workflow (LTO)
The optimization process, using LTO, involves several steps:
-1. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
-2. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
-3. **Cloning Decisions:** The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones using the ``CallsiteContextGraph``.
-4. **LTO Backend:** The actual cloning of functions happens in the ``MemProfContextDisambiguation`` pass. The replacement of allocation calls (e.g., ``operator new``) happens in ``SimplifyLibCalls`` during the ``InstCombine`` pass. These transformations are guided by the decisions made during the indexing step.
+1. **Matching (MemProfUse Pass):** The memprof profile is mapped onto allocation calls and callsite which are part of the allocation context using debug information. MemProf metadata is attached to the call instructions in the IR.
+2. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
+3. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
+4. **Cloning Decisions:** The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones using the ``CallsiteContextGraph``.
+5. **LTO Backend:** The actual cloning of functions happens in the ``MemProfContextDisambiguation`` pass. The replacement of allocation calls (e.g., ``operator new`` to the ``hot_cold_t`` variant) happens in ``SimplifyLibCalls`` during the ``InstCombine`` pass. These transformations are guided by the decisions made during the indexing step.
Source Structure
----------------
>From 4f7d15ae1aee6827d4a80bdb6a4d9f79abcf18c9 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Tue, 23 Dec 2025 07:39:30 +0000
Subject: [PATCH 11/15] Mention attribute for unambiguous allocation sites
---
llvm/docs/MemProf.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 9d49f4869fa6a..a966350477d39 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -171,7 +171,7 @@ Detailed Workflow (LTO)
The optimization process, using LTO, involves several steps:
-1. **Matching (MemProfUse Pass):** The memprof profile is mapped onto allocation calls and callsite which are part of the allocation context using debug information. MemProf metadata is attached to the call instructions in the IR.
+1. **Matching (MemProfUse Pass):** The memprof profile is mapped onto allocation calls and callsite which are part of the allocation context using debug information. MemProf metadata is attached to the call instructions in the IR. If the allocation call site is unambiguously cold (or hot) an attribute is added directly which guides the transformation.
2. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
3. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
4. **Cloning Decisions:** The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones using the ``CallsiteContextGraph``.
>From af58d72e669957a80f9b74d796cb84e166223009 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Wed, 24 Dec 2025 07:54:53 +0000
Subject: [PATCH 12/15] Address comments
---
llvm/docs/MemProf.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index a966350477d39..412d888454340 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -100,7 +100,7 @@ The compiler uses the profile data to annotate allocation instructions with ``!m
Context Disambiguation (LTO)
----------------------------
-To fully benefit from MemProf, especially for common allocation wrappers, enabling **ThinLTO** (preferred) or LTO is required. This allows the compiler to perform **context disambiguation**.
+To fully benefit from MemProf, especially for common allocation wrappers, enabling **ThinLTO** (preferred) or **Full LTO** is required. This allows the compiler to perform **context disambiguation**.
Consider the following example:
@@ -162,7 +162,7 @@ Architecture Overview
MemProf consists of three main components:
-1. **Instrumentation Pass (Compile-time):** Memory accesses are instrumented to increment the access count held in a shadow memory location, or alternatively to call into the runtime.
+1. **Instrumentation Pass (Compile-time):** Memory accesses are instrumented to increment the access count held in a shadow memory location, or alternatively to call into the runtime. Memory allocations are intercepted by the runtime library.
2. **Runtime Library (Link-time/Run-time):** Manages shadow memory and tracks allocation contexts and access statistics.
3. **Profile Analysis (Post-processing/Compile-time):** Tools and passes that read the profile, annotate the IR using metadata, and perform context disambiguation if necessary when LTO is enabled.
@@ -171,11 +171,11 @@ Detailed Workflow (LTO)
The optimization process, using LTO, involves several steps:
-1. **Matching (MemProfUse Pass):** The memprof profile is mapped onto allocation calls and callsite which are part of the allocation context using debug information. MemProf metadata is attached to the call instructions in the IR. If the allocation call site is unambiguously cold (or hot) an attribute is added directly which guides the transformation.
-2. **Metadata Serialization:** During the LTO summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
-3. **Whole Program Graph Construction:** During the LTO indexing step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
+1. **Matching (MemProfUse Pass):** The memprof profile is mapped onto allocation calls and callsites which are part of the allocation context using debug information. MemProf metadata is attached to the call instructions in the IR. If the allocation call site is unambiguously cold (or hot) an attribute is added directly which guides the transformation.
+2. **Metadata Serialization:** For ThinLTO, during the summary analysis step, MemProf metadata (``!memprof`` and ``!callsite``) is serialized into the module summary. This is implemented in ``llvm/lib/Analysis/ModuleSummaryAnalysis.cpp``.
+3. **Whole Program Graph Construction:** During the LTO step, the compiler constructs a whole-program ``CallsiteContextGraph`` to analyze and disambiguate contexts. This graph identifies where allocation contexts diverge (e.g., same function called from hot vs. cold paths). This logic resides in ``llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp``.
4. **Cloning Decisions:** The analysis identifies which functions and callsites need to be cloned to isolate cold allocation paths from hot ones using the ``CallsiteContextGraph``.
-5. **LTO Backend:** The actual cloning of functions happens in the ``MemProfContextDisambiguation`` pass. The replacement of allocation calls (e.g., ``operator new`` to the ``hot_cold_t`` variant) happens in ``SimplifyLibCalls`` during the ``InstCombine`` pass. These transformations are guided by the decisions made during the indexing step.
+5. **LTO Backend:** The actual cloning of functions happens in the ``MemProfContextDisambiguation`` pass. The replacement of allocation calls (e.g., ``operator new`` to the ``hot_cold_t`` variant) happens in ``SimplifyLibCalls`` during the ``InstCombine`` pass. These transformations are guided by the decisions made during the LTO step.
Source Structure
----------------
>From ebfb31fcd263a6cc4df49570b8981397bf9e691a Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Wed, 24 Dec 2025 08:20:24 +0000
Subject: [PATCH 13/15] Fix hyperlink
---
llvm/docs/MemProf.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 412d888454340..8e4a838807f61 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -91,7 +91,7 @@ If invoking the optimizer directly via ``opt``:
opt -passes='memprof-use<profile-filename=memprof.memprofdata>' ...
-The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>`), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites which are part of allocation contexts are also annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>`).
+The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>_`), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites which are part of allocation contexts are also annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>_`).
.. note::
Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
>From 175c229cd1c73b3f125d971c5044d3f05bbb29a9 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Wed, 24 Dec 2025 08:24:01 +0000
Subject: [PATCH 14/15] Fix hyperlink
---
llvm/docs/MemProf.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/llvm/docs/MemProf.rst b/llvm/docs/MemProf.rst
index 8e4a838807f61..945a80eba1a35 100644
--- a/llvm/docs/MemProf.rst
+++ b/llvm/docs/MemProf.rst
@@ -91,7 +91,7 @@ If invoking the optimizer directly via ``opt``:
opt -passes='memprof-use<profile-filename=memprof.memprofdata>' ...
-The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>_`), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites which are part of allocation contexts are also annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>_`).
+The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>`_), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites which are part of allocation contexts are also annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>`_).
.. note::
Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
>From 82190a6608c3f7825339d058e83d5a1b0fd62299 Mon Sep 17 00:00:00 2001
From: Snehasish Kumar <snehasishk at google.com>
Date: Wed, 24 Dec 2025 08:29:49 +0000
Subject: [PATCH 15/15] Cleanup userguide
---
llvm/docs/UserGuides.rst | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 5a5b73836e2ec..10d7fef904d2d 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -157,8 +157,7 @@ Optimizations
Information about the MemorySSA utility in LLVM, as well as how to use it.
:doc:`MemProf`
- User guide and internals of MemProf, a profile guided optimization for
- memory.
+ User guide and internals of MemProf, profile guided optimizations for memory.
:doc:`LoopTerminology`
A document describing Loops and associated terms as used in LLVM.