[llvm] Add documentation for MemProf. (PR #172238)
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 23 11:59:03 PST 2025
================
@@ -0,0 +1,320 @@
+====================================
+MemProf: Memory Profiling for LLVM
+====================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Introduction
+============
+
+MemProf is a profile-guided optimization (PGO) feature for memory. It enables the compiler to optimize memory allocations and static data layout based on runtime profiling information. By understanding the "hotness", lifetime, and access frequency of memory, the compiler can make better decisions about where to place data (both heap and static), reducing fragmentation and improving cache locality.
+
+Motivation
+----------
+
+Traditional PGO focuses on control flow (hot vs. cold code). MemProf extends this concept to data. It answers questions like:
+
+* Which allocation sites are "hot" (frequently accessed)?
+* Which allocation sites are "cold" (rarely accessed)?
+* What is the lifetime of an allocation?
+
+This information enables optimizations such as:
+
+* **Heap Layout Optimization:** Grouping objects with similar lifetimes or access density. This currently requires an allocator that supports the necessary interfaces (e.g., tcmalloc).
+* **Static Data Partitioning:** Segregating frequently accessed (hot) global variables and constants from rarely accessed (cold) ones to improve data locality and TLB utilization.
+
+User Manual
+===========
+
+This section describes how to use MemProf to profile and optimize your application.
+
+Building with MemProf Instrumentation
+-------------------------------------
+
+To enable MemProf instrumentation, compile your application with the ``-fmemory-profile`` flag. Make sure to include debug information (``-gmlt`` and ``-fdebug-info-for-profiling``) and frame pointers to ensure accurate stack traces and line number reporting.
+
+.. code-block:: bash
+
+ clang++ -fmemory-profile -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fno-optimize-sibling-calls -fdebug-info-for-profiling -gmlt -O2 source.cpp -o app
+
+.. note::
+ Link with ``-fmemory-profile`` as well to link the necessary runtime libraries. If you use a separate link step, ensure the flag is passed to the linker.
+
+Running and Generating Profiles
+-------------------------------
+
+Run the instrumented application. By default, MemProf writes a raw profile file named ``memprof.profraw.<pid>`` to the current directory upon exit. Control the runtime behavior using the ``MEMPROF_OPTIONS`` environment variable. Common options include:
+
+* ``log_path``: Redirects runtime logs (e.g., ``stdout``, ``stderr``, or a file path).
+* ``print_text``: If set to ``true``, prints a text-based summary of the profile to the log path.
+* ``verbosity``: Controls the level of debug output.
+
+**Example:**
+
+.. code-block:: bash
+
+ MEMPROF_OPTIONS=log_path=stdout:print_text=true ./app
+
+.. _Processing Profiles:
+
+Processing Profiles
+-------------------
+
+Raw profiles must be indexed before the compiler can use them. Use ``llvm-profdata`` to merge and index the raw profiles.
+
+.. code-block:: bash
+
+ llvm-profdata merge memprof.profraw.* --profiled-binary ./app -o memprof.memprofdata
+
+To dump the profile in YAML format (useful for debugging or creating test cases):
+
+.. code-block:: bash
+
+ llvm-profdata show --memory memprof.memprofdata > memprof.yaml
+
+Optionally, if you also have standard PGO instrumentation profiles, merge them with the MemProf profiles.
+
+Using Profiles for Optimization
+-------------------------------
+
+Feed the indexed profile back into the compiler using the ``-fmemory-profile-use=`` option (or low-level pass options).
+
+.. code-block:: bash
+
+ clang++ -fmemory-profile-use=memprof.memprofdata -O2 -Wl,-mllvm,-enable-memprof-context-disambiguation -Wl,-mllvm,-optimize-hot-cold-new -Wl,-mllvm,-supports-hot-cold-new source.cpp -o optimized_app -ltcmalloc
+
+If invoking the optimizer directly via ``opt``:
+
+.. code-block:: bash
+
+ opt -passes='memprof-use<profile-filename=memprof.memprofdata>' ...
+
+The compiler uses the profile data to annotate allocation instructions with ``!memprof`` metadata (`documentation <https://llvm.org/docs/LangRef.html#memprof-metadata>`__), distinguishing between "hot", "cold", and "notcold" allocations. This metadata guides downstream optimizations. Additionally, callsites that are part of allocation contexts are annotated with ``!callsite`` metadata (`documentation <https://llvm.org/docs/LangRef.html#callsite-metadata>`__).
+
+.. note::
+ Ensure that the same debug info flags (e.g. ``-gmlt`` and ``-fdebug-info-for-profiling``) used during instrumentation are also passed during this compilation step to enable correct matching of the profile data.
+
+ For the optimized binary to fully utilize the hot/cold hinting, it must be linked with an allocator that supports this mechanism, such as `tcmalloc <https://github.com/google/tcmalloc>`_. TCMalloc provides an API (``tcmalloc::hot_cold_t``) that accepts a hint (0 for cold, 255 for hot) to guide data placement and improve locality. To indicate that the library supports these interfaces, the ``-mllvm -supports-hot-cold-new`` flag is used during the LTO link.
+
+Context Disambiguation (LTO)
+----------------------------
+
+To fully benefit from MemProf, especially for common allocation wrappers, enabling **ThinLTO** (preferred) or LTO is required. This allows the compiler to perform **context disambiguation**.
+
+Consider the following example:
+
+.. code-block:: cpp
+
----------------
teresajohnson wrote:
Make the second one "Full LTO" to distinguish it from ThinLTO and the more generic "LTO" references elsewhere in this section.
https://github.com/llvm/llvm-project/pull/172238