[llvm-branch-commits] [clang] [Clang] Introduce -fsanitize=alloc-token (PR #156839)

Marco Elver via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Fri Sep 19 04:07:39 PDT 2025


================
@@ -0,0 +1,172 @@
+=================
+Allocation Tokens
+=================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Clang provides support for allocation tokens to enable allocator-level heap
+organization strategies. Clang assigns mode-dependent token IDs to allocation
+calls; the runtime behavior depends entirely on the implementation of a
+compatible memory allocator.
+
+Possible allocator strategies include:
+
+* **Security Hardening**: Placing allocations into separate, isolated heap
+  partitions. For example, separating pointer-containing types from raw data
+  can mitigate exploits that rely on overflowing a primitive buffer to corrupt
+  object metadata.
+
+* **Memory Layout Optimization**: Grouping related allocations to improve data
+  locality and cache utilization.
+
+* **Custom Allocation Policies**: Applying different management strategies to
+  different partitions.
+
+Token Assignment Mode
+=====================
+
+The default mode to calculate tokens is:
+
+* ``typehash``: This mode assigns a token ID based on the hash of the allocated
+  type's name.
+
+Other token ID assignment modes are supported, but they may be subject to
+change or removal. These may (experimentally) be selected with ``-mllvm
+-alloc-token-mode=<mode>``:
+
+* ``random``: This mode assigns a statically-determined random token ID to each
+  allocation site.
+
+* ``increment``: This mode assigns a simple, incrementally increasing token ID
+  to each allocation site.
+
+Allocation Token Instrumentation
+================================
+
+To enable instrumentation of allocation functions, code can be compiled with
+the ``-fsanitize=alloc-token`` flag:
+
+.. code-block:: console
+
+    % clang++ -fsanitize=alloc-token example.cc
+
+The instrumentation transforms allocation calls to include a token ID. For
+example:
+
+.. code-block:: c
+
+    // Original:
+    ptr = malloc(size);
+
+    // Instrumented:
+    ptr = __alloc_token_malloc(size, token_id);
+
+In addition, it is typically recommended to configure the following:
+
+* ``-falloc-token-max=<N>``
+    Configures the maximum number of tokens. No max by default (tokens bounded
+    by ``SIZE_MAX``).
+
+    .. code-block:: console
+
+        % clang++ -fsanitize=alloc-token -falloc-token-max=512 example.cc
+
+Runtime Interface
+-----------------
+
+A compatible runtime must be provided that implements the token-enabled
+allocation functions. The instrumentation generates calls to functions that
+take a final ``size_t token_id`` argument.
+
+.. code-block:: c
+
+    // C standard library functions
+    void *__alloc_token_malloc(size_t size, size_t token_id);
+    void *__alloc_token_calloc(size_t count, size_t size, size_t token_id);
+    void *__alloc_token_realloc(void *ptr, size_t size, size_t token_id);
+    // ...
+
+    // C++ operators (mangled names)
+    // operator new(size_t, size_t)
+    void *__alloc_token_Znwm(size_t size, size_t token_id);
+    // operator new[](size_t, size_t)
+    void *__alloc_token_Znam(size_t size, size_t token_id);
+    // ... other variants like nothrow, etc., are also instrumented.
+
+Fast ABI
+--------
+
+An alternative ABI can be enabled with ``-fsanitize-alloc-token-fast-abi``,
+which encodes the token ID hint in the allocation function name.
+
+.. code-block:: c
+
+    void *__alloc_token_0_malloc(size_t size);
+    void *__alloc_token_1_malloc(size_t size);
+    void *__alloc_token_2_malloc(size_t size);
+    ...
+    void *__alloc_token_0_Znwm(size_t size);
+    void *__alloc_token_1_Znwm(size_t size);
+    void *__alloc_token_2_Znwm(size_t size);
+    ...
+
+This ABI provides a more efficient alternative where
+``-falloc-token-max`` is small.
+
+Disabling Instrumentation
+-------------------------
+
+To exclude specific functions from instrumentation, you can use the
+``no_sanitize("alloc-token")`` attribute:
+
+.. code-block:: c
+
+    __attribute__((no_sanitize("alloc-token")))
+    void* custom_allocator(size_t size) {
+        return malloc(size);  // Uses original malloc
+    }
+
+Note: Independent of any given allocator support, the instrumentation aims to
+remain performance neutral. As such, ``no_sanitize("alloc-token")``
+functions may be inlined into instrumented functions and vice-versa. If
+correctness is affected, such functions should explicitly be marked
+``noinline``.
+
+The ``__attribute__((disable_sanitizer_instrumentation))`` is also supported to
+disable this and other sanitizer instrumentations.
+
+Suppressions File (Ignorelist)
+------------------------------
+
+AllocToken respects the ``src`` and ``fun`` entity types in the
+:doc:`SanitizerSpecialCaseList`, which can be used to omit specified source
+files or functions from instrumentation.
+
+.. code-block:: bash
----------------
melver wrote:

That's what AddressSanitizer.rst and the others use. I guess because the '#' comments format sort of ok that way.

https://github.com/llvm/llvm-project/pull/156839


More information about the llvm-branch-commits mailing list