[clang] [llvm] [LangRef] Describe semantic impact of non-temporal memory access. (PR #178264)
James Y Knight via cfe-commits
cfe-commits at lists.llvm.org
Tue Jan 27 09:40:03 PST 2026
https://github.com/jyknight created https://github.com/llvm/llvm-project/pull/178264
LLVM and Clang have both had non-temporal load/store support for over a decade, but the impact on multithreaded program semantics has never been properly described. Furthermore, proper use of these operations requires the use of additional fence operations, which have never yet been added to LLVM.
This PR does not attempt to fully specify the semantics, nor correct the deficiencies, but simply documents the current state of affairs -- both in the LLVM IR manual and the Clang manual.
See issue #64521.
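As a rough illustration of the target-specific fencing this refers to, here is a minimal sketch for x86-64 only. It assumes the SSE2 intrinsics `_mm_stream_si32` (a `MOVNTI` store) and `_mm_sfence`. The helper names `publish_buffer` and `consume_buffer` are hypothetical and are not part of this patch. On other targets the sketch falls back to ordinary stores, since no cross-target non-temporal fence exists yet. It says nothing about how LLVM's optimization passes treat such code, which is exactly the open question in the issue.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_stream_si32 (MOVNTI, SSE2)
#include <xmmintrin.h>  // _mm_sfence (SFENCE)
#define SKETCH_HAVE_X86_NT 1
#else
#define SKETCH_HAVE_X86_NT 0
#endif

// Hypothetical helper: fill `buf` with `value`, then publish it through `ready`.
// On x86-64 the fill uses non-temporal stores, which are weakly ordered, so an
// SFENCE is issued before the release store that other threads synchronize on.
inline void publish_buffer(int *buf, std::size_t n, int value,
                           std::atomic<bool> &ready) {
  for (std::size_t i = 0; i < n; ++i) {
#if SKETCH_HAVE_X86_NT
    _mm_stream_si32(&buf[i], value);  // non-temporal store
#else
    buf[i] = value;                   // assumption: plain store on other targets
#endif
  }
#if SKETCH_HAVE_X86_NT
  _mm_sfence();  // order the non-temporal stores before the flag store below
#endif
  ready.store(true, std::memory_order_release);
}

// Hypothetical helper: wait for the flag, then read the buffer with ordinary
// (temporal) loads and return the sum of its elements.
inline long consume_buffer(const int *buf, std::size_t n,
                           const std::atomic<bool> &ready) {
  while (!ready.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += buf[i];
  }
  return sum;
}
```

Without the `_mm_sfence()` call, the release store alone would not be expected to order the preceding non-temporal stores on x86, which is the gap the documentation change warns about.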
From 4514174c5820b21d084c61fe8d4a0767095e7d87 Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight at google.com>
Date: Tue, 27 Jan 2026 12:29:43 -0500
Subject: [PATCH] [LangRef] Describe semantic impact of non-temporal memory access.
LLVM and Clang have both had non-temporal load/store support for over
a decade, but the impact on multithreaded program semantics has never
been properly described. Furthermore, proper use of these operations
requires the use of additional fence operations, which have never yet
been added to LLVM.
This PR does not attempt to fully specify the semantics, nor correct
the deficiencies, but simply documents the current state of affairs --
both in the LLVM IR manual and the Clang manual.
See issue #64521.
---
clang/docs/LanguageExtensions.rst | 19 ++++++++
llvm/docs/LangRef.rst | 72 +++++++++++++++++++++++++------
2 files changed, 77 insertions(+), 14 deletions(-)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 7363b237d8885..c6aba401d0d57 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -5006,6 +5006,25 @@ The types ``T`` currently supported are:
Note that the compiler does not guarantee that non-temporal loads or stores
will be used.
+.. warning::
+ Non-temporal memory accesses may not adhere to the usual memory model. If you
+ use them in a multi-threaded program, you may need to also emit additional
+ non-temporal-specific fences, which Clang does not currently provide in a
+ cross-target manner. (An ``atomic_thread_fence`` is not sufficient.)
+
+ The interaction between non-temporal memory instructions and cross-thread
+ memory ordering guarantees has not been fully explored across hardware
+ targets, nor has it been fully specified here. The interaction between these
+ relaxed memory ordering semantics and LLVM's optimization passes has also not
+ yet been fully explored and verified.
+
+ Using these operations correctly is effectively "left as an exercise for the
+ reader" at the moment. See `issue #64521
+ <https://github.com/llvm/llvm-project/issues/64521>`_ for further discussion.
+ *Be very careful* if you use non-temporal memory operations in a multithreaded
+ program before this issue is resolved!
+
+
C++ Coroutines support builtins
--------------------------------
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 103058d161f86..b5c94c5147dfe 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -4002,6 +4002,54 @@ or ``syncscope("<target-scope>")`` *synchronizes with* and participates in the
seq\_cst total orderings of other operations that are not marked
``syncscope("singlethread")`` or ``syncscope("<target-scope>")``.
+.. _nontemporal:
+
+Non-temporal memory access
+--------------------------
+
+Certain memory access operations are marked as "non-temporal". The :ref:`load
+<i_load>` and :ref:`store <i_store>` instructions can be made non-temporal by
+attaching the ``!nontemporal`` metadata. Some target-specific intrinsics may
+also be specified as doing a non-temporal memory access.
+
+Hardware provides these special non-temporal access instructions (such as the
+``MOVNT`` instructions on x86) in order to permit specialized code to reduce
+cache utilization and bandwidth, when the program knows that the memory is
+unlikely to be reused in cache in the near future. However, because they avoid
+caches, these operations can also easily *reduce* your program's performance
+when used inappropriately.
+
+Non-temporal memory operations may also have relaxed memory ordering guarantees
+which do not conform with LLVM's Memory Model for Concurrent Operations. In
+particular, non-temporal memory accesses *might not* be ordered by a
+cross-thread "synchronizes-with" edge, unless additional fences are
+used. Converting a memory operation from normal to non-temporal will not change
+the semantics of a single-threaded program execution, but, because of the
+relaxed memory ordering, it *will* change the semantics of a multithreaded
+program.
+
+If your program requires non-temporal stores to become visible to another
+thread, or stores made from another thread to become visible to a non-temporal
+load, you must emit a special non-temporal fence between the non-temporal
+load/store and the cross-thread synchronizing operation. Unfortunately, no such
+cross-target fence operation has yet been defined in LLVM IR. (Yes, this is a
+bug.) You will therefore need to use a target-specific fence intrinsic if one
+exists, or inline assembly, or knowledge that on some particular target, no
+additional fence is required.
+
+.. warning::
+ The interaction between non-temporal memory instructions and cross-thread
+ memory ordering guarantees has not been fully explored across hardware
+ targets, nor has it been fully specified here. The interaction between these
+ relaxed memory ordering semantics and LLVM's optimization passes has also not
+ yet been fully explored and verified.
+
+ Using these operations correctly is effectively "left as an exercise for the
+ reader" at the moment. See `issue #64521
+ <https://github.com/llvm/llvm-project/issues/64521>`_ for further discussion.
+ *Be very careful* if you use non-temporal memory operations in a multithreaded
+ program before this issue is resolved!
+
.. _floatenv:
Floating-Point Environment
@@ -11658,13 +11706,11 @@ The alignment is only optional when parsing textual IR; for in-memory IR, it is
always present. An omitted ``align`` argument means that the operation has the
ABI alignment for the target.
-The optional ``!nontemporal`` metadata must reference a single
-metadata name ``<nontemp_node>`` corresponding to a metadata node with one
-``i32`` entry of value 1. The existence of the ``!nontemporal``
-metadata on the instruction tells the optimizer and code generator
-that this load is not expected to be reused in the cache. The code
-generator may select special instructions to save cache bandwidth, such
-as the ``MOVNT`` instruction on x86.
+The optional ``!nontemporal`` metadata must reference a single metadata name
+``<nontemp_node>`` corresponding to a metadata node with one ``i32`` entry of
+value 1. This permits the use of a non-temporal memory access instruction, which
+may have different semantics than a usual load/store. See the :ref:`Non-temporal
+memory access <nontemporal>` section for more information.
The optional ``!invariant.load`` metadata must reference a single
metadata name ``<empty_node>`` corresponding to a metadata node with no
@@ -11797,13 +11843,11 @@ The alignment is only optional when parsing textual IR; for in-memory IR, it is
always present. An omitted ``align`` argument means that the operation has the
ABI alignment for the target.
-The optional ``!nontemporal`` metadata must reference a single metadata
-name ``<nontemp_node>`` corresponding to a metadata node with one ``i32`` entry
-of value 1. The existence of the ``!nontemporal`` metadata on the instruction
-tells the optimizer and code generator that this load is not expected to
-be reused in the cache. The code generator may select special
-instructions to save cache bandwidth, such as the ``MOVNT`` instruction on
-x86.
+The optional ``!nontemporal`` metadata must reference a single metadata name
+``<nontemp_node>`` corresponding to a metadata node with one ``i32`` entry of
+value 1. This permits the use of a non-temporal memory access instruction, which
+may have different semantics than a usual load/store. See the :ref:`Non-temporal
+memory access <nontemporal>` section for more information.
The optional ``!invariant.group`` metadata must reference a
single metadata name ``<empty_node>``. See ``invariant.group`` metadata.