[llvm] AMDGPU: Preliminary documentation for named barriers (PR #165502)
Nicolai Hähnle via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 30 07:41:32 PDT 2025
https://github.com/nhaehnle updated https://github.com/llvm/llvm-project/pull/165502
From ee4eb7a2b1f1c478dc0dadade0f41bf3033cfb1f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Tue, 28 Oct 2025 19:49:13 -0700
Subject: [PATCH 1/3] AMDGPU: Preliminary documentation for named barriers
---
llvm/docs/AMDGPUUsage.rst | 179 ++++++++++++++++++++++++++++++++++++++
1 file changed, 179 insertions(+)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7780c0a6dca0a..9a4c644a63f6e 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1179,6 +1179,53 @@ is conservatively correct for OpenCL.
other operations within the same address space.
======================= ===================================================
+Target Types
+------------
+
+The AMDGPU backend implements some target extension types.
+
+.. _amdgpu-types-named-barriers:
+
+Named Barriers
+~~~~~~~~~~~~~~
+
+Named barriers are represented as memory objects of type
+``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
+in the LDS address space. They do not occupy regular LDS memory, but their
+lifetime and allocation granularity matches that of global variables in LDS.
+
+The following types built from named barriers are supported in global variables,
+defined recursively:
+
+* a standalone ``target("amdgcn.named.barrier", 0)``
+* an array of supported types
+* a struct containing a single element of supported type
+
+.. code-block:: llvm
+
+ @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
+ @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
+ @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
+
+Barrier types may not be used in ``alloca``.
+
+The integral representation of a pointer to a valid named barrier is in the
+range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
+formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
+hardware barrier ID. The integral representation of the null named barrier is
+``0x0080'0000``.
+
+It is not legal to attempt to form a pointer to any non-named barrier objects.
+
+It is undefined behavior to use a pointer to any part of a named barrier object
+as the pointer operand of a regular memory access instruction or intrinsic.
+Pointers to named barrier objects are intended to be used with dedicated
+intrinsics.
+
+We expand on the semantics of named barriers in
+:ref:`the memory model section <amdgpu-memory-model-named-barriers>`.
+
+
LLVM IR Intrinsics
------------------
@@ -6621,6 +6668,138 @@ Multiple tags can be used at the same time to synchronize with more than one add
better code optimization, at the cost of synchronizing additional address
spaces.
+.. _amdgpu-memory-model-barriers:
+
+Hardware Barriers
++++++++++++++++++
+
+.. note::
+
+ This section is preliminary. The semantics described here are intended to be
+ formalized properly in the future.
+
+Hardware barriers synchronize execution between concurrently running waves using
+fixed-function hardware. Intuitively, a set of waves are "members" of a barrier.
+Waves *signal* the barrier and later *wait* for it. Execution only proceeds past
+the *wait* once all member waves have *signaled* the barrier.
+
+Formally, barriers affect semantics in exactly two ways. First, they affect
+forward progress. Waiting on a barrier that never completes (not all members
+signal it) prevents forward progress and therefore, given the assumption of
+forward progress, is undefined behavior. Second, barrier operations can pair
+with fences to contribute *synchronizes-with* relations in the memory model.
+
+Roughly speaking:
+
+- Release fences pair with barrier signal operations that are later in program
+ order
+- Barrier wait operations pair with acquire fences that are later in program
+ order
+- If a barrier signal operation contributes to allowing a wait operation to
+ complete, then the corresponding paired fences can synchronize-with each
+ other (given compatible sync scopes and memory model relaxation annotations)
+
+Default Barriers
+################
+
+There is a default workgroup barrier and a default cluster barrier. All waves of
+a workgroup are members of its default workgroup barrier, and all waves of a
+cluster are members of its default cluster barrier.
+
+.. _amdgpu-memory-model-named-barriers:
+
+Named Barriers
+##############
+
+All named barrier operations must occur in wave-uniform control flow. All
+arguments of named barrier intrinsics must be wave-uniform.
+
+Named barriers are allocated as global variables of
+:ref:`a target extension type <amdgpu-types-named-barriers>`.
+
+Named barriers may be signaled by the intrinsics:
+
+.. code-block:: llvm
+
+ declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
+ declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+If the second form is used and ``member_count`` is non-zero, the operation is
+an *initializing* signal, else it is *non*-initializing.
+
+Named barriers may be initialized explicitly using:
+
+.. code-block:: llvm
+
+ declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+It is possible to "leave" a named barrier. This decrements the named barrier's
+member count and completes the barrier if all other members have signaled it:
+
+.. code-block:: llvm
+
+ declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type)
+
+``barrier_type`` must be set to ``1``.
+
+Note that leaving a named barrier is not exactly the opposite of joining a
+barrier (for example, joining a barrier does not change its member count).
+
+Leaving implicitly *joins* (see below) a null named barrier.
+
+Signal, leave, and initializing operations on the same named barrier must obey
+certain ordering constraints:
+
+* Non-initializing signals must be ordered after some initializing signal or an
+ explicit initializing operation.
+* Explicit initializing operations must not race signal or leave operations.
+* Initializing signal operations must not race leave operations.
+* Initializing signal operations with contradicting member counts must not race
+ each other.
+
+The details of how these orders can be established and races prevented are TBD.
+Using a default workgroup or cluster barrier in the natural way is guaranteed to
+be sufficient.
+
+In order to wait for a named barrier, a wave must first *join* the named barrier
+using:
+
+.. code-block:: llvm
+
+ declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr)
+
+The named barrier may then be waited for using:
+
+.. code-block:: llvm
+
+ declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type)
+
+... with ``barrier_type`` set to ``1``.
+
+Signal, leave, join, and wait operations must obey certain ordering constraints.
+The details are TBD. Satisfying the following rules is guaranteed to be
+sufficient:
+
+* Signal or wait for a named barrier only if it is the most recent to have been
+ joined in program order.
+* Signal or leave a named barrier only if the number of prior signaling
+ operations on that named barrier since the most recent join in program order
+ is equal to the number of prior wait operations on that named barrier since
+ the most recent join in program order.
+* Wait for a named barrier only if the number of prior signaling operations on
+ that named barrier since the most recent join in program order is one larger
+ than the number of prior wait operations on that named barrier since the most
+ recent join in program order.
+* Do not signal a named barrier or wait for it in program order after leaving it.
+
+Additionally, use signal, leave, and wait operations on a named barrier from a
+consistent associated set of waves that is determined at initialization time and
+whose initial size is the member count used at initialization. The set of waves
+may shrink with leave operations. Operations on a named barrier object with
+conflicting sets of waves must not race. The details of this rule and how an
+ordering can be established to prevent a race are TBD. Using a default workgroup
+or cluster barrier in the natural way is guaranteed to be sufficient.
+
.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
Memory Model GFX6-GFX9
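
Illustrative note (not part of the patch): the four "sufficient" program-order rules for signal, wait, join, and leave in the Named Barriers section above can be read as a small per-wave state machine. The sketch below is a minimal Python model of that reading, not an implementation of the hardware or of any LLVM verifier. The function name ``check_wave_program``, the tuple encoding of operations, and the use of ``None`` to stand for the null named barrier are all hypothetical choices made for this example.

```python
def check_wave_program(ops):
    """Check one wave's operations, in program order, against the
    'sufficient' rules quoted in the patch.

    ops is a list of (kind, barrier) tuples (a hypothetical encoding):
      ("join", name), ("signal", name), ("wait", None), ("leave", None)
    wait and leave act on the most recently joined barrier.
    Returns a list of (index, message) violations; empty means OK.
    """
    joined = None   # most recently joined barrier; None models the null barrier
    signals = 0     # signals on `joined` since the most recent join
    waits = 0       # waits on `joined` since the most recent join
    left = set()    # barriers this wave has already left
    errors = []

    for index, (kind, barrier) in enumerate(ops):
        if kind == "join":
            joined, signals, waits = barrier, 0, 0
        elif kind == "signal":
            if barrier in left:
                errors.append((index, "signal after leave"))
            elif barrier != joined:
                errors.append((index, "signal on a barrier that is not the most recent join"))
            elif signals != waits:
                errors.append((index, "signal while an earlier signal is still unwaited"))
            else:
                signals += 1
        elif kind == "wait":
            if joined is None:
                errors.append((index, "wait without a joined named barrier"))
            elif joined in left:
                errors.append((index, "wait after leave"))
            elif signals != waits + 1:
                errors.append((index, "wait without exactly one pending signal"))
            else:
                waits += 1
        elif kind == "leave":
            if joined is None:
                errors.append((index, "leave without a joined named barrier"))
            elif signals != waits:
                errors.append((index, "leave with a pending signal"))
            else:
                left.add(joined)
                joined = None   # leaving implicitly joins the null named barrier
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
    return errors
```

A wave that joins, alternates signal and wait, and then leaves passes the check; signaling twice in a row, or waiting before signaling, is reported. The model only covers the per-wave program-order rules. It says nothing about the cross-wave ordering and race constraints, whose details the patch leaves TBD.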
From 0681f1fdde64cb2692522d81655512cf8e123be1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Wed, 29 Oct 2025 10:16:39 -0700
Subject: [PATCH 2/3] Address some review comments
---
llvm/docs/AMDGPUUsage.rst | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 9a4c644a63f6e..430faeadc86c3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1189,15 +1189,19 @@ The AMDGPU backend implements some target extension types.
Named Barriers
~~~~~~~~~~~~~~
-Named barriers are represented as memory objects of type
-``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
-in the LDS address space. They do not occupy regular LDS memory, but their
-lifetime and allocation granularity matches that of global variables in LDS.
+Named barriers are fixed-function hardware barrier objects that are available
+in gfx12.5+ in addition to the traditional default barriers.
-The following types built from named barriers are supported in global variables,
-defined recursively:
+In LLVM IR, named barriers are represented by global variables of type
+``target("amdgcn.named.barrier", 0)`` in the LDS address space. Named barrier
+global variables do not occupy actual LDS memory, but their lifetime and
+allocation scope match those of global variables in LDS. Programs in LLVM IR
+refer to named barriers using pointers.
-* a standalone ``target("amdgcn.named.barrier", 0)``
+The following named barrier types are supported in global variables, defined
+recursively:
+
+* a single, standalone ``target("amdgcn.named.barrier", 0)``
* an array of supported types
* a struct containing a single element of supported type
@@ -1207,15 +1211,12 @@ defined recursively:
@foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
@baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
-Barrier types may not be used in ``alloca``.
+ ...
-The integral representation of a pointer to a valid named barrier is in the
-range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
-formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
-hardware barrier ID. The integral representation of the null named barrier is
-``0x0080'0000``.
+ %foo.i = getelementptr [2 x target("amdgcn.named.barrier", 0)], ptr addrspace(3) @foo, i32 0, i32 %i
+ call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %foo.i, i32 0)
-It is not legal to attempt to form a pointer to any non-named barrier objects.
+Named barrier types may not be used in ``alloca``.
It is undefined behavior to use a pointer to any part of a named barrier object
as the pointer operand of a regular memory access instruction or intrinsic.
@@ -6721,11 +6722,10 @@ Named barriers may be signaled by the intrinsics:
.. code-block:: llvm
- declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)
-If the second form is used and ``member_count`` is non-zero, the operation is
-an *initializing* signal, else it is *non*-initializing.
+If ``member_count`` is non-zero, the operation is an *initializing* signal,
+otherwise it is *non-initializing*.
Named barriers may be initialized explicitly using:
From 43377d8e1182962a471af2c657e97e9a92606ee1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Thu, 30 Oct 2025 07:41:03 -0700
Subject: [PATCH 3/3] Explicitly say that there's no byte representation
---
llvm/docs/AMDGPUUsage.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 430faeadc86c3..518d9bee7f2ba 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1218,6 +1218,7 @@ recursively:
Named barrier types may not be used in ``alloca``.
+Named barriers do not have an underlying byte representation.
It is undefined behavior to use a pointer to any part of a named barrier object
as the pointer operand of a regular memory access instruction or intrinsic.
Pointers to named barrier objects are intended to be used with dedicated
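
Illustrative note (not part of the patch): the series describes a named barrier as having a member count set at initialization, completing once all members have signaled, and allowing a member to "leave", which decrements the member count and completes the barrier if all other members have already signaled. The toy Python model below sketches that counting behavior only. The class ``NamedBarrierModel`` and its fields are hypothetical, and the model assumes that the signal count resets on completion while the member count persists. Neither assumption is stated in the patch.

```python
class NamedBarrierModel:
    """Toy counting model of a named barrier (hypothetical, for illustration).

    Assumptions not stated in the patch: the signal count resets to zero
    when the barrier completes, and the member count persists across
    completions until members leave.
    """

    def __init__(self, member_count):
        if member_count <= 0:
            raise ValueError("member_count must be positive")
        self.member_count = member_count  # set at initialization
        self.signaled = 0                 # signals received in the current phase
        self.completions = 0              # number of times the barrier completed

    def _maybe_complete(self):
        # The barrier completes once every remaining member has signaled.
        if self.member_count > 0 and self.signaled == self.member_count:
            self.signaled = 0
            self.completions += 1

    def signal(self):
        """One member wave signals the barrier."""
        self.signaled += 1
        self._maybe_complete()

    def leave(self):
        """One member wave leaves: the member count drops by one, and the
        barrier completes if all other members have already signaled."""
        self.member_count -= 1
        self._maybe_complete()
```

With three members, two signals do not complete the barrier, but a subsequent leave by the third member does, after which the barrier behaves as a two-member barrier.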