[llvm] AMDGPU: Preliminary documentation for named barriers (PR #165502)

Nicolai Hähnle via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 30 07:41:32 PDT 2025


https://github.com/nhaehnle updated https://github.com/llvm/llvm-project/pull/165502

>From ee4eb7a2b1f1c478dc0dadade0f41bf3033cfb1f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Tue, 28 Oct 2025 19:49:13 -0700
Subject: [PATCH 1/3] AMDGPU: Preliminary documentation for named barriers

---
 llvm/docs/AMDGPUUsage.rst | 179 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 7780c0a6dca0a..9a4c644a63f6e 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1179,6 +1179,53 @@ is conservatively correct for OpenCL.
                              other operations within the same address space.
      ======================= ===================================================
 
+Target Types
+------------
+
+The AMDGPU backend implements some target extension types.
+
+.. _amdgpu-types-named-barriers:
+
+Named Barriers
+~~~~~~~~~~~~~~
+
+Named barriers are represented as memory objects of type
+``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
+in the LDS address space. They do not occupy regular LDS memory, but their
+lifetime and allocation granularity matches that of global variables in LDS.
+
+The following types built from named barriers are supported in global variables,
+defined recursively:
+
+* a standalone ``target("amdgcn.named.barrier", 0)``
+* an array of supported types
+* a struct containing a single element of supported type
+
+.. code-block:: llvm
+
+      @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
+      @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
+      @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
+
+Barrier types may not be used in ``alloca``.
+
+The integral representation of a pointer to a valid named barrier is in the
+range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
+formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
+hardware barrier ID. The integral representation of the null named barrier is
+``0x0080'0000``.
+
+It is not legal to attempt to form a pointer to any non-named barrier objects.
+
+It is undefined behavior to use a pointer to any part of a named barrier object
+as the pointer operand of a regular memory access instruction or intrinsic.
+Pointers to named barrier objects are intended to be used with dedicated
+intrinsics.
+
+We expand on the semantics of named barriers in
+:ref:`the memory model section <amdgpu-memory-model-named-barriers>`.
+
+
 LLVM IR Intrinsics
 ------------------
 
@@ -6621,6 +6668,138 @@ Multiple tags can be used at the same time to synchronize with more than one add
   better code optimization, at the cost of synchronizing additional address
   spaces.
 
+.. _amdgpu-memory-model-barriers:
+
+Hardware Barriers
++++++++++++++++++
+
+.. note::
+
+  This section is preliminary. The semantics described here are intended to be
+  formalized properly in the future.
+
+Hardware barriers synchronize execution between concurrently running waves using
+fixed function hardware. Intuitively, a set of waves are "members" of a barrier.
+Waves *signal* the barrier and later *wait* for it. Execution only proceeds past
+the *wait* once all member waves have *signaled* the barrier.
+
+Formally, barriers affect semantics in exactly two ways. First, they affect
+forward progress. Waiting on a barrier that never completes (is not signaled
+sufficiently) prevents forward progress and therefore, given the assumption of
+forward progress, is undefined behavior. Second, barrier operations can pair
+with fences to contribute *synchronizes-with* relations in the memory model.
+
+Roughly speaking:
+
+- Release fences pair with barrier signal operations that are later in program
+  order
+- Barrier wait operations pair with acquire fences that are later in program
+  order
+- If a barrier signal operation contributes to allowing a wait operation to
+  complete, then the corresponding paired fences can synchronize-with each
+  other (given compatible sync scopes and memory model relaxation annotations)
+
+Default Barriers
+################
+
+There is a default workgroup barrier and a default cluster barrier. All waves
+of a workgroup and cluster are members of the same default workgroup and
+cluster barriers, respectively.
+
+.. _amdgpu-memory-model-named-barriers:
+
+Named Barriers
+##############
+
+All named barrier operations must occur in wave-uniform control flow. All
+arguments of named barrier intrinsics must be wave-uniform.
+
+Named barriers are allocated as global variables of
+:ref:`a target extension type <amdgpu-types-named-barriers>`.
+
+Named barriers may be signaled by the intrinsics:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
+  declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+If the second form is used and ``member_count`` is non-zero, the operation is
+an *initializing* signal, else it is *non*-initializing.
+
+Named barriers may be initialized explicitly using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.init(ptr addrspace(3) %barrier_ptr, i32 %member_count)
+
+It is possible to "leave" a named barrier. This decrements the named barrier's
+member count and completes the barrier if all other members have signaled it:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.leave(i32 %barrier_type)
+
+``barrier_type`` must be set to ``1``.
+
+Note that leaving a named barrier is not exactly the opposite of joining a
+barrier (for example, joining a barrier does not change its member count).
+
+Leaving implicitly *joins* (see below) a null named barrier.
+
+Signal, leave, and initializing operations on the same named barrier must obey
+certain ordering constraints:
+
+* Non-initializing signals must be ordered after some initializing signal or an
+  explicit initializing operation.
+* Explicit initializing operations must not race signal or leave operations.
+* Initializing signal operations must not race leave operations.
+* Initializing signal operations with contradicting member counts must not race
+  each other.
+
+The details of how these orders can be established and races prevented are tbd.
+Using a default workgroup or cluster barrier in the natural way is guaranteed to
+be sufficient.
+
+In order to wait for a named barrier, a wave must first *join* the named barrier
+using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) %barrier_ptr)
+
+The named barrier may then be waited for using:
+
+.. code-block:: llvm
+
+  declare void @llvm.amdgcn.s.barrier.wait(i32 %barrier_type)
+
+... with ``barrier_type`` set to ``1``.
+
+Signal, leave, join, and wait operations must obey certain ordering constraints.
+The details are tbd. Satisfying the following rules is guaranteed to be
+sufficient:
+
+* Signal or wait for a named barrier only if it is the most recent to have been
+  joined in program order.
+* Signal or leave a named barrier only if the number of prior signaling
+  operations on that named barrier since the most recent join in program order
+  is equal to the number of prior wait operations on that named barrier since
+  the most recent join in program order.
+* Wait for a named barrier only if the number of prior signaling operations on
+  that named barrier since the most recent join in program order is one larger
+  than the number of prior wait operations on that named barrier since the most
+  recent join in program order.
+* Do not signal a named barrier or wait for it in program order after leaving it.
+
+Additionally, use signal, leave, and wait operations on a named barrier from a
+consistent associated set of waves that is determined at initialization time and
+whose initial size is the member count used at initialization. The set of waves
+may shrink with leave operations. Operations on a named barrier object with
+conflicting sets of waves must not race. The details of this rule and how an
+ordering can be established to prevent a race is tbd. Using a default workgroup
+or cluster barrier in the natural way is guaranteed to be sufficient.
+
 .. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
 
 Memory Model GFX6-GFX9

>From 0681f1fdde64cb2692522d81655512cf8e123be1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Wed, 29 Oct 2025 10:16:39 -0700
Subject: [PATCH 2/3] Address some review comments

---
 llvm/docs/AMDGPUUsage.rst | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 9a4c644a63f6e..430faeadc86c3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1189,15 +1189,19 @@ The AMDGPU backend implements some target extension types.
 Named Barriers
 ~~~~~~~~~~~~~~
 
-Named barriers are represented as memory objects of type
-``target("amdgcn.named.barrier", 0)``. They are allocated as global variables
-in the LDS address space. They do not occupy regular LDS memory, but their
-lifetime and allocation granularity matches that of global variables in LDS.
+Named barriers are fixed function hardware barrier objects that are available
+in gfx12.5+ in addition to the traditional default barriers.
 
-The following types built from named barriers are supported in global variables,
-defined recursively:
+In LLVM IR, named barriers are represented by global variables of type
+``target("amdgcn.named.barrier", 0)`` in the LDS address space. Named barrier
+global variables do not occupy actual LDS memory, but their lifetime and
+allocation scope matches that of global variables in LDS. Programs in LLVM IR
+refer to named barriers using pointers.
 
-* a standalone ``target("amdgcn.named.barrier", 0)``
+The following named barrier types are supported in global variables, defined
+recursively:
+
+* a single, standalone ``target("amdgcn.named.barrier", 0)``
 * an array of supported types
 * a struct containing a single element of supported type
 
@@ -1207,15 +1211,12 @@ defined recursively:
       @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
       @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
 
-Barrier types may not be used in ``alloca``.
+      ...
 
-The integral representation of a pointer to a valid named barrier is in the
-range ``0x0080'0010`` to ``0x0080'0100`` (inclusive). The representation is
-formed by the expression ``0x0080'0000 | (id << 4)``, where ``id`` is the
-hardware barrier ID. The integral representation of the null named barrier is
-``0x0080'0000``.
+      %foo.i = getelementptr [2 x target("amdgcn.named.barrier", 0)], ptr addrspace(3) @foo, i32 0, i32 %i
+      call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %foo.i, i32 0)
 
-It is not legal to attempt to form a pointer to any non-named barrier objects.
+Named barrier types may not be used in ``alloca``.
 
 It is undefined behavior to use a pointer to any part of a named barrier object
 as the pointer operand of a regular memory access instruction or intrinsic.
@@ -6721,11 +6722,10 @@ Named barriers may be signaled by the intrinsics:
 
 .. code-block:: llvm
 
-  declare void @llvm.amdgcn.s.barrier.signal(i32 %barrier_hw_id)
   declare void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %barrier_ptr, i32 %member_count)
 
-If the second form is used and ``member_count`` is non-zero, the operation is
-an *initializing* signal, else it is *non*-initializing.
+If ``member_count`` is non-zero, the operation is an *initializing* signal,
+else it is *non*-initializing.
 
 Named barriers may be initialized explicitly using:
 

>From 43377d8e1182962a471af2c657e97e9a92606ee1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <nicolai.haehnle at amd.com>
Date: Thu, 30 Oct 2025 07:41:03 -0700
Subject: [PATCH 3/3] Explicitly say that there's no byte representation

---
 llvm/docs/AMDGPUUsage.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 430faeadc86c3..518d9bee7f2ba 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1218,6 +1218,7 @@ recursively:
 
 Named barrier types may not be used in ``alloca``.
 
+Named barriers do not have an underlying byte representation.
 It is undefined behavior to use a pointer to any part of a named barrier object
 as the pointer operand of a regular memory access instruction or intrinsic.
 Pointers to named barrier objects are intended to be used with dedicated



More information about the llvm-commits mailing list