[llvm] [Doc][AMDGPU] Add barrier execution & memory model (PR #170447)

Nicolai Hähnle via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 9 07:24:04 PST 2025


================
@@ -6553,6 +6567,297 @@ The Private Segment Buffer is always requested, but the Private Segment
 Wavefront Offset is only requested if it is used (see
 :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
 
+.. _amdgpu-amdhsa-execution-barriers:
+
+Execution Barriers
+~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+  This specification is a work-in-progress (see lines annotated with :sup:`WIP`), and is not complete for GFX12.5.
+
+Threads can synchronize execution by performing barrier operations on barrier *objects* as described below:
+
+* Barrier *objects* have the following state:
+
+  * A non-zero unsigned integer *expected count*: counts the number of *signal* operations
+    expected for this barrier *object*.
+  * An unsigned integer *signal count*: counts the number of *signal* operations
+    already performed on this barrier *object*.
+
+    * The initial value of *signal count* is zero.
+    * When an operation causes *signal count* to be equal to *expected count*, the barrier is completed,
+      and the *signal count* is reset to zero.
+
+* Barrier operations are performed on barrier *objects*. A barrier operation is a dynamic instance
+  of one of the following:
+
+  * Barrier *init*.
+  * Barrier *join*.
+  * Barrier *leave*: decrements *expected count* of the barrier *object* by one.
+  * Barrier *signal*: increments *signal count* of the barrier *object* by one.
+  * Barrier *wait*.
+
+* Barrier modification operations are barrier operations that modify the barrier *object* state:
+
+  * Barrier *init*.
+  * Barrier *leave*.
+  * Barrier *signal*.
+
+* For a given barrier *object* ``BO``:
+
+  * There is exactly one barrier *init* for ``BO``. :sup:`WIP`
+  * *Thread-barrier-order<BO>* is the subset of *program-order* that only
+    relates barrier operations performed on ``BO``.
+  * Let ``S`` be the set of barrier modification operations on ``BO``, then
+    *barrier-modification-order<BO>* is a strict total order over ``S``. It is the order
+    in which ``BO`` observes barrier operations that change its state.
+
+    * *Barrier-modification-order<BO>* is consistent with *happens-before*.
+    * The first element in *barrier-modification-order<BO>* is a barrier *init*.
+      There is only one barrier *init* in *barrier-modification-order<BO>*.
+
+  * *Barrier-joined-before<BO>* is a strict partial order over barrier operations on ``BO``.
+    A barrier *join* ``J`` is *barrier-joined-before<BO>* a barrier operation ``X`` if and only if all
+    of the following are true:
+
+    * ``J -> X`` in *thread-barrier-order<BO>*.
+    * There is no barrier *leave* ``L`` where ``J -> L -> X`` in *thread-barrier-order<BO>*.
+
+  * *Barrier-participates-in<BO>* is a partial order that relates barrier operations to barrier *waits*.
+    A barrier operation ``X`` may *barrier-participates-in<BO>* a barrier *wait* ``W`` if all of the following
+    are true:
+
+    * ``X`` and ``W`` are both performed on ``BO``.
+    * ``X`` is a barrier *signal* or *leave* operation.
+    * ``X`` does not *barrier-participates-in<BO>* another barrier *wait* ``W'`` in the same thread as ``W``.
+    * ``W -> X`` **not** in *thread-barrier-order<BO>*.
+
+* Let ``S`` be the set of barrier operations that *barrier-participates-in<BO>* a barrier *wait* ``W`` for some
+  barrier *object* ``BO``, then all of the following are true:
+
+  * ``S`` cannot be empty. :sup:`WIP`
+  * The elements of ``S`` form a contiguous interval of *barrier-modification-order<BO>*.
+  * Let ``A`` be the first operation of ``S`` in *barrier-modification-order<BO>*, then the *signal count* of ``BO``
+    is zero before ``A`` is performed.
+  * Let ``B`` be the last operation of ``S`` in *barrier-modification-order<BO>*, then the *signal count* and
+    *expected count* of ``BO`` are equal after ``B`` is performed. ``B`` is the only barrier operation in ``S``
+    that causes the *signal count* and *expected count* of ``BO`` to be equal.
+
+* For every barrier *signal* ``S`` performed on a barrier *object* ``BO``:
+
+  * The immediate successor of ``S`` in *thread-barrier-order<BO>* is a barrier *wait*. :sup:`WIP`
+
+* For every barrier *wait* ``W`` performed on a barrier *object* ``BO``:
+
+  * There is a barrier *join* ``J`` such that ``J -> W`` in *barrier-joined-before<BO>*. :sup:`WIP`
+
+* For every barrier *join* ``J`` performed on a barrier *object* ``BO``:
+
+  * There is no other barrier operation *thread-barrier-ordered<BO>* before ``J``. :sup:`WIP`
+  * ``J`` is not *barrier-joined-before<BO>* another barrier *join*.
+
+* For every barrier *leave* ``L`` performed on a barrier *object* ``BO``:
+
+  * There is no other barrier operation *thread-barrier-ordered<BO>* after ``L``. :sup:`WIP`
+
+* *Barrier-executes-before* is a strict partial order of all barrier operations. It is the transitive closure of all
+  the following orders:
+
+  * *Thread-barrier-order<BO>* for every barrier object ``BO``.
+  * *Barrier-participates-in<BO>* for every barrier object ``BO``.
+
+* *Barrier-executes-before* is consistent with *program-order*.
+
+*Barrier-executes-before* represents the order in which barrier operations will complete by relating operations
+from different threads.
+For example, if ``A -> B`` in *barrier-executes-before*, then the execution of ``A`` must complete
+before the execution of ``B`` can complete.
+
+When a barrier *signal* ``S`` *barrier-executes-before* a barrier *wait* ``W``, ``S`` executes before ``W``
+**as-if** ``S`` is *program-ordered* before ``W``. Thus, all *dynamic instances* *program-ordered* before ``S``
+are known to have been executed before the *dynamic instances* *program-ordered* after ``W``.
+
+.. note::
+
+  Barriers only synchronize execution, not memory: ``S -> W`` in *barrier-executes-before* does not imply
+  ``S`` *happens-before* ``W``. Refer to the :ref:`execution barriers memory model<amdgpu-amdhsa-execution-barriers-memory-model>`
+  to also synchronize memory.
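+
+The counter rules above can be summarized by a small, single-threaded sketch. This is only an
+illustration of the abstract state: real barrier *objects* are allocated and managed entirely by the
+hardware (see the target-specific properties below), and the type and member names used here are
+purely hypothetical.
+
+.. code-block:: cpp
+
+  // Illustrative sketch of the abstract barrier *object* state; not a real API.
+  struct BarrierObject {
+    unsigned ExpectedCount;   // Non-zero: number of *signal* operations expected.
+    unsigned SignalCount = 0; // Number of *signal* operations already performed.
+
+    // Applied after every barrier modification operation: when the signal count
+    // reaches the expected count, the barrier completes and the signal count is
+    // reset to zero, allowing any pending *wait* operations to return.
+    void checkCompletion() {
+      if (ExpectedCount != 0 && SignalCount == ExpectedCount)
+        SignalCount = 0; // Barrier completed.
+    }
+
+    void signal() { ++SignalCount; checkCompletion(); }  // Barrier *signal*.
+    void leave() { --ExpectedCount; checkCompletion(); } // Barrier *leave*.
+  };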
+
+Target-Specific Properties
+++++++++++++++++++++++++++
+
+This section covers properties of barrier operations and *objects* that are specific to the implementation of
+barriers in AMDGPU hardware.
+
+Barrier operations have the following additional target-specific properties:
+
+* Barrier operations are convergent within a wave. All threads of a wavefront use the same barrier *object* when
+  performing any barrier operation.
+
+All barrier *objects* have the following additional target-specific properties:
+
+* Barrier *join* does not increment the *expected count* of a barrier *object*. The *expected count* is set
+  during initialization of the barrier by the hardware. :sup:`WIP`
+* Barrier *objects* are allocated and managed by the hardware.
+
+  * Barrier *objects* are stored in an inaccessible memory location.
+
+* Barrier *objects* exist within a *scope* (see :ref:`amdgpu-amdhsa-llvm-sync-scopes-table`),
+  and can only be accessed by threads in that *scope*.
+
+See :ref:`amdgpu-llvm-ir-intrinsics-table` for more information on how to perform barrier operations using
+LLVM IR intrinsic calls, or see the sections below to perform barrier operations using machine code.
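+
+As a purely illustrative example, and assuming a HIP-style device compilation with Clang, the classic
+whole-workgroup barrier is reachable from C++ through the ``__builtin_amdgcn_s_barrier()`` builtin
+(which corresponds to the ``llvm.amdgcn.s.barrier`` intrinsic). The helper name below is hypothetical;
+as noted above, this only synchronizes execution, not memory.
+
+.. code-block:: cpp
+
+  // Hypothetical HIP-style device helper: every wave of the workgroup performs
+  // a barrier operation on the workgroup barrier object.
+  __device__ void workgroup_execution_sync() {
+    // Execution barrier: no wave of the workgroup proceeds past this point
+    // until all waves of the workgroup have reached it.
+    __builtin_amdgcn_s_barrier();
+    // This does not by itself make prior memory operations visible to other
+    // waves; a fence with the appropriate synchronization scope is still
+    // required (see the execution barriers memory model).
+  }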
+
+.. _amdgpu-amdhsa-execution-barriers-workgroup-barriers:
+
+Workgroup Barrier Operations
+############################
+
+.. note::
+
+  This section only applies when it is referenced by one of the target code sequence tables below.
+
+This section covers properties of barrier operations on *workgroup barrier objects* implemented in AMDGPU
+hardware.
+
+The following barrier operations can never be performed by the shader on *workgroup barrier objects*.
+The hardware will instead perform them automatically under certain conditions.
+
+* Barrier *init*:
+
+  * The hardware automatically initializes *workgroup barrier objects* when a workgroup is launched:
+    the *expected count* of the barrier object is set to the number of waves in the workgroup.
+  * The number of waves in the workgroup is not known until all waves of the workgroup have launched.
+    Thus, the *expected count* of a *workgroup barrier object* can never be equal to its *signal count*
+    until all waves of the workgroup have launched.
----------------
nhaehnle wrote:

> > barrier operations can only appear in wave-uniform control flow, and
> 
> I believe the statement about `Barrier operations are convergent within a wave` is already enough to guarantee this. @ssahasra is that correct or do I need another statement to explicitly forbid using a barrier within non-uniform control flow ?

At least I didn't read the statement in this way. The word "convergent" is really only used as an adjective describing a static property of an operation.

(I notice that at one point ConvergenceAndUniformity.rst mentions "... threads that execute convergent operations convergently", but that should probably be changed to "... threads that execute convergent operations converge**dly**" or something like that.)

IMHO the clearest way is just to say "Barrier operations can only appear in wave-uniform control flow."

A downside of that wording is that we don't actually seem to define uniform control flow in ConvergenceAndUniformity.rst, but that could easily be fixed by saying something along the lines of "a dynamic instance of an operation executed by a thread is in X-uniform control flow if it is converged with dynamic instances from all other threads of X".

https://github.com/llvm/llvm-project/pull/170447

