[llvm] [Doc][AMDGPU] Add barrier execution & memory model (PR #170447)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 5 02:05:14 PST 2025
================
@@ -6553,6 +6567,281 @@ The Private Segment Buffer is always requested, but the Private Segment
Wavefront Offset is only requested if it is used (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+.. _amdgpu-amdhsa-execution-barriers:
+
+Execution Barriers
+~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+  This specification is a work in progress (see the items annotated with :sup:`WIP`) and is not yet complete for GFX12.5.
+
+Threads can synchronize execution by performing barrier operations on barrier *objects* as described below:
+
+* Barrier *objects* have the following state:
+
+  * A non-zero unsigned integer *expected count*: counts the number of *signal* operations
+    expected for this barrier *object*.
+  * An unsigned integer *signal count*: counts the number of *signal* operations
+    already performed on this barrier *object*.
+
+ * The initial value of *signal count* is zero.
+    * When an operation causes *signal count* to be equal to *expected count*, the barrier is completed,
+      and the *signal count* is reset to zero (a non-normative sketch of this state machine is given after this list).
+
+* Barrier operations are performed on barrier *objects*. A barrier operation is a dynamic instance
+ of one of the following:
+
+ * Barrier *init*.
+ * Barrier *join*.
+ * Barrier *leave*: decrements *expected count* of the barrier *object* by one.
+ * Barrier *signal*: increments *signal count* of the barrier *object* by one.
+ * Barrier *wait*.
+
+* Barrier modification operations are barrier operations that modify the barrier *object* state:
+
+ * Barrier *init*.
+ * Barrier *leave*.
+ * Barrier *signal*.
+
+* For a given barrier *object* ``BO``:
+
+ * There is exactly one barrier *init* for ``BO``. :sup:`WIP`
+ * *Thread-barrier-order<BO>* is the subset of *program-order* that only
+ relates barrier operations performed on ``BO``.
+ * Let ``S`` be the set of barrier modification operations on ``BO``, then
+ *barrier-modification-order<BO>* is a strict total order over ``S``. It is the order
+ in which ``BO`` observes barrier operations that change its state.
+
+ * *Barrier-modification-order<BO>* is consistent with *happens-before*.
+ * The first element in *barrier-modification-order<BO>* is a barrier *init*.
+      There is only one barrier *init* in *barrier-modification-order<BO>*.
+
+ * *Barrier-joined-before<BO>* is a strict partial order over barrier operations on ``BO``.
+ A barrier *join* ``J`` is *barrier-joined-before<BO>* a barrier operation ``X`` if and only if all
+    of the following are true:
+
+ * ``J -> X`` in *thread-barrier-order<BO>*.
+ * There is no barrier *leave* ``L`` where ``J -> L -> X`` in *thread-barrier-order<BO>*.
+
+ * *Barrier-participates-in<BO>* is a partial order that relates barrier operations to barrier *waits*.
+    A barrier operation ``X`` may *barrier-participates-in<BO>* a barrier *wait* ``W`` only if all of the
+    following are true:
+
+ * ``X`` and ``W`` are both performed on ``BO``.
+ * ``X`` is a barrier *signal* or *leave* operation.
+ * ``X`` does not *barrier-participates-in<BO>* another barrier *wait* ``W'`` in the same thread as ``W``.
+ * ``W -> X`` **not** in *thread-barrier-order<BO>*.
+
+* Let ``S`` be the set of barrier operations that *barrier-participates-in<BO>* a barrier *wait* ``W`` for some
+  barrier *object* ``BO``, then all of the following are true:
+
+ * ``S`` cannot be empty. :sup:`WIP`
+  * The elements of ``S`` form a contiguous interval of *barrier-modification-order<BO>*.
+ * Let ``A`` be the first operation of ``S`` in *barrier-modification-order<BO>*, then the *signal count* of ``BO``
+ is zero before ``A`` is performed.
+ * Let ``B`` be the last operation of ``S`` in *barrier-modification-order<BO>*, then the *signal count* and
+ *expected count* of ``BO`` are equal after ``B`` is performed. ``B`` is the only barrier operation in ``S``
+ that causes the *signal count* and *expected count* of ``BO`` to be equal.
+
+* For every barrier *signal* ``S`` performed on a barrier *object* ``BO``:
+
+ * The immediate successor of ``S`` in *thread-barrier-order<BO>* is a barrier *wait*. :sup:`WIP`
+
+* For every barrier *wait* ``W`` performed on a barrier *object* ``BO``:
+
+ * There is a barrier *join* ``J`` such that ``J -> W`` in *barrier-joined-before<BO>*. :sup:`WIP`
+
+* For every barrier *join* ``J`` performed on a barrier *object* ``BO``:
+
+ * There is no other barrier operation *thread-barrier-ordered<BO>* before ``J``. :sup:`WIP`
+ * ``J`` is not *barrier-joined-before<BO>* another barrier *join*.
+
+* For every barrier *leave* ``L`` performed on a barrier *object* ``BO``:
+
+ * There is no other barrier operation *thread-barrier-ordered<BO>* after ``L``. :sup:`WIP`
+
+* *Barrier-executes-before* is a strict partial order over all barrier operations. It is the union of the
+  following orders:
+
+ * *Thread-barrier-order<BO>* for every barrier object ``BO``.
+ * *Barrier-participates-in<BO>* for every barrier object ``BO``.
+
+* *Barrier-executes-before* is consistent with *program-order*.
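+
+The sketch below is a non-normative host C++ illustration of the barrier *object* state and of the
+barrier *modification* operations defined in this list. The ``BarrierObject`` class and its members are
+hypothetical and exist purely for illustration; they do not correspond to any hardware state,
+instruction, or LLVM intrinsic.
+
+.. code-block:: c++
+
+  #include <cassert>
+
+  // Non-normative model of a barrier *object*. Barrier *join* and *wait* are
+  // barrier operations but not barrier *modification* operations, so they are
+  // omitted here: they never change the object's state.
+  class BarrierObject {
+    unsigned ExpectedCount = 0; // number of *signal* operations expected
+    unsigned SignalCount = 0;   // number of *signal* operations performed; starts at zero
+
+    // Completion rule: when a modification makes *signal count* equal to
+    // *expected count*, the barrier completes and *signal count* resets to
+    // zero. (The "ExpectedCount != 0" guard is a simplification for the
+    // degenerate case where every participant has left.)
+    bool maybeComplete() {
+      if (ExpectedCount != 0 && SignalCount == ExpectedCount) {
+        SignalCount = 0;
+        return true;
+      }
+      return false;
+    }
+
+  public:
+    // Barrier *init*: performed exactly once; sets the non-zero *expected count*.
+    void init(unsigned Expected) {
+      assert(Expected != 0 && "expected count is a non-zero integer");
+      ExpectedCount = Expected;
+    }
+
+    // Barrier *leave*: decrements *expected count* by one.
+    // Returns true if this operation completed the barrier.
+    bool leave() {
+      assert(ExpectedCount != 0);
+      --ExpectedCount;
+      return maybeComplete();
+    }
+
+    // Barrier *signal*: increments *signal count* by one.
+    // Returns true if this operation completed the barrier.
+    bool signal() {
+      ++SignalCount;
+      return maybeComplete();
+    }
+  };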
+
+*Barrier-executes-before* represents the order in which barrier operations complete by relating operations
+from different threads to one another.
+For example, if ``A -> B`` in *barrier-executes-before*, then the execution of ``A`` must complete
+before the execution of ``B`` can complete.
+
+When a barrier *signal* ``S`` *barrier-executes-before* a barrier *wait* ``W``, ``S`` executes before ``W``
+**as-if** ``S`` were *program-ordered* before ``W``. Thus, all *dynamic instances* *program-ordered* before ``S``
+are known to have executed before the *dynamic instances* *program-ordered* after ``W``.
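+
+As a non-normative host-side analogy of this guarantee, the C++ program below runs two threads that each
+perform some work, *signal* a shared toy barrier with an *expected count* of two, and then *wait* on it.
+The ``ToyBarrier`` class is hypothetical and only mimics the *signal*/*wait*/completion rules (barrier
+*init*, *join* and *leave* are elided); it is not how hardware barriers are implemented, and, unlike the
+barriers specified here, the ``std::mutex``/``std::condition_variable`` it relies on also synchronize
+memory.
+
+.. code-block:: c++
+
+  #include <condition_variable>
+  #include <cstdio>
+  #include <mutex>
+  #include <thread>
+
+  // Toy, single-use host barrier: *signal* increments the signal count,
+  // *wait* blocks until the barrier has completed.
+  class ToyBarrier {
+    std::mutex M;
+    std::condition_variable CV;
+    unsigned ExpectedCount;
+    unsigned SignalCount = 0;
+    bool Completed = false; // single-use simplification
+
+  public:
+    explicit ToyBarrier(unsigned Expected) : ExpectedCount(Expected) {}
+
+    // Barrier *signal*.
+    void signal() {
+      std::lock_guard<std::mutex> Lock(M);
+      if (++SignalCount == ExpectedCount) {
+        SignalCount = 0; // completion resets the signal count
+        Completed = true;
+        CV.notify_all();
+      }
+    }
+
+    // Barrier *wait*: blocks until the barrier has completed.
+    void wait() {
+      std::unique_lock<std::mutex> Lock(M);
+      CV.wait(Lock, [&] { return Completed; });
+    }
+  };
+
+  int main() {
+    ToyBarrier BO(2); // barrier *init* with an expected count of two
+    int PreWork[2] = {0, 0};
+
+    auto Worker = [&](int Id) {
+      PreWork[Id] = 1; // work program-ordered before the *signal*
+      BO.signal();
+      BO.wait();
+      // Work program-ordered after the *wait*: each thread's *signal*
+      // *barrier-executes-before* the other thread's *wait*, so both
+      // PreWork stores are known to have executed by now. (Their
+      // *visibility* here comes from the mutex inside ToyBarrier, not
+      // from the execution barrier model itself; see the note below.)
+      std::printf("thread %d sees %d %d\n", Id, PreWork[0], PreWork[1]);
+    };
+
+    std::thread T0(Worker, 0), T1(Worker, 1);
+    T0.join();
+    T1.join();
+    return 0;
+  }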
+
+.. note::
+
+ Barriers only synchronize execution, not memory: ``S -> W`` in *barrier-executes-before* does not imply
+ ``S`` *happens-before* ``W``. Refer to the :ref:`execution barriers memory model<amdgpu-amdhsa-execution-barriers-memory-model>`
+ to also synchronize memory.
+
+Target-Specific Properties
----------------
Pierre-vh wrote:
> Isn't everything in this file target-specific? :)
Sort of; this is a bit of a stylistic choice. I'm trying to keep the specification as generic and detached from the hardware as possible (within reason), so it can cover any future addition, or any future intrinsic we could add for execution synchronization purposes.
For example, if we want to add a generic barrier that uses atomics+sleep, it could still rely on this specification to define its execution and memory model.
https://github.com/llvm/llvm-project/pull/170447