[llvm] [AMDGPU][Doc] GFX12.5 Barrier Execution Model (PR #185632)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 1 01:37:12 PDT 2026


================
@@ -6950,121 +6975,233 @@ Informally, we can deduce from the above formal model that execution barriers be
 * *Barrier-executes-before* relates the dynamic instances of operations from different threads together.
   For example, if ``A -> B`` in *barrier-executes-before*, then the execution of ``A`` must complete
   before the execution of ``B`` can complete.
-* When a barrier *signal* or *leave* causes the *signal count* of a barrier *object* to be identical to the
-  *expected count*, the *signal count* is reset to zero, and threads that have *joined* the barrier *object*
-  will:
 
-  * Wake-up if they were sleeping because of a barrier *wait*, **or**
-  * Skip the next barrier *wait* operation if they have not previously *waited*.
+  * This property can also be combined with *program-order*. For example, given two (non-barrier) operations
+    ``X`` and ``Y`` such that ``X -> A`` and ``B -> Y`` in *program-order*, the execution of ``X``
+    completes before the execution of ``Y`` does.
 
 * Barriers do not complete "out-of-thin-air"; a barrier *wait* ``W`` cannot depend on a barrier operation
   ``X`` to complete if ``W -> X`` in *barrier-executes-before*.
-* It is undefined behavior to operate on an uninitialized barrier.
+* It is undefined behavior to operate on an uninitialized barrier object.
 * It is undefined behavior for a barrier *wait* to never complete.
+* It is not mandatory to *drop* a barrier after *joining* it. The operations are not opposites; *drop*
+  affects future barrier operations by decrementing the *expected count* of the barrier *object*, which
+  can only be undone by re-*initializing* the barrier.
+* A thread may not *arrive* at and then *drop* a barrier *object* unless the barrier completes before the
+  barrier *drop*. Incrementing the *signal count* and then immediately decrementing the *expected count*
+  may cause undefined behavior.
+* *Joining* a barrier is only useful if the thread will *wait* on that same barrier *object* later.
 
 Execution Barrier GFX6-11
 +++++++++++++++++++++++++
 
 Targets from GFX6 through GFX11 included do not have the split barrier feature.
-The barrier *signal* and barrier *wait* operations cannot be performed independently.
+The barrier *arrive* and barrier *wait* operations **cannot** be performed independently.
 
 There is only one *workgroup barrier* object of ``workgroup`` scope that is implicitly used
 by all barrier operations.
 
-  .. table:: AMDHSA Execution Barriers Code Sequences GFX6-GFX11
-     :name: amdgpu-amdhsa-execution-barriers-code-sequences-gfx6-gfx11-table
-
-     ===================== ====================== ===========================================================
-     Barrier Operation(s)  Barrier *Object*       AMDGPU Machine Code
-     ===================== ====================== ===========================================================
-     **Init, Join and Leave**
-     --------------------------------------------------------------------------------------------------------
-     *init*                - *Workgroup barrier*  See barrier *init* in
-                                                  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     *join*                - *Workgroup barrier*  See barrier *join* in
-                                                  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     *leave*               - *Workgroup barrier*  See barrier *leave* in
-                                                  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     **Signal and Wait**
-     --------------------------------------------------------------------------------------------------------
-     *signal* then *wait*  - *Workgroup barrier*  | **BackOffBarrier**
-                                                  | ``s_barrier``
-                                                  | **No BackOffBarrier**
-                                                  | ``s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)``
-                                                  | ``s_waitcnt_vscnt null, 0x0``
-                                                  | ``s_barrier``
-
-                                                  - If the target does not have the BackOffBarrier feature,
-                                                    then there cannot be any outstanding memory operations
-                                                    before issuing the ``s_barrier`` instruction.
-                                                  - The waitcnts can independently be moved earlier, or
-                                                    removed entirely as long as the associated
-                                                    counter remains at zero before issuing the
-                                                    ``s_barrier`` instruction.
-
-     *signal*              - *Workgroup barrier*  Not available separately, see *signal* then *wait*
-
-     *wait*                - *Workgroup barrier*  Not available separately, see *signal* then *wait*
-     ===================== ====================== ===========================================================
+The following code sequences can be used to implement the barrier operations described by the above specification:
+
+.. table:: AMDHSA Execution Barriers Code Sequences GFX6-GFX11
+    :name: amdgpu-amdhsa-execution-barriers-code-sequences-gfx6-gfx11-table
+    :widths: 15 15 70
+
+    ===================== ====================== ===========================================================
+    Barrier Operation(s)  Barrier *Object*       AMDGPU Machine Code
+    ===================== ====================== ===========================================================
+    **Init, Join and Drop**
+    --------------------------------------------------------------------------------------------------------
+    *init*                - *Workgroup barrier*  Automatically initialized by the hardware when a workgroup
+                                                 is launched. The *expected count* of this barrier is set
+                                                 to the number of waves in the workgroup.
+
+    *join*                - *Workgroup barrier*  Any thread launched within a workgroup automatically *joins*
+                                                 this barrier *object*.
+
+    *drop*                - *Workgroup barrier*  When a thread ends, it automatically *drops* this barrier
+                                                 *object* if it had previously *joined* it.
+
+    **Arrive and Wait**
+    --------------------------------------------------------------------------------------------------------
+    *arrive* then *wait*  - *Workgroup barrier*  | **BackOffBarrier**
+                                                 | ``s_barrier``
+                                                 | **No BackOffBarrier**
+                                                 | ``s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)``
+                                                 | ``s_waitcnt_vscnt null, 0x0``
+                                                 | ``s_barrier``
+
+                                                 - If the target does not have the BackOffBarrier feature,
+                                                   then there cannot be any outstanding memory operations
+                                                   before issuing the ``s_barrier`` instruction.
+                                                 - The waitcnts can independently be moved earlier, or
+                                                   removed entirely as long as the associated
+                                                   counter remains at zero before issuing the
+                                                   ``s_barrier`` instruction.
+                                                 - The ``s_barrier`` instruction cannot complete
+                                                   before all waves of the workgroup have launched.
+
+    *arrive*              - *Workgroup barrier*  Not available separately, see *arrive* then *wait*
+
+    *wait*                - *Workgroup barrier*  Not available separately, see *arrive* then *wait*
+    ===================== ====================== ===========================================================
 
 Execution Barrier GFX12
 +++++++++++++++++++++++
 
-.. note::
-
-  This is incomplete for GFX12.5.
-
 GFX12 targets have the split-barrier feature, and also offer multiple barrier *objects* per workgroup
-(see :ref:`amdgpu-amdhsa-execution-barriers-ids-gfx12-table`).
-
-  .. table:: AMDHSA Execution Barriers Code Sequences GFX12
-     :name: amdgpu-amdhsa-execution-barriers-code-sequences-gfx12-table
-
-     ===================== =========================== ===========================================================
-     Barrier Operation(s)  Barrier *Object*            AMDGPU Machine Code
-     ===================== =========================== ===========================================================
-     **Init, Join and Leave**
-     -------------------------------------------------------------------------------------------------------------
-     *init*                - *Workgroup barrier*       See barrier *init* in
-                           - *Workgroup trap barrier*  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     *join*                - *Workgroup barrier*       See barrier *join* in
-                           - *Workgroup trap barrier*  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     *leave*               - *Workgroup barrier*       See barrier *leave* in
-                           - *Workgroup trap barrier*  :ref:`amdgpu-amdhsa-execution-barriers-workgroup-barriers`.
-
-     **Signal and Wait**
-     -------------------------------------------------------------------------------------------------------------
-
-     *signal*              - *Workgroup barrier*       | ``s_barrier_signal -1``
-                                                       | Or
-                                                       | ``s_barrier_signal_isfirst -1``
+(see :ref:`amdgpu-amdhsa-execution-barriers-ids-gfx12-table`). Each barrier *object* has a unique barrier ID that
+instructions use to operate on it.
 
+GFX12.5 additionally introduces new barrier *objects* that offer more flexibility for synchronizing the execution
+of a subset of waves of a workgroup, or synchronizing execution across workgroups within a workgroup cluster.
 
-     *wait*                - *Workgroup barrier*       ``s_barrier_wait -1``.
-
-     *signal*              - *Workgroup trap barrier*  Not available to the shader.
-
-     *wait*                - *Workgroup trap barrier*  Not available to the shader.
-     ===================== =========================== ===========================================================
-
-
-  .. table:: AMDHSA Execution Barriers IDs GFX12
-     :name: amdgpu-amdhsa-execution-barriers-ids-gfx12-table
-
-     =========== ============== ==============================================================
-     Barrier ID  Scope          Description
-     =========== ============== ==============================================================
-     ``-2``      ``workgroup``  *Workgroup trap barrier*, dedicated for the trap handler and
-                                only available in privileged execution mode
-                                (not accessible by the shader).
+.. note::
 
-     ``-1``      ``workgroup``  *Workgroup barrier*.
-     =========== ============== ==============================================================
+  Check :ref:`the table below<amdgpu-amdhsa-execution-barriers-ids-gfx12-table>` to determine which barrier IDs are
+  available to the shader on a given target.
+
+
+The following code sequences can be used to implement the barrier operations described by the above specification:
+
+.. table:: AMDHSA Execution Barriers Code Sequences GFX12
+    :name: amdgpu-amdhsa-execution-barriers-code-sequences-gfx12-table
+    :widths: 15 15 70
+
+    ===================== =========================== ===========================================================
+    Barrier Operation(s)  Barrier ID                  AMDGPU Machine Code
+    ===================== =========================== ===========================================================
+    **Init, Join and Drop**
+    -------------------------------------------------------------------------------------------------------------
+    *init*                - ``-2``, ``-1``            Automatically initialized by the hardware when a workgroup
+                                                      is launched. The *expected count* of this barrier is set
+                                                      to the number of waves in the workgroup.
+
+    *init*                - ``-4``, ``-3``            Automatically initialized by the hardware when a workgroup
+                                                      is launched as part of a workgroup cluster.
+                                                      The *expected count* of this barrier is set to the number
+                                                      of workgroups in the workgroup cluster.
+
+    *init*                - ``0``                     Automatically initialized by the hardware and always
+                                                      available. This barrier *object* is opaque and immutable
+                                                      as all operations other than barrier *join* are no-ops.
+
+    *init*                - ``[1, 16]``               | ``s_barrier_init <N>``
+
+                                                      - ``<N>`` is an immediate constant, or stored in the lower
+                                                        half of ``m0``.
+                                                      - The value to set as the *expected count* of the barrier
+                                                        is stored in the upper half of ``m0``.
+
+    *join*                - ``-2``, ``-1``            Any thread launched within a workgroup automatically *joins*
+                                                      this barrier *object*.
+
+    *join*                - ``-4``, ``-3``            Any thread launched within a workgroup cluster
+                                                      automatically *joins* this barrier *object*.
+
+    *join*                - ``0``                     | ``s_barrier_join <N>``
+                          - ``[1, 16]``
+                                                      - ``<N>`` is an immediate constant, or stored in the lower
+                                                        half of ``m0``.
+
+    *drop*                - ``0``                     | ``s_barrier_leave``
+                          - ``[1, 16]``
+                                                      - ``s_barrier_leave`` takes no operand. It can only be used
+                                                        to *drop* a barrier *object* ``BO`` if ``BO`` was
+                                                        previously *joined* using ``s_barrier_join``.
+                                                      - *Drops* the barrier *object* ``BO`` if and only if
+                                                        there is a barrier *join* ``J`` such that ``J`` is
+                                                        *barrier-joined-before* this barrier
+                                                        *drop* operation.
+
+    *drop*                - ``-2``, ``-1``            When a thread ends, it automatically *drops* this barrier
+                          - ``-4``, ``-3``            *object* if it had previously *joined* it.
+
+    **Arrive and Wait**
+    -------------------------------------------------------------------------------------------------------------
+
+    *arrive*              - ``-4``, ``-3``            | ``s_barrier_signal <N>``
+                          - ``-2``, ``-1``            | Or
+                          - ``0``                     | ``s_barrier_signal_isfirst <N>``
+                          - ``[1, 16]``
+                                                      - ``<N>`` is an immediate constant, or stored in bits ``[4:0]`` of ``m0``.
+                                                      - The ``_isfirst`` variant sets ``SCC=1`` if this wave is the first
+                                                        to signal the barrier, otherwise ``SCC=0``.
+                                                      - For barrier *objects* ``[1, 16]``: When using ``m0`` as an operand,
+                                                        if there is a non-zero value contained in the bits ``[22:16]`` of ``m0``,
+                                                        the *expected count* of the barrier *object* is set to that value before
+                                                        the *arrive count* of the barrier *object* is incremented.
+                                                        The new *expected count* value must be greater than or equal to the old
+                                                        value, otherwise the behavior is undefined.
----------------
Pierre-vh wrote:

Done
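
For readers following the hunk above, the two ``m0`` layouts it describes for barrier IDs ``[1, 16]`` can be sketched as below. This is only an illustrative Python model and not part of the patch: the helper names ``pack_m0_barrier_init`` and ``pack_m0_barrier_signal`` are hypothetical, and the range checks assume that any value fitting in the described bit field is accepted, which the hunk does not state.

```python
# Illustrative model of the m0 layouts described in the hunk.
# Helper names and range checks are assumptions, not part of the patch.

def pack_m0_barrier_init(barrier_id: int, expected_count: int) -> int:
    """m0 for ``s_barrier_init``: ID in the lower half, expected count in the upper half."""
    if not 1 <= barrier_id <= 16:
        raise ValueError("this sketch only covers barrier IDs in [1, 16]")
    if not 0 <= expected_count <= 0xFFFF:
        raise ValueError("expected count must fit in the upper half of m0")
    return (expected_count << 16) | barrier_id


def pack_m0_barrier_signal(barrier_id: int, new_expected_count: int = 0) -> int:
    """m0 for ``s_barrier_signal``: ID in bits [4:0], optional new expected count in bits [22:16].

    A zero in bits [22:16] leaves the expected count unchanged.
    """
    if not 1 <= barrier_id <= 16:
        raise ValueError("this sketch only covers barrier IDs in [1, 16]")
    if not 0 <= new_expected_count <= 0x7F:
        raise ValueError("new expected count must fit in bits [22:16]")
    return (new_expected_count << 16) | (barrier_id & 0x1F)


if __name__ == "__main__":
    # Barrier 3 initialized with an expected count of 8.
    print(hex(pack_m0_barrier_init(3, 8)))
    # Arrive at barrier 2 and raise its expected count to 4 beforehand.
    print(hex(pack_m0_barrier_signal(2, 4)))
```

Per the hunk, a new *expected count* passed this way must be greater than or equal to the old value. The sketch cannot check that, since it has no view of the barrier's current state.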

https://github.com/llvm/llvm-project/pull/185632

