[llvm] 6816078 - [NFC][AMDGPU] Restructure the AMDGPU memory model description

via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 2 13:32:58 PST 2020


Author: Tony
Date: 2020-11-02T21:32:20Z
New Revision: 68160789c12099b281e6a302ea75094272d92efb

URL: https://github.com/llvm/llvm-project/commit/68160789c12099b281e6a302ea75094272d92efb
DIFF: https://github.com/llvm/llvm-project/commit/68160789c12099b281e6a302ea75094272d92efb.diff

LOG: [NFC][AMDGPU] Restructure the AMDGPU memory model description

Separate the AMDGPU memory model description into separate sections
for each architecture.

Differential Revision: https://reviews.llvm.org/D90548

Added: 
    

Modified: 
    llvm/docs/AMDGPUUsage.rst

Removed: 
    


################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 52cfbd7f4c15..0732d1a79ab3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -3468,7 +3468,7 @@ alignment.
                                                        Reserved, must be 0.
                                                      GFX10
                                                        Controls the behavior of the
-                                                       waitcnt's vmcnt and vscnt
+                                                       s_waitcnt's vmcnt and vscnt
                                                        counters.
 
                                                        - If 0 vmcnt reports completion
@@ -4140,24 +4140,22 @@ Wavefront Offset is only requested if it is used (see
 Memory Model
 ~~~~~~~~~~~~
 
-This section describes the mapping of LLVM memory model onto AMDGPU machine code
-(see :ref:`memmodel`).
+This section describes the mapping of the LLVM memory model onto AMDGPU machine
+code (see :ref:`memmodel`).
 
 The AMDGPU backend supports the memory synchronization scopes specified in
 :ref:`amdgpu-memory-scopes`.
 
-The code sequences used to implement the memory model are defined in table
-:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table`.
-
-The sequences specify the order of instructions that a single thread must
-execute. The ``s_waitcnt`` and ``buffer_wbinvl1_vol`` are defined with respect
+The code sequences used to implement the memory model specify the order of
+instructions that a single thread must execute. The ``s_waitcnt`` and cache
+management instructions such as ``buffer_wbinvl1_vol`` are defined with respect
 to other memory instructions executed by the same thread. This allows them to be
 moved earlier or later which can allow them to be combined with other instances
-of the same instruction, or hoisted/sunk out of loops to improve
-performance. Only the instructions related to the memory model are given;
-additional ``s_waitcnt`` instructions are required to ensure registers are
-defined before being used. These may be able to be combined with the memory
-model ``s_waitcnt`` instructions as described above.
+of the same instruction, or hoisted/sunk out of loops to improve performance.
+Only the instructions related to the memory model are given; additional
+``s_waitcnt`` instructions are required to ensure registers are defined before
+being used. These may be able to be combined with the memory model ``s_waitcnt``
+instructions as described above.
 
 The AMDGPU backend supports the following memory models:
 
@@ -4183,6 +4181,79 @@ operations.
 ``buffer/global/flat_load/store/atomic`` instructions to global memory are
 termed vector memory operations.
 
+Private address space uses ``buffer_load/store`` using the scratch V#
+(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread
+is accessing the memory, atomic memory orderings are not meaningful, and all
+accesses are treated as non-atomic.
+
+Constant address space uses ``buffer/global_load`` instructions (or equivalent
+scalar memory instructions). Since the constant address space contents do not
+change during the execution of a kernel dispatch it is not legal to perform
+stores, and atomic memory orderings are not meaningful, and all accesses are
+treated as non-atomic.
+
+A memory synchronization scope wider than work-group is not meaningful for the
+group (LDS) address space and is treated as work-group.
+
+The memory model does not support the region address space which is treated as
+non-atomic.
+
+Acquire memory ordering is not meaningful on store atomic instructions and is
+treated as non-atomic.
+
+Release memory ordering is not meaningful on load atomic instructions and is
+treated a non-atomic.
+
+Acquire-release memory ordering is not meaningful on load or store atomic
+instructions and is treated as acquire and release respectively.
+
+The memory order also adds the single thread optimization constraints defined in
+table
+:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-table`.
+
+  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints
+     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-table
+
+     ============ ==============================================================
+     LLVM Memory  Optimization Constraints
+     Ordering
+     ============ ==============================================================
+     unordered    *none*
+     monotonic    *none*
+     acquire      - If a load atomic/atomicrmw then no following load/load
+                    atomic/store/store atomic/atomicrmw/fence instruction can be
+                    moved before the acquire.
+                  - If a fence then same as load atomic, plus no preceding
+                    associated fence-paired-atomic can be moved after the fence.
+     release      - If a store atomic/atomicrmw then no preceding load/load
+                    atomic/store/store atomic/atomicrmw/fence instruction can be
+                    moved after the release.
+                  - If a fence then same as store atomic, plus no following
+                    associated fence-paired-atomic can be moved before the
+                    fence.
+     acq_rel      Same constraints as both acquire and release.
+     seq_cst      - If a load atomic then same constraints as acquire, plus no
+                    preceding sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved after the
+                    seq_cst.
+                  - If a store atomic then the same constraints as release, plus
+                    no following sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved before the
+                    seq_cst.
+                  - If an atomicrmw/fence then same constraints as acq_rel.
+     ============ ==============================================================
+
+The code sequences used to implement the memory model are defined in the
+following sections:
+
+* :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9`
+* :ref:`amdgpu-amdhsa-memory-model-gfx10`
+
+.. _amdgpu-amdhsa-memory-model-gfx6-gfx9:
+
+Memory Model GFX6-GFX9
+++++++++++++++++++++++
+
 For GFX6-GFX9:
 
 * Each agent has multiple shader arrays (SA).
@@ -4233,110 +4304,13 @@ For GFX6-GFX9:
 * The L2 cache can be kept coherent with other agents on some targets, or ranges
   of virtual addresses can be set up to bypass it to ensure system coherence.
 
-For GFX10:
-
-* Each agent has multiple shader arrays (SA).
-* Each SA has multiple work-group processors (WGP).
-* Each WGP has multiple compute units (CU).
-* Each CU has multiple SIMDs that execute wavefronts.
-* The wavefronts for a single work-group are executed in the same
-  WGP. In CU wavefront execution mode the wavefronts may be executed by
-  different SIMDs in the same CU. In WGP wavefront execution mode the
-  wavefronts may be executed by different SIMDs in different CUs in the same
-  WGP.
-* Each WGP has a single LDS memory shared by the wavefronts of the work-groups
-  executing on it.
-* All LDS operations of a WGP are performed as wavefront wide operations in a
-  global order and involve no caching. Completion is reported to a wavefront in
-  execution order.
-* The LDS memory has multiple request queues shared by the SIMDs of a
-  WGP. Therefore, the LDS operations performed by different wavefronts of a
-  work-group can be reordered relative to each other, which can result in
-  reordering the visibility of vector memory operations with respect to LDS
-  operations of other wavefronts in the same work-group. A ``s_waitcnt
-  lgkmcnt(0)`` is required to ensure synchronization between LDS operations and
-  vector memory operations between wavefronts of a work-group, but not between
-  operations performed by the same wavefront.
-* The vector memory operations are performed as wavefront wide operations.
-  Completion of load/store/sample operations are reported to a wavefront in
-  execution order of other load/store/sample operations performed by that
-  wavefront.
-* The vector memory operations access a vector L0 cache. There is a single L0
-  cache per CU. Each SIMD of a CU accesses the same L0 cache. Therefore, no
-  special action is required for coherence between the lanes of a single
-  wavefront. However, a ``buffer_gl0_inv`` is required for coherence between
-  wavefronts executing in the same work-group as they may be executing on SIMDs
-  of different CUs that access different L0s. A ``buffer_gl0_inv`` is also
-  required for coherence between wavefronts executing in different work-groups
-  as they may be executing on different WGPs.
-* The scalar memory operations access a scalar L0 cache shared by all wavefronts
-  on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
-  operations are used in a restricted way so do not impact the memory model. See
-  :ref:`amdgpu-amdhsa-memory-spaces`.
-* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
-  the same SA. Therefore, no special action is required for coherence between
-  the wavefronts of a single work-group. However, a ``buffer_gl1_inv`` is
-  required for coherence between wavefronts executing in different work-groups
-  as they may be executing on different SAs that access different L1s.
-* The L1 caches have independent quadrants to service disjoint ranges of virtual
-  addresses.
-* Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
-  vector and scalar memory operations performed by different wavefronts, whether
-  executing in the same or different work-groups (which may be executing on
-  different CUs accessing different L0s), can be reordered relative to each
-  other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is required to ensure
-  synchronization between vector memory operations of different wavefronts. It
-  ensures a previous vector memory operation has completed before executing a
-  subsequent vector memory or LDS operation and so can be used to meet the
-  requirements of acquire, release and sequential consistency.
-* The L1 caches use an L2 cache shared by all SAs on the same agent.
-* The L2 cache has independent channels to service disjoint ranges of virtual
-  addresses.
-* Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
-  quadrant has a separate request queue per L2 channel. Therefore, the vector
-  and scalar memory operations performed by wavefronts executing in different
-  work-groups (which may be executing on different SAs) of an agent can be
-  reordered relative to each other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is
-  required to ensure synchronization between vector memory operations of
-  different SAs. It ensures a previous vector memory operation has completed
-  before executing a subsequent vector memory and so can be used to meet the
-  requirements of acquire, release and sequential consistency.
-* The L2 cache can be kept coherent with other agents on some targets, or ranges
-  of virtual addresses can be set up to bypass it to ensure system coherence.
-
-Private address space uses ``buffer_load/store`` using the scratch V#
-(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread
-is accessing the memory, atomic memory orderings are not meaningful, and all
-accesses are treated as non-atomic.
-
-Constant address space uses ``buffer/global_load`` instructions (or equivalent
-scalar memory instructions). Since the constant address space contents do not
-change during the execution of a kernel dispatch it is not legal to perform
-stores, and atomic memory orderings are not meaningful, and all access are
-treated as non-atomic.
-
-A memory synchronization scope wider than work-group is not meaningful for the
-group (LDS) address space and is treated as work-group.
-
-The memory model does not support the region address space which is treated as
-non-atomic.
-
-Acquire memory ordering is not meaningful on store atomic instructions and is
-treated as non-atomic.
-
-Release memory ordering is not meaningful on load atomic instructions and is
-treated a non-atomic.
-
-Acquire-release memory ordering is not meaningful on load or store atomic
-instructions and is treated as acquire and release respectively.
-
-AMDGPU backend only uses scalar memory operations to access memory that is
-proven to not change during the execution of the kernel dispatch. This includes
-constant address space and global address space for program scope const
-variables. Therefore, the kernel machine code does not have to maintain the
-scalar L1 cache to ensure it is coherent with the vector L1 cache. The scalar
-and vector L1 caches are invalidated between kernel dispatches by CP since
-constant address space data may change between kernel dispatch executions. See
+Scalar memory operations are only used to access memory that is proven to not
+change during the execution of the kernel dispatch. This includes constant
+address space and global address space for program scope const variables.
+Therefore, the kernel machine code does not have to maintain the scalar cache to
+ensure it is coherent with the vector caches. The scalar and vector caches are
+invalidated between kernel dispatches by CP since constant address space data
+may change between kernel dispatch executions. See
 :ref:`amdgpu-amdhsa-memory-spaces`.
 
 The one exception is if scalar writes are used to spill SGPR registers. In this
@@ -4348,441 +4322,256 @@ future wavefront that uses the same scratch area, or a function call that
 creates a frame at the same address, respectively. There is no need for a
 ``s_dcache_inv`` as all scalar writes are write-before-read in the same thread.
 
-For GFX6-GFX9, scratch backing memory (which is used for the private address
-space) is accessed with MTYPE NC_NV (non-coherent non-volatile). Since the
-private address space is only accessed by a single thread, and is always
-write-before-read, there is never a need to invalidate these entries from the L1
-cache. Hence all cache invalidates are done as ``*_vol`` to only invalidate the
-volatile cache lines.
-
-For GFX10, scratch backing memory (which is used for the private address space)
-is accessed with MTYPE NC (non-coherent). Since the private address space is
-only accessed by a single thread, and is always write-before-read, there is
-never a need to invalidate these entries from the L0 or L1 caches.
-
-For GFX10, wavefronts are executed in native mode with in-order reporting of
-loads and sample instructions. In this mode vmcnt reports completion of load,
-atomic with return and sample instructions in order, and the vscnt reports the
-completion of store and atomic without return in order. See ``MEM_ORDERED``
-field in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
+For kernarg backing memory:
 
-In GFX10, wavefronts can be executed in WGP or CU wavefront execution mode:
-
-* In WGP wavefront execution mode the wavefronts of a work-group are executed
-  on the SIMDs of both CUs of the WGP. Therefore, explicit management of the per
-  CU L0 caches is required for work-group synchronization. Also accesses to L1
-  at work-group scope need to be explicitly ordered as the accesses from
-  different CUs are not ordered.
-* In CU wavefront execution mode the wavefronts of a work-group are executed on
-  the SIMDs of a single CU of the WGP. Therefore, all global memory access by
-  the work-group access the same L0 which in turn ensures L1 accesses are
-  ordered and so do not require explicit management of the caches for
-  work-group synchronization.
+* CP invalidates the L1 cache at the start of each kernel dispatch.
+* On dGPU the kernarg backing memory is allocated in host memory accessed as
+  MTYPE UC (uncached) to avoid needing to invalidate the L2 cache. This also
+  causes it to be treated as non-volatile and so is not invalidated by
+  ``*_vol``.
+* On APU the kernarg backing memory it is accessed as MTYPE CC (cache coherent)
+  and so the L2 cache will be coherent with the CPU and other agents.
 
-See ``WGP_MODE`` field in
-:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table` and
-:ref:`amdgpu-target-features`.
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC_NV (non-coherent non-volatile). Since the private address space is
+only accessed by a single thread, and is always write-before-read, there is
+never a need to invalidate these entries from the L1 cache. Hence all cache
+invalidates are done as ``*_vol`` to only invalidate the volatile cache lines.
 
-On dGPU the kernarg backing memory is accessed as UC (uncached) to avoid needing
-to invalidate the L2 cache. For GFX6-GFX9, this also causes it to be treated as
-non-volatile and so is not invalidated by ``*_vol``. On APU it is accessed as CC
-(cache coherent) and so the L2 cache will be coherent with the CPU and other
-agents.
+The code sequences used to implement the memory model for GFX6-GFX9 are defined
+in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
 
-  .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX10
-     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table
+  .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX9
+     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table
 
-     ============ ============ ============== ========== ================================ ================================
-     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code              AMDGPU Machine Code
-                  Ordering     Sync Scope     Address    GFX6-9                           GFX10
+     ============ ============ ============== ========== ================================
+     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
+                  Ordering     Sync Scope     Address    GFX6-9
                                               Space
-     ============ ============ ============== ========== ================================ ================================
+     ============ ============ ============== ========== ================================
      **Non-Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     load         *none*       *none*         - global   - !volatile & !nontemporal       - !volatile & !nontemporal
+     ------------------------------------------------------------------------------------
+     load         *none*       *none*         - global   - !volatile & !nontemporal
                                               - generic
-                                              - private    1. buffer/global/flat_load       1. buffer/global/flat_load
+                                              - private    1. buffer/global/flat_load
                                               - constant
-                                                         - volatile & !nontemporal        - volatile & !nontemporal
+                                                         - volatile & !nontemporal
 
-                                                           1. buffer/global/flat_load       1. buffer/global/flat_load
-                                                              glc=1                            glc=1 dlc=1
+                                                           1. buffer/global/flat_load
+                                                              glc=1
 
-                                                         - nontemporal                    - nontemporal
+                                                         - nontemporal
 
-                                                           1. buffer/global/flat_load       1. buffer/global/flat_load
-                                                              glc=1 slc=1                      slc=1
+                                                           1. buffer/global/flat_load
+                                                              glc=1 slc=1
 
-     load         *none*       *none*         - local    1. ds_load                       1. ds_load
-     store        *none*       *none*         - global   - !nontemporal                   - !nontemporal
+     load         *none*       *none*         - local    1. ds_load
+     store        *none*       *none*         - global   - !nontemporal
                                               - generic
-                                              - private    1. buffer/global/flat_store      1. buffer/global/flat_store
+                                              - private    1. buffer/global/flat_store
                                               - constant
-                                                         - nontemporal                    - nontemporal
+                                                         - nontemporal
 
-                                                           1. buffer/global/flat_store       1. buffer/global/flat_store
-                                                              glc=1 slc=1                       slc=1
+                                                           1. buffer/global/flat_store
+                                                              glc=1 slc=1
 
-     store        *none*       *none*         - local    1. ds_store                      1. ds_store
+     store        *none*       *none*         - local    1. ds_store
      **Unordered Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     load atomic  unordered    *any*          *any*      *Same as non-atomic*.            *Same as non-atomic*.
-     store atomic unordered    *any*          *any*      *Same as non-atomic*.            *Same as non-atomic*.
-     atomicrmw    unordered    *any*          *any*      *Same as monotonic               *Same as monotonic
-                                                         atomic*.                         atomic*.
+     ------------------------------------------------------------------------------------
+     load atomic  unordered    *any*          *any*      *Same as non-atomic*.
+     store atomic unordered    *any*          *any*      *Same as non-atomic*.
+     atomicrmw    unordered    *any*          *any*      *Same as monotonic
+                                                         atomic*.
      **Monotonic Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load       1. buffer/global/flat_load
-                               - wavefront    - generic
-     load atomic  monotonic    - workgroup    - global   1. buffer/global/flat_load       1. buffer/global/flat_load
-                                              - generic                                      glc=1
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit glc=1.
-
-     load atomic  monotonic    - singlethread - local    1. ds_load                       1. ds_load
-                               - wavefront
-                               - workgroup
-     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load       1. buffer/global/flat_load
-                               - system       - generic     glc=1                            glc=1 dlc=1
-     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store      1. buffer/global/flat_store
+     ------------------------------------------------------------------------------------
+     load atomic  monotonic    - singlethread - global   1. buffer/global/ds/flat_load
+                               - wavefront    - local
+                               - workgroup    - generic
+     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load
+                               - system       - generic     glc=1
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
                                - wavefront    - generic
                                - workgroup
                                - agent
                                - system
-     store atomic monotonic    - singlethread - local    1. ds_store                      1. ds_store
+     store atomic monotonic    - singlethread - local    1. ds_store
                                - wavefront
                                - workgroup
-     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic     1. buffer/global/flat_atomic
+     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic
                                - wavefront    - generic
                                - workgroup
                                - agent
                                - system
-     atomicrmw    monotonic    - singlethread - local    1. ds_atomic                     1. ds_atomic
+     atomicrmw    monotonic    - singlethread - local    1. ds_atomic
                                - wavefront
                                - workgroup
      **Acquire Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load    1. buffer/global/ds/flat_load
+     ------------------------------------------------------------------------------------
+     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load
                                - wavefront    - local
                                               - generic
-     load atomic  acquire      - workgroup    - global   1. buffer/global_load            1. buffer/global_load glc=1
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit glc=1.
-
-                                                                                          2. s_waitcnt vmcnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Must happen before
-                                                                                              the following buffer_gl0_inv
-                                                                                              and before any following
-                                                                                              global/generic
-                                                                                              load/load
-                                                                                              atomic/store/store
-                                                                                              atomic/atomicrmw.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     load atomic  acquire      - workgroup    - local    1. ds_load                       1. ds_load
-                                                         2. s_waitcnt lgkmcnt(0)          2. s_waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.               - If OpenCL, omit.
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following buffer_gl0_inv
-                                                             global/generic                   and before any following
-                                                             load/load                        global/generic load/load
-                                                             atomic/store/store               atomic/store/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the load              older than the load
-                                                             atomic value being               atomic value being
-                                                             acquired.                        acquired.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - If OpenCL, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     load atomic  acquire      - workgroup    - generic  1. flat_load                     1. flat_load glc=1
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit glc=1.
-
-                                                         2. s_waitcnt lgkmcnt(0)          2. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              lgkmcnt(0).
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following
-                                                             global/generic                   buffer_gl0_inv and any
-                                                             load/load                        following global/generic
-                                                             atomic/store/store               load/load
-                                                             atomic/atomicrmw.                atomic/store/store
-                                                                                              atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the load              older than the load
-                                                             atomic value being               atomic value being
-                                                             acquired.                        acquired.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     load atomic  acquire      - agent        - global   1. buffer/global_load            1. buffer/global_load
-                               - system                     glc=1                            glc=1 dlc=1
-                                                         2. s_waitcnt vmcnt(0)            2. s_waitcnt vmcnt(0)
-
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures the load               - Ensures the load
-                                                             has completed                    has completed
-                                                             before invalidating              before invalidating
-                                                             the cache.                       the caches.
-
-                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following                        following
-                                                             loads will not see               loads will not see
-                                                             stale global data.               stale global data.
-
-     load atomic  acquire      - agent        - generic  1. flat_load glc=1               1. flat_load glc=1 dlc=1
-                               - system                  2. s_waitcnt vmcnt(0) &          2. s_waitcnt vmcnt(0) &
-                                                            lgkmcnt(0)                       lgkmcnt(0)
-
-                                                           - If OpenCL omit                 - If OpenCL omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_invl.
-                                                           - Ensures the flat_load          - Ensures the flat_load
-                                                             has completed                    has completed
-                                                             before invalidating              before invalidating
-                                                             the cache.                       the caches.
-
-                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
-
-     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
+     load atomic  acquire      - workgroup    - global   1. buffer/global_load
+     load atomic  acquire      - workgroup    - local    1. ds/flat_load
+                                              - generic  2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than a local load
+                                                             atomic value being
+                                                             acquired.
+
+     load atomic  acquire      - agent        - global   1. buffer/global_load
+                               - system                     glc=1
+                                                         2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale global data.
+
+     load atomic  acquire      - agent        - generic  1. flat_load glc=1
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the flat_load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic          1. buffer/global_atomic
-                                                                                          2. s_waitcnt vm/vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                                                            - Must happen before
-                                                                                              the following buffer_gl0_inv
-                                                                                              and before any following
-                                                                                              global/generic
-                                                                                              load/load
-                                                                                              atomic/store/store
-                                                                                              atomic/atomicrmw.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acquire      - workgroup    - local    1. ds_atomic                     1. ds_atomic
-                                                         2. waitcnt lgkmcnt(0)            2. waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.               - If OpenCL, omit.
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following
-                                                             global/generic                   buffer_gl0_inv.
+     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic
+     atomicrmw    acquire      - workgroup    - local    1. ds/flat_atomic
+                                              - generic  2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the                   older than the
-                                                             atomicrmw value                  atomicrmw value
-                                                             being acquired.                  being acquired.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If OpenCL omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acquire      - workgroup    - generic  1. flat_atomic                   1. flat_atomic
-                                                         2. waitcnt lgkmcnt(0)            2. waitcnt lgkmcnt(0) &
-                                                                                             vm/vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vm/vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              waitcnt lgkmcnt(0).
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following
-                                                             global/generic                   buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than a local
+                                                             atomicrmw value
+                                                             being acquired.
+
+     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic
+                               - system                  2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - agent        - generic  1. flat_atomic
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
                                                              load/load
-                                                             atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the                   older than the
-                                                             atomicrmw value                  atomicrmw value
-                                                             being acquired.                  being acquired.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic          1. buffer/global_atomic
-                               - system                  2. s_waitcnt vmcnt(0)            2. s_waitcnt vm/vscnt(0)
-
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                                                              waitcnt lgkmcnt(0).
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures the                    - Ensures the
-                                                             atomicrmw has                    atomicrmw has
-                                                             completed before                 completed before
-                                                             invalidating the                 invalidating the
-                                                             cache.                           caches.
-
-                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
-
-     atomicrmw    acquire      - agent        - generic  1. flat_atomic                   1. flat_atomic
-                               - system                  2. s_waitcnt vmcnt(0) &          2. s_waitcnt vm/vscnt(0) &
-                                                            lgkmcnt(0)                       lgkmcnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures the                    - Ensures the
-                                                             atomicrmw has                    atomicrmw has
-                                                             completed before                 completed before
-                                                             invalidating the                 invalidating the
-                                                             cache.                           caches.
-
-                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
-
-     fence        acquire      - singlethread *none*     *none*                           *none*
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acquire      - singlethread *none*     *none*
                                - wavefront
-     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit.               not generic, omit
-                                                                                              lgkmcnt(0).
-                                                                                            - If OpenCL and
-                                                                                              address space is
-                                                                                              local, omit
-                                                                                              vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM            - However, since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate. If              always generate. If
-                                                             fence had an                     fence had an
-                                                             address space then               address space then
-                                                             set to address                   set to address
-                                                             space of OpenCL                  space of OpenCL
-                                                             fence flag, or to                fence flag, or to
-                                                             generic if both                  generic if both
-                                                             local and global                 local and global
-                                                             flags are                        flags are
-                                                             specified.                       specified.
+     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit.
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate. If
+                                                             fence had an
+                                                             address space then
+                                                             set to address
+                                                             space of OpenCL
+                                                             fence flag, or to
+                                                             generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic load
@@ -4806,96 +4595,22 @@ agents.
                                                              older than the
                                                              value read by the
                                                              fence-paired-atomic.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              atomicrmw-no-return-value
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic load
-                                                                                              atomic/atomicrmw
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              buffer_gl0_inv.
-                                                                                            - Ensures that the
-                                                                                              fence-paired atomic
-                                                                                              has completed
-                                                                                              before invalidating
-                                                                                              the
-                                                                                              cache. Therefore
-                                                                                              any following
-                                                                                              locations read must
-                                                                                              be no older than
-                                                                                              the value read by
-                                                                                              the
-                                                                                              fence-paired-atomic.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     fence        acquire      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit                not generic, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                                                            - If OpenCL and
-                                                                                              address space is
-                                                                                              local, omit
-                                                                                              vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM            - However, since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate                  always generate
-                                                             (see comment for                 (see comment for
-                                                             previous fence).                 previous fence).
+
+     fence        acquire      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
                                                            - Could be split into
                                                              separate s_waitcnt
                                                              vmcnt(0) and
@@ -4944,1636 +4659,2867 @@ agents.
                                                              the value read by
                                                              the
                                                              fence-paired-atomic.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              atomicrmw-no-return-value
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic load
-                                                                                              atomic/atomicrmw
-                                                                                              with an equal or
-                                                                                              wider sync scope
-                                                                                              and memory ordering
-                                                                                              stronger than
-                                                                                              unordered (this is
-                                                                                              termed the
-                                                                                              fence-paired-atomic).
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              buffer_gl*_inv.
-                                                                                            - Ensures that the
-                                                                                              fence-paired atomic
-                                                                                              has completed
-                                                                                              before invalidating
-                                                                                              the
-                                                                                              caches. Therefore
-                                                                                              any following
-                                                                                              locations read must
-                                                                                              be no older than
-                                                                                              the value read by
-                                                                                              the
-                                                                                              fence-paired-atomic.
-
-                                                         2. buffer_wbinvl1_vol            2. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before any         - Must happen before any
-                                                             following global/generic         following global/generic
-                                                             load/load                        load/load
-                                                             atomic/store/store               atomic/store/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before any
+                                                             following global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
 
      **Release Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store   1. buffer/global/ds/flat_store
+     ------------------------------------------------------------------------------------
+     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store
                                - wavefront    - local
                                               - generic
-     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              lgkmcnt(0).
+     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+                                                           - If OpenCL, omit.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             store.                           store.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             store that is being              store that is being
-                                                             released.                        released.
-
-                                                         2. buffer/global_store           2. buffer/global_store
-     store atomic release      - workgroup    - local                                     1. waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - If OpenCL, omit.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0) and s_waitcnt
-                                                                                              vscnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              store.
-                                                                                            - Ensures that all
-                                                                                              global memory
-                                                                                              operations have
-                                                                                              completed before
-                                                                                              performing the
-                                                                                              store that is being
-                                                                                              released.
-
-                                                         1. ds_store                      2. ds_store
-     store atomic release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              lgkmcnt(0).
-                                                           - Must happen after
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     store atomic release      - workgroup    - local    1. ds_store
+     store atomic release      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             store.                           store.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             store that is being              store that is being
-                                                             released.                        released.
-
-                                                         2. flat_store                    2. flat_store
-     store atomic release      - agent        - global   1. s_waitcnt lgkmcnt(0) &          1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                           vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt vscnt(0)
-                                                             s_waitcnt                        and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load
-                                                             atomic/store                     atomic/
-                                                             atomic/atomicrmw.                atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             store.                           store.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to memory have                   to memory have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             store that is being              store that is being
-                                                             released.                        released.
-
-                                                         2. buffer/global/flat_store      2. buffer/global/flat_store
-     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to memory have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
+     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
                                                            - If OpenCL, omit.
-
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. buffer/global_atomic          2. buffer/global_atomic
-     atomicrmw    release      - workgroup    - local                                     1. waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - If OpenCL, omit.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0) and s_waitcnt
-                                                                                              vscnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              store.
-                                                                                            - Ensures that all
-                                                                                              global memory
-                                                                                              operations have
-                                                                                              completed before
-                                                                                              performing the
-                                                                                              store that is being
-                                                                                              released.
-
-                                                         1. ds_atomic                     2. ds_atomic
-     atomicrmw    release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              lgkmcnt(0).
-                                                           - Must happen after
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global/flat_atomic
+     atomicrmw    release      - workgroup    - local    1. ds_atomic
+     atomicrmw    release      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
                                                              any preceding
-                                                             local/generic
+                                                             global/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. flat_atomic                   2. flat_atomic
-     atomicrmw    release      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                          vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
-                                                             s_waitcnt                        vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load atomic/
-                                                             atomic/store                     atomicrmw-with-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to global and local              to global and local
-                                                             have completed                   have completed
-                                                             before performing                before performing
-                                                             the atomicrmw that               the atomicrmw that
-                                                             is being released.               is being released.
-
-                                                         2. buffer/global/flat_atomic     2. buffer/global/flat_atomic
-     fence        release      - singlethread *none*     *none*                           *none*
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global and local
+                                                             have completed
+                                                             before performing
+                                                             the atomicrmw that
+                                                             is being released.
+
+                                                         2. buffer/global/flat_atomic
+     fence        release      - singlethread *none*     *none*
                                - wavefront
-     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit.               not generic, omit
-                                                                                              lgkmcnt(0).
-                                                                                            - If OpenCL and
-                                                                                              address space is
-                                                                                              local, omit
-                                                                                              vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM            - However, since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate. If              always generate. If
-                                                             fence had an                     fence had an
-                                                             address space then               address space then
-                                                             set to address                   set to address
-                                                             space of OpenCL                  space of OpenCL
-                                                             fence flag, or to                fence flag, or to
-                                                             generic if both                  generic if both
-                                                             local and global                 local and global
-                                                             flags are                        flags are
-                                                             specified.                       specified.
+     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit.
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate. If
+                                                             fence had an
+                                                             address space then
+                                                             set to address
+                                                             space of OpenCL
+                                                             fence flag, or to
+                                                             generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store atomic/
-                                                                                              atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             any following store              any following store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             fence-paired-atomic).            fence-paired-atomic).
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             following                        following
-                                                             fence-paired-atomic.             fence-paired-atomic.
-
-     fence        release      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit                not generic, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             local, omit                      local, omit
-                                                             vmcnt(0).                        vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM            - However, since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate. If              always generate. If
-                                                             fence had an                     fence had an
-                                                             address space then               address space then
-                                                             set to address                   set to address
-                                                             space of OpenCL                  space of OpenCL
-                                                             fence flag, or to                fence flag, or to
-                                                             generic if both                  generic if both
-                                                             local and global                 local and global
-                                                             flags are                        flags are
-                                                             specified.                       specified.
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
-                                                             s_waitcnt                        vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load atomic/
-                                                             atomic/store                     atomicrmw-with-return-value.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     fence        release      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate. If
+                                                             fence had an
+                                                             address space then
+                                                             set to address
+                                                             space of OpenCL
+                                                             fence flag, or to
+                                                             generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             any following store              any following store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             fence-paired-atomic).            fence-paired-atomic).
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             have                             have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             following                        following
-                                                             fence-paired-atomic.             fence-paired-atomic.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
 
      **Acquire-Release Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
+     ------------------------------------------------------------------------------------
+     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit
-                                                                                              lgkmcnt(0).
-                                                           - Must happen after              - Must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0), and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. buffer/global_atomic          2. buffer/global_atomic
-                                                                                          3. s_waitcnt vm/vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              buffer_gl0_inv.
-                                                                                            - Ensures any
-                                                                                              following global
-                                                                                              data read is no
-                                                                                              older than the
-                                                                                              atomicrmw value
-                                                                                              being acquired.
-
-                                                                                          4. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acq_rel      - workgroup    - local                                     1. waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - If OpenCL, omit.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0) and s_waitcnt
-                                                                                              vscnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              store.
-                                                                                            - Ensures that all
-                                                                                              global memory
-                                                                                              operations have
-                                                                                              completed before
-                                                                                              performing the
-                                                                                              store that is being
-                                                                                              released.
-
-                                                         1. ds_atomic                     2. ds_atomic
-                                                         2. s_waitcnt lgkmcnt(0)          3. s_waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.               - If OpenCL, omit.
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following
-                                                             global/generic                   buffer_gl0_inv.
+     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+
+     atomicrmw    acq_rel      - workgroup    - local    1. ds_atomic
+                                                         2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the load              older than the load
-                                                             atomic value being               atomic value being
-                                                             acquired.                        acquired.
-
-                                                                                          4. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - If OpenCL omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit lgkmcnt(0).
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the local load
+                                                             atomic value being
+                                                             acquired.
+
+     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store
-                                                                                              atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. flat_atomic                   2. flat_atomic
-                                                         3. s_waitcnt lgkmcnt(0)          3. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL, omit.               - If OpenCL, omit lgkmcnt(0).
-                                                           - Must happen before             - Must happen before
-                                                             any following                    the following
-                                                             global/generic                   buffer_gl0_inv.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                    - Ensures any
-                                                             following global                 following global
-                                                             data read is no                  data read is no
-                                                             older than the load              older than the load
-                                                             atomic value being               atomic value being
-                                                             acquired.                        acquired.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
-                                                             s_waitcnt                        vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load atomic/
-                                                             atomic/store                     atomicrmw-with-return-value.
-                                                             atomic/atomicrmw.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to global have                   to global have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. buffer/global_atomic          2. buffer/global_atomic
-                                                         3. s_waitcnt vmcnt(0)            3. s_waitcnt vm/vscnt(0)
-
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                                                              waitcnt lgkmcnt(0).
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures the                    - Ensures the
-                                                             atomicrmw has                    atomicrmw has
-                                                             completed before                 completed before
-                                                             invalidating the                 invalidating the
-                                                             cache.                           caches.
-
-                                                         4. buffer_wbinvl1_vol            4. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
-
-     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
-                                                             s_waitcnt                        vscnt(0), and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load atomic
-                                                             atomic/store                     atomicrmw-with-return-value.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than a local load
+                                                             atomic value being
+                                                             acquired.
+
+     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
                                                              atomic/atomicrmw.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             atomicrmw.                       atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to global have                   have
-                                                             completed before                 completed before
-                                                             performing the                   performing the
-                                                             atomicrmw that is                atomicrmw that is
-                                                             being released.                  being released.
-
-                                                         2. flat_atomic                   2. flat_atomic
-                                                         3. s_waitcnt vmcnt(0) &          3. s_waitcnt vm/vscnt(0) &
-                                                            lgkmcnt(0)                       lgkmcnt(0)
-
-                                                           - If OpenCL, omit                - If OpenCL, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                                                            - Use vmcnt(0) if atomic with
-                                                                                              return and vscnt(0) if
-                                                                                              atomic with no-return.
-                                                           - Must happen before             - Must happen before
-                                                             following                        following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures the                    - Ensures the
-                                                             atomicrmw has                    atomicrmw has
-                                                             completed before                 completed before
-                                                             invalidating the                 invalidating the
-                                                             cache.                           caches.
-
-                                                         4. buffer_wbinvl1_vol            4. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data.                     global data.
-
-     fence        acq_rel      - singlethread *none*     *none*                           *none*
-                               - wavefront
-     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                                                                             vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit.               not generic, omit
-                                                                                              lgkmcnt(0).
-                                                                                            - If OpenCL and
-                                                                                              address space is
-                                                                                              local, omit
-                                                                                              vmcnt(0) and vscnt(0).
-                                                           - However,                       - However,
-                                                             since LLVM                       since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate                  always generate
-                                                             (see comment for                 (see comment for
-                                                             previous fence).                 previous fence).
-                                                           - Must happen after
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
                                                              any preceding
                                                              local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+                                                         3. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         4. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
                                                              load/load
-                                                             atomic/store/store
                                                              atomic/atomicrmw.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0) and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - s_waitcnt vmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              load/load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                                                            - s_waitcnt lgkmcnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              local/generic
-                                                                                              load/store/load
-                                                                                              atomic/store atomic/
-                                                                                              atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/store/store               atomic/store/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that all               - Ensures that all
-                                                             memory operations                memory operations
-                                                             to local have                    have
-                                                             completed before                 completed before
-                                                             performing any                   performing any
-                                                             following global                 following global
-                                                             memory operations.               memory operations.
-                                                           - Ensures that the               - Ensures that the
-                                                             preceding                        preceding
-                                                             local/generic load               local/generic load
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             acquire-fence-paired-atomic      acquire-fence-paired-atomic
-                                                             ) has completed                  ) has completed
-                                                             before following                 before following
-                                                             global memory                    global memory
-                                                             operations. This                 operations. This
-                                                             satisfies the                    satisfies the
-                                                             requirements of                  requirements of
-                                                             acquire.                         acquire.
-                                                           - Ensures that all               - Ensures that all
-                                                             previous memory                  previous memory
-                                                             operations have                  operations have
-                                                             completed before a               completed before a
-                                                             following                        following
-                                                             local/generic store              local/generic store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             release-fence-paired-atomic      release-fence-paired-atomic
-                                                             ). This satisfies the            ). This satisfies the
-                                                             requirements of                  requirements of
-                                                             release.                         release.
-                                                                                            - Must happen before
-                                                                                              the following
-                                                                                              buffer_gl0_inv.
-                                                                                            - Ensures that the
-                                                                                              acquire-fence-paired
-                                                                                              atomic has completed
-                                                                                              before invalidating
-                                                                                              the
-                                                                                              cache. Therefore
-                                                                                              any following
-                                                                                              locations read must
-                                                                                              be no older than
-                                                                                              the value read by
-                                                                                              the
-                                                                                              acquire-fence-paired-atomic.
-
-                                                                                          3. buffer_gl0_inv
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Ensures that
-                                                                                              following
-                                                                                              loads will not see
-                                                                                              stale data.
-
-     fence        acq_rel      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                  - If OpenCL and
-                                                             address space is                 address space is
-                                                             not generic, omit                not generic, omit
-                                                             lgkmcnt(0).                      lgkmcnt(0).
-                                                                                            - If OpenCL and
-                                                                                              address space is
-                                                                                              local, omit
-                                                                                              vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM            - However, since LLVM
-                                                             currently has no                 currently has no
-                                                             address space on                 address space on
-                                                             the fence need to                the fence need to
-                                                             conservatively                   conservatively
-                                                             always generate                  always generate
-                                                             (see comment for                 (see comment for
-                                                             previous fence).                 previous fence).
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
-                                                             s_waitcnt                        vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             global/generic                   global/generic
-                                                             load/store/load                  load/load
-                                                             atomic/store                     atomic/
-                                                             atomic/atomicrmw.                atomicrmw-with-return-value.
-                                                                                            - s_waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              any preceding
-                                                                                              global/generic
-                                                                                              store/store atomic/
-                                                                                              atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             any preceding                    any preceding
-                                                             local/generic                    local/generic
-                                                             load/store/load                  load/store/load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Must happen before             - Must happen before
-                                                             the following                    the following
-                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
-                                                           - Ensures that the               - Ensures that the
-                                                             preceding                        preceding
-                                                             global/local/generic             global/local/generic
-                                                             load                             load
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             acquire-fence-paired-atomic      acquire-fence-paired-atomic
-                                                             ) has completed                  ) has completed
-                                                             before invalidating              before invalidating
-                                                             the cache. This                  the caches. This
-                                                             satisfies the                    satisfies the
-                                                             requirements of                  requirements of
-                                                             acquire.                         acquire.
-                                                           - Ensures that all               - Ensures that all
-                                                             previous memory                  previous memory
-                                                             operations have                  operations have
-                                                             completed before a               completed before a
-                                                             following                        following
-                                                             global/local/generic             global/local/generic
-                                                             store                            store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with an equal or                 with an equal or
-                                                             wider sync scope                 wider sync scope
-                                                             and memory ordering              and memory ordering
-                                                             stronger than                    stronger than
-                                                             unordered (this is               unordered (this is
-                                                             termed the                       termed the
-                                                             release-fence-paired-atomic      release-fence-paired-atomic
-                                                             ). This satisfies the            ). This satisfies the
-                                                             requirements of                  requirements of
-                                                             release.                         release.
-
-                                                         2. buffer_wbinvl1_vol            2. buffer_gl0_inv;
-                                                                                             buffer_gl1_inv
-
-                                                           - Must happen before             - Must happen before
-                                                             any following                    any following
-                                                             global/generic                   global/generic
-                                                             load/load                        load/load
-                                                             atomic/store/store               atomic/store/store
-                                                             atomic/atomicrmw.                atomic/atomicrmw.
-                                                           - Ensures that                   - Ensures that
-                                                             following loads                  following loads
-                                                             will not see stale               will not see stale
-                                                             global data. This                global data. This
-                                                             satisfies the                    satisfies the
-                                                             requirements of                  requirements of
-                                                             acquire.                         acquire.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
 
-     **Sequential Consistent Atomic**
-     ---------------------------------------------------------------------------------------------------------------------
-     load atomic  seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
-                               - wavefront    - local    load atomic acquire,             load atomic acquire,
-                                              - generic  except must generated            except must generated
-                                                         all instructions even            all instructions even
-                                                         for OpenCL.*                     for OpenCL.*
-     load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
-                                              - generic                                      vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit vmcnt(0) and
-                                                                                              vscnt(0).
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0), s_waitcnt
-                                                                                              vscnt(0), and s_waitcnt
-                                                                                              lgkmcnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                           - Must                           - waitcnt lgkmcnt(0) must
-                                                             happen after                     happen after
-                                                             preceding                        preceding
-                                                             global/generic load              local load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with memory                      with memory
-                                                             ordering of seq_cst              ordering of seq_cst
-                                                             and with equal or                and with equal or
-                                                             wider sync scope.                wider sync scope.
-                                                             (Note that seq_cst               (Note that seq_cst
-                                                             fences have their                fences have their
-                                                             own s_waitcnt                    own s_waitcnt
-                                                             lgkmcnt(0) and so do             lgkmcnt(0) and so do
-                                                             not need to be                   not need to be
-                                                             considered.)                     considered.)
-                                                                                            - waitcnt vmcnt(0)
-                                                                                              Must happen after
-                                                                                              preceding
-                                                                                              global/generic load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value
-                                                                                              with memory
-                                                                                              ordering of seq_cst
-                                                                                              and with equal or
-                                                                                              wider sync scope.
-                                                                                              (Note that seq_cst
-                                                                                              fences have their
-                                                                                              own s_waitcnt
-                                                                                              vmcnt(0) and so do
-                                                                                              not need to be
-                                                                                              considered.)
-                                                                                            - waitcnt vscnt(0)
-                                                                                              Must happen after
-                                                                                              preceding
-                                                                                              global/generic store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value
-                                                                                              with memory
-                                                                                              ordering of seq_cst
-                                                                                              and with equal or
-                                                                                              wider sync scope.
-                                                                                              (Note that seq_cst
-                                                                                              fences have their
-                                                                                              own s_waitcnt
-                                                                                              vscnt(0) and so do
-                                                                                              not need to be
-                                                                                              considered.)
-                                                           - Ensures any                    - Ensures any
-                                                             preceding                        preceding
-                                                             sequential                       sequential
-                                                             consistent local                 consistent global/local
-                                                             memory instructions              memory instructions
-                                                             have completed                   have completed
-                                                             before executing                 before executing
-                                                             this sequentially                this sequentially
-                                                             consistent                       consistent
-                                                             instruction. This                instruction. This
-                                                             prevents reordering              prevents reordering
-                                                             a seq_cst store                  a seq_cst store
-                                                             followed by a                    followed by a
-                                                             seq_cst load. (Note              seq_cst load. (Note
-                                                             that seq_cst is                  that seq_cst is
-                                                             stronger than                    stronger than
-                                                             acquire/release as               acquire/release as
-                                                             the reordering of                the reordering of
-                                                             load acquire                     load acquire
-                                                             followed by a store              followed by a store
-                                                             release is                       release is
-                                                             prevented by the                 prevented by the
-                                                             waitcnt of                       waitcnt of
-                                                             the release, but                 the release, but
-                                                             there is nothing                 there is nothing
-                                                             preventing a store               preventing a store
-                                                             release followed by              release followed by
-                                                             load acquire from                load acquire from
-                                                             completing out of                completing out of
-                                                             order. The waitcnt               order. The waitcnt
-                                                             could be placed after            could be placed after
-                                                             seq_store or before              seq_store or before
-                                                             the seq_load. We                 the seq_load. We
-                                                             choose the load to               choose the load to
-                                                             make the waitcnt be              make the waitcnt be
-                                                             as late as possible              as late as possible
-                                                             so that the store                so that the store
-                                                             may have already                 may have already
-                                                             completed.)                      completed.)
-
-                                                         2. *Following                    2. *Following
-                                                            instructions same as             instructions same as
-                                                            corresponding load               corresponding load
-                                                            atomic acquire,                  atomic acquire,
-                                                            except must generated            except must generated
-                                                            all instructions even            all instructions even
-                                                            for OpenCL.*                     for OpenCL.*
-     load atomic  seq_cst      - workgroup    - local    *Same as corresponding
-                                                         load atomic acquire,
-                                                         except must generated
-                                                         all instructions even
-                                                         for OpenCL.*
+     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)
 
-                                                                                          1. s_waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                            - If CU wavefront execution
-                                                                                              mode, omit.
-                                                                                            - Could be split into
-                                                                                              separate s_waitcnt
-                                                                                              vmcnt(0) and s_waitcnt
-                                                                                              vscnt(0) to allow
-                                                                                              them to be
-                                                                                              independently moved
-                                                                                              according to the
-                                                                                              following rules.
-                                                                                            - waitcnt vmcnt(0)
-                                                                                              Must happen after
-                                                                                              preceding
-                                                                                              global/generic load
-                                                                                              atomic/
-                                                                                              atomicrmw-with-return-value
-                                                                                              with memory
-                                                                                              ordering of seq_cst
-                                                                                              and with equal or
-                                                                                              wider sync scope.
-                                                                                              (Note that seq_cst
-                                                                                              fences have their
-                                                                                              own s_waitcnt
-                                                                                              vmcnt(0) and so do
-                                                                                              not need to be
-                                                                                              considered.)
-                                                                                            - waitcnt vscnt(0)
-                                                                                              Must happen after
-                                                                                              preceding
-                                                                                              global/generic store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value
-                                                                                              with memory
-                                                                                              ordering of seq_cst
-                                                                                              and with equal or
-                                                                                              wider sync scope.
-                                                                                              (Note that seq_cst
-                                                                                              fences have their
-                                                                                              own s_waitcnt
-                                                                                              vscnt(0) and so do
-                                                                                              not need to be
-                                                                                              considered.)
-                                                                                            - Ensures any
-                                                                                              preceding
-                                                                                              sequential
-                                                                                              consistent global
-                                                                                              memory instructions
-                                                                                              have completed
-                                                                                              before executing
-                                                                                              this sequentially
-                                                                                              consistent
-                                                                                              instruction. This
-                                                                                              prevents reordering
-                                                                                              a seq_cst store
-                                                                                              followed by a
-                                                                                              seq_cst load. (Note
-                                                                                              that seq_cst is
-                                                                                              stronger than
-                                                                                              acquire/release as
-                                                                                              the reordering of
-                                                                                              load acquire
-                                                                                              followed by a store
-                                                                                              release is
-                                                                                              prevented by the
-                                                                                              waitcnt of
-                                                                                              the release, but
-                                                                                              there is nothing
-                                                                                              preventing a store
-                                                                                              release followed by
-                                                                                              load acquire from
-                                                                                              completing out of
-                                                                                              order. The waitcnt
-                                                                                              could be placed after
-                                                                                              seq_store or before
-                                                                                              the seq_load. We
-                                                                                              choose the load to
-                                                                                              make the waitcnt be
-                                                                                              as late as possible
-                                                                                              so that the store
-                                                                                              may have already
-                                                                                              completed.)
-
-                                                                                          2. *Following
-                                                                                             instructions same as
-                                                                                             corresponding load
-                                                                                             atomic acquire,
-                                                                                             except must generate
-                                                                                             all instructions even
-                                                                                             for OpenCL.*
-
-     load atomic  seq_cst      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                         vmcnt(0) & vscnt(0)
-
-                                                           - Could be split into            - Could be split into
-                                                             separate s_waitcnt               separate s_waitcnt
-                                                             vmcnt(0)                         vmcnt(0), s_waitcnt
-                                                             and s_waitcnt                    vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
-                                                             them to be                       them to be
-                                                             independently moved              independently moved
-                                                             according to the                 according to the
-                                                             following rules.                 following rules.
-                                                           - waitcnt lgkmcnt(0)             - waitcnt lgkmcnt(0)
-                                                             must happen after                must happen after
-                                                             preceding                        preceding
-                                                             global/generic load              local load
-                                                             atomic/store                     atomic/store
-                                                             atomic/atomicrmw                 atomic/atomicrmw
-                                                             with memory                      with memory
-                                                             ordering of seq_cst              ordering of seq_cst
-                                                             and with equal or                and with equal or
-                                                             wider sync scope.                wider sync scope.
-                                                             (Note that seq_cst               (Note that seq_cst
-                                                             fences have their                fences have their
-                                                             own s_waitcnt                    own s_waitcnt
-                                                             lgkmcnt(0) and so do             lgkmcnt(0) and so do
-                                                             not need to be                   not need to be
-                                                             considered.)                     considered.)
-                                                           - waitcnt vmcnt(0)               - waitcnt vmcnt(0)
-                                                             must happen after                must happen after
-                                                             preceding                        preceding
-                                                             global/generic load              global/generic load
-                                                             atomic/store                     atomic/
-                                                             atomic/atomicrmw                 atomicrmw-with-return-value
-                                                             with memory                      with memory
-                                                             ordering of seq_cst              ordering of seq_cst
-                                                             and with equal or                and with equal or
-                                                             wider sync scope.                wider sync scope.
-                                                             (Note that seq_cst               (Note that seq_cst
-                                                             fences have their                fences have their
-                                                             own s_waitcnt                    own s_waitcnt
-                                                             vmcnt(0) and so do               vmcnt(0) and so do
-                                                             not need to be                   not need to be
-                                                             considered.)                     considered.)
-                                                                                            - waitcnt vscnt(0)
-                                                                                              must happen after
-                                                                                              preceding
-                                                                                              global/generic store
-                                                                                              atomic/
-                                                                                              atomicrmw-no-return-value
-                                                                                              with memory
-                                                                                              ordering of seq_cst
-                                                                                              and with equal or
-                                                                                              wider sync scope.
-                                                                                              (Note that seq_cst
-                                                                                              fences have their
-                                                                                              own s_waitcnt
-                                                                                              vscnt(0) and so do
-                                                                                              not need to be
-                                                                                              considered.)
-                                                           - Ensures any                    - Ensures any
-                                                             preceding                        preceding
-                                                             sequential                       sequential
-                                                             consistent global                consistent global
-                                                             memory instructions              memory instructions
-                                                             have completed                   have completed
-                                                             before executing                 before executing
-                                                             this sequentially                this sequentially
-                                                             consistent                       consistent
-                                                             instruction. This                instruction. This
-                                                             prevents reordering              prevents reordering
-                                                             a seq_cst store                  a seq_cst store
-                                                             followed by a                    followed by a
-                                                             seq_cst load. (Note              seq_cst load. (Note
-                                                             that seq_cst is                  that seq_cst is
-                                                             stronger than                    stronger than
-                                                             acquire/release as               acquire/release as
-                                                             the reordering of                the reordering of
-                                                             load acquire                     load acquire
-                                                             followed by a store              followed by a store
-                                                             release is                       release is
-                                                             prevented by the                 prevented by the
-                                                             waitcnt of                       waitcnt of
-                                                             the release, but                 the release, but
-                                                             there is nothing                 there is nothing
-                                                             preventing a store               preventing a store
-                                                             release followed by              release followed by
-                                                             load acquire from                load acquire from
-                                                             completing out of                completing out of
-                                                             order. The waitcnt               order. The waitcnt
-                                                             could be placed after            could be placed after
-                                                             seq_store or before              seq_store or before
-                                                             the seq_load. We                 the seq_load. We
-                                                             choose the load to               choose the load to
-                                                             make the waitcnt be              make the waitcnt be
-                                                             as late as possible              as late as possible
-                                                             so that the store                so that the store
-                                                             may have already                 may have already
-                                                             completed.)                      completed.)
-
-                                                         2. *Following                    2. *Following
-                                                            instructions same as             instructions same as
-                                                            corresponding load               corresponding load
-                                                            atomic acquire,                  atomic acquire,
-                                                            except must generate             except must generate
-                                                            all instructions even            all instructions even
-                                                            for OpenCL.*                     for OpenCL.*
-     store atomic seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
-                               - wavefront    - local    store atomic release,            store atomic release,
-                               - workgroup    - generic  except must generate             except must generate
-                                                         all instructions even            all instructions even
-                                                         for OpenCL.*                     for OpenCL.*
-     store atomic seq_cst      - agent        - global   *Same as corresponding           *Same as corresponding
-                               - system       - generic  store atomic release,            store atomic release,
-                                                         except must generate             except must generate
-                                                         all instructions even            all instructions even
-                                                         for OpenCL.*                     for OpenCL.*
-     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
-                               - wavefront    - local    atomicrmw acq_rel,               atomicrmw acq_rel,
-                               - workgroup    - generic  except must generate             except must generate
-                                                         all instructions even            all instructions even
-                                                         for OpenCL.*                     for OpenCL.*
-     atomicrmw    seq_cst      - agent        - global   *Same as corresponding           *Same as corresponding
-                               - system       - generic  atomicrmw acq_rel,               atomicrmw acq_rel,
-                                                         except must generate             except must generate
-                                                         all instructions even            all instructions even
-                                                         for OpenCL.*                     for OpenCL.*
-     fence        seq_cst      - singlethread *none*     *Same as corresponding           *Same as corresponding
-                               - wavefront               fence acq_rel,                   fence acq_rel,
-                               - workgroup               except must generate             except must generate
-                               - agent                   all instructions even            all instructions even
-                               - system                  for OpenCL.*                     for OpenCL.*
-     ============ ============ ============== ========== ================================ ================================
-
-The memory order also adds the single thread optimization constraints defined in
-table
-:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx10-table`.
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
 
-  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints GFX6-GFX10
-     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx10-table
+                                                         4. buffer_wbinvl1_vol
 
-     ============ ==============================================================
-     LLVM Memory  Optimization Constraints
-     Ordering
-     ============ ==============================================================
-     unordered    *none*
-     monotonic    *none*
-     acquire      - If a load atomic/atomicrmw then no following load/load
-                    atomic/store/store atomic/atomicrmw/fence instruction can
-                    be moved before the acquire.
-                  - If a fence then same as load atomic, plus no preceding
-                    associated fence-paired-atomic can be moved after the fence.
-     release      - If a store atomic/atomicrmw then no preceding load/load
-                    atomic/store/store atomic/atomicrmw/fence instruction can
-                    be moved after the release.
-                  - If a fence then same as store atomic, plus no following
-                    associated fence-paired-atomic can be moved before the
-                    fence.
-     acq_rel      Same constraints as both acquire and release.
-     seq_cst      - If a load atomic then same constraints as acquire, plus no
-                    preceding sequentially consistent load atomic/store
-                    atomic/atomicrmw/fence instruction can be moved after the
-                    seq_cst.
-                  - If a store atomic then the same constraints as release, plus
-                    no following sequentially consistent load atomic/store
-                    atomic/atomicrmw/fence instruction can be moved before the
-                    seq_cst.
-                  - If an atomicrmw/fence then same constraints as acq_rel.
-     ============ ==============================================================
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acq_rel      - singlethread *none*     *none*
+                               - wavefront
+     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit.
+                                                           - However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing any
+                                                             following global
+                                                             memory operations.
+                                                           - Ensures that the
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             acquire-fence-paired-atomic)
+                                                             has completed
+                                                             before following
+                                                             global memory
+                                                             operations. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             local/generic store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             release-fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+     fence        acq_rel      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures that the
+                                                             preceding
+                                                             global/local/generic
+                                                             load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             acquire-fence-paired-atomic)
+                                                             has completed
+                                                             before invalidating
+                                                             the cache. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             global/local/generic
+                                                             store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             release-fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+
+     **Sequentially Consistent Atomic**
+     ------------------------------------------------------------------------------------
+     load atomic  seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    load atomic acquire,
+                                              - generic  except must generate
+                                                         all instructions even
+                                                         for OpenCL.*
+     load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+
+                                                           - Must
+                                                             happen after
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             lgkmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequentially
+                                                             consistent local
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load. (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             s_waitcnt of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             completing out of
+                                                             order. The s_waitcnt
+                                                             could be placed after
+                                                             seq_store or before
+                                                             the seq_load. We
+                                                             choose the load to
+                                                             make the s_waitcnt be
+                                                             as late as possible
+                                                             so that the store
+                                                             may have already
+                                                             completed.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire,
+                                                            except must generate
+                                                            all instructions even
+                                                            for OpenCL.*
+     load atomic  seq_cst      - workgroup    - local    *Same as corresponding
+                                                         load atomic acquire,
+                                                         except must generate
+                                                         all instructions even
+                                                         for OpenCL.*
+
+     load atomic  seq_cst      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)
+
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0)
+                                                             and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             lgkmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequentially
+                                                             consistent global
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load. (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             s_waitcnt of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             completing out of
+                                                             order. The s_waitcnt
+                                                             could be placed after
+                                                             seq_store or before
+                                                             the seq_load. We
+                                                             choose the load to
+                                                             make the s_waitcnt be
+                                                             as late as possible
+                                                             so that the store
+                                                             may have already
+                                                             completed.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire,
+                                                            except must generate
+                                                            all instructions even
+                                                            for OpenCL.*
+     store atomic seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    store atomic release,
+                               - workgroup    - generic  except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    atomicrmw acq_rel,
+                               - workgroup    - generic  except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     fence        seq_cst      - singlethread *none*     *Same as corresponding
+                               - wavefront               fence acq_rel,
+                               - workgroup               except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     ============ ============ ============== ========== ================================
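
In the GFX6-GFX9 table above, the ``load atomic seq_cst`` rows differ from the corresponding acquire rows only in the ``s_waitcnt`` that is placed in front of them. The following Python sketch restates that selection as a lookup. The function name and the string encoding of scopes and address spaces are hypothetical, chosen only for illustration; this is not how the backend implements it.

```python
def seq_cst_load_wait(scope, address_space):
    """Return the s_waitcnt placed before the acquire sequence, or None.

    Restates the GFX6-GFX9 'load atomic seq_cst' rows of the table above.
    Combinations that the table does not list raise ValueError.
    """
    if address_space not in ("global", "local", "generic"):
        raise ValueError("unknown address space: " + address_space)

    # singlethread/wavefront: same as the corresponding load atomic acquire.
    if scope in ("singlethread", "wavefront"):
        return None

    if scope == "workgroup":
        # A local load needs no extra wait; global/generic waits on lgkmcnt.
        if address_space == "local":
            return None
        return "s_waitcnt lgkmcnt(0)"

    if scope in ("agent", "system") and address_space != "local":
        return "s_waitcnt lgkmcnt(0) & vmcnt(0)"

    raise ValueError("combination not listed: %s/%s" % (scope, address_space))
```

For example, ``seq_cst_load_wait("agent", "generic")`` yields the combined wait, which the table notes could be split into two independently movable ``s_waitcnt`` instructions.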
+
+.. _amdgpu-amdhsa-memory-model-gfx10:
+
+Memory Model GFX10
+++++++++++++++++++
+
+For GFX10:
+
+* Each agent has multiple shader arrays (SA).
+* Each SA has multiple work-group processors (WGP).
+* Each WGP has multiple compute units (CU).
+* Each CU has multiple SIMDs that execute wavefronts.
+* The wavefronts for a single work-group are executed in the same
+  WGP. In CU wavefront execution mode the wavefronts may be executed by
+  different SIMDs in the same CU. In WGP wavefront execution mode the
+  wavefronts may be executed by different SIMDs in different CUs in the same
+  WGP.
+* Each WGP has a single LDS memory shared by the wavefronts of the work-groups
+  executing on it.
+* All LDS operations of a WGP are performed as wavefront wide operations in a
+  global order and involve no caching. Completion is reported to a wavefront in
+  execution order.
+* The LDS memory has multiple request queues shared by the SIMDs of a
+  WGP. Therefore, the LDS operations performed by different wavefronts of a
+  work-group can be reordered relative to each other, which can result in
+  reordering the visibility of vector memory operations with respect to LDS
+  operations of other wavefronts in the same work-group. A ``s_waitcnt
+  lgkmcnt(0)`` is required to ensure synchronization between LDS operations and
+  vector memory operations between wavefronts of a work-group, but not between
+  operations performed by the same wavefront.
+* The vector memory operations are performed as wavefront wide operations.
+  Completion of load/store/sample operations is reported to a wavefront in
+  execution order of other load/store/sample operations performed by that
+  wavefront.
+* The vector memory operations access a vector L0 cache. There is a single L0
+  cache per CU. Each SIMD of a CU accesses the same L0 cache. Therefore, no
+  special action is required for coherence between the lanes of a single
+  wavefront. However, a ``buffer_gl0_inv`` is required for coherence between
+  wavefronts executing in the same work-group as they may be executing on SIMDs
+  of different CUs that access different L0s. A ``buffer_gl0_inv`` is also
+  required for coherence between wavefronts executing in different work-groups
+  as they may be executing on different WGPs.
+* The scalar memory operations access a scalar L0 cache shared by all wavefronts
+  on a WGP. The scalar and vector L0 caches are not coherent. However, scalar
+  operations are used in a restricted way so do not impact the memory model. See
+  :ref:`amdgpu-amdhsa-memory-spaces`.
+* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
+  the same SA. Therefore, no special action is required for coherence between
+  the wavefronts of a single work-group. However, a ``buffer_gl1_inv`` is
+  required for coherence between wavefronts executing in different work-groups
+  as they may be executing on different SAs that access different L1s.
+* The L1 caches have independent quadrants to service disjoint ranges of virtual
+  addresses.
+* Each L0 cache has a separate request queue per L1 quadrant. Therefore, the
+  vector and scalar memory operations performed by different wavefronts, whether
+  executing in the same or different work-groups (which may be executing on
+  different CUs accessing different L0s), can be reordered relative to each
+  other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is required to ensure
+  synchronization between vector memory operations of different wavefronts. It
+  ensures a previous vector memory operation has completed before executing a
+  subsequent vector memory or LDS operation and so can be used to meet the
+  requirements of acquire, release and sequential consistency.
+* The L1 caches use an L2 cache shared by all SAs on the same agent.
+* The L2 cache has independent channels to service disjoint ranges of virtual
+  addresses.
+* Each L1 quadrant of a single SA accesses a different L2 channel. Each L1
+  quadrant has a separate request queue per L2 channel. Therefore, the vector
+  and scalar memory operations performed by wavefronts executing in different
+  work-groups (which may be executing on different SAs) of an agent can be
+  reordered relative to each other. A ``s_waitcnt vmcnt(0) & vscnt(0)`` is
+  required to ensure synchronization between vector memory operations of
+  different SAs. It ensures a previous vector memory operation has completed
+  before executing a subsequent vector memory operation and so can be used to
+  meet the requirements of acquire, release and sequential consistency.
+* The L2 cache can be kept coherent with other agents on some targets, or ranges
+  of virtual addresses can be set up to bypass it to ensure system coherence.
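
The sharing rules in the list above can be summarized in a small Python model. It is a simplified sketch: the ``Placement`` record and both function names are hypothetical, and it assumes both wavefronts run on the same agent. In practice the compiler does not know where wavefronts are placed and selects invalidates from the synchronization scope instead.

```python
from collections import namedtuple

# Hypothetical record: index of the shader array, of the WGP within that SA,
# and of the CU within that WGP on which a wavefront is executing.
Placement = namedtuple("Placement", ["sa", "wgp", "cu"])


def shared_levels(a, b):
    """List the memory levels that two wavefronts on one agent share."""
    levels = ["L2"]                      # one L2 per agent
    if a.sa == b.sa:
        levels.append("L1")              # one L1 per SA
        if a.wgp == b.wgp:
            levels.append("LDS")         # one LDS per WGP
            if a.cu == b.cu:
                levels.append("vector L0")  # one vector L0 per CU
    return levels


def invalidates_needed(a, b):
    """Invalidates needed for b to observe a's vector memory writes."""
    shared = shared_levels(a, b)
    needed = []
    if "vector L0" not in shared:
        needed.append("buffer_gl0_inv")
    if "L1" not in shared:
        needed.append("buffer_gl1_inv")
    return needed
```

Two wavefronts on different CUs of one WGP share the L1 but not the L0, so the model reports only ``buffer_gl0_inv``; wavefronts on different SAs need both invalidates.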
+
+Scalar memory operations are only used to access memory that is proven to not
+change during the execution of the kernel dispatch. This includes constant
+address space and global address space for program scope const variables.
+Therefore, the kernel machine code does not have to maintain the scalar cache to
+ensure it is coherent with the vector caches. The scalar and vector caches are
+invalidated between kernel dispatches by CP since constant address space data
+may change between kernel dispatch executions. See
+:ref:`amdgpu-amdhsa-memory-spaces`.
+
+The one exception is if scalar writes are used to spill SGPR registers. In this
+case the AMDGPU backend ensures the memory location used to spill is never
+accessed by vector memory operations at the same time. If scalar writes are used
+then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
+return since the locations may be used for vector memory instructions by a
+future wavefront that uses the same scratch area, or a function call that
+creates a frame at the same address, respectively. There is no need for a
+``s_dcache_inv`` as all scalar writes are write-before-read in the same thread.
+
+For kernarg backing memory:
+
+* CP invalidates the L0 and L1 caches at the start of each kernel dispatch.
+* On dGPU the kernarg backing memory is accessed as MTYPE UC (uncached) to avoid
+  needing to invalidate the L2 cache.
+* On APU the kernarg backing memory is accessed as MTYPE CC (cache coherent) and
+  so the L2 cache will be coherent with the CPU and other agents.
+
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC (non-coherent). Since the private address space is only accessed
+by a single thread, and is always write-before-read, there is never a need to
+invalidate these entries from the L0 or L1 caches.
+
+Wavefronts are executed in native mode with in-order reporting of loads and
+sample instructions. In this mode vmcnt reports completion of load, atomic with
+return and sample instructions in order, and the vscnt reports the completion of
+store and atomic without return in order. See ``MEM_ORDERED`` field in
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
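
The split between the two counters described above can be written down as a lookup. This Python sketch uses hypothetical operation names and helper functions, and it prints the wait in the combined notation this document uses rather than in real assembler syntax.

```python
# Operation classes taken from the paragraph above; the string names are
# hypothetical labels chosen for this sketch.
VMCNT_OPS = {"load", "atomic_with_return", "sample"}
VSCNT_OPS = {"store", "atomic_without_return"}


def counter_for(op):
    """Return the counter that reports completion of a vector memory op."""
    if op in VMCNT_OPS:
        return "vmcnt"
    if op in VSCNT_OPS:
        return "vscnt"
    raise ValueError("unknown vector memory operation: " + op)


def wait_for(ops):
    """Build the wait covering every op in ops, or None if ops is empty."""
    counters = sorted({counter_for(op) for op in ops})
    if not counters:
        return None
    return "s_waitcnt " + " & ".join(c + "(0)" for c in counters)
```

Waiting on a mix of loads and stores therefore needs both counters, which is why the GFX10 sequences pair ``vmcnt(0)`` with ``vscnt(0)``.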
+
+Wavefronts can be executed in WGP or CU wavefront execution mode:
+
+* In WGP wavefront execution mode the wavefronts of a work-group are executed
+  on the SIMDs of both CUs of the WGP. Therefore, explicit management of the per
+  CU L0 caches is required for work-group synchronization. Also accesses to L1
+  at work-group scope need to be explicitly ordered as the accesses from
+  different CUs are not ordered.
+* In CU wavefront execution mode the wavefronts of a work-group are executed on
+  the SIMDs of a single CU of the WGP. Therefore, all global memory accesses by
+  the work-group use the same L0, which in turn ensures L1 accesses are
+  ordered, and so explicit management of the caches is not required for
+  work-group synchronization.
+
+See ``WGP_MODE`` field in
+:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table` and
+:ref:`amdgpu-target-features`.
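
As an illustration of how the execution mode changes the generated code, the Python sketch below restates the workgroup-scope, global address space ``load atomic acquire`` row of the GFX10 code sequence table. The function name and the boolean parameter are hypothetical; the instruction strings use the table's own notation.

```python
def workgroup_acquire_global_load(wgp_mode):
    """Code sequence for 'load atomic acquire', workgroup scope, global.

    wgp_mode is True for WGP wavefront execution mode and False for CU
    wavefront execution mode.
    """
    if wgp_mode:
        return [
            "buffer/global_load glc=1",
            "s_waitcnt vmcnt(0)",  # must complete before the invalidate
            "buffer_gl0_inv",      # following loads will not see stale data
        ]
    # CU mode: the work-group shares one L0, so glc=1, the wait and the
    # invalidate are all omitted.
    return ["buffer/global_load"]
```

In CU mode the sequence collapses to a plain load because every wavefront of the work-group accesses the same L0.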
+
+The code sequences used to implement the memory model for GFX10 are defined in
+table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-table`.
+
+  .. table:: AMDHSA Memory Model Code Sequences GFX10
+     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx10-table
+
+     ============ ============ ============== ========== ================================
+     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
+                  Ordering     Sync Scope     Address    GFX10
+                                              Space
+     ============ ============ ============== ========== ================================
+     **Non-Atomic**
+     ------------------------------------------------------------------------------------
+     load         *none*       *none*         - global   - !volatile & !nontemporal
+                                              - generic
+                                              - private    1. buffer/global/flat_load
+                                              - constant
+                                                         - volatile & !nontemporal
+
+                                                           1. buffer/global/flat_load
+                                                              glc=1 dlc=1
+
+                                                         - nontemporal
+
+                                                           1. buffer/global/flat_load
+                                                              slc=1
+
+     load         *none*       *none*         - local    1. ds_load
+     store        *none*       *none*         - global   - !nontemporal
+                                              - generic
+                                              - private    1. buffer/global/flat_store
+                                              - constant
+                                                         - nontemporal
+
+                                                            1. buffer/global/flat_store
+                                                               slc=1
+
+     store        *none*       *none*         - local    1. ds_store
+     **Unordered Atomic**
+     ------------------------------------------------------------------------------------
+     load atomic  unordered    *any*          *any*      *Same as non-atomic*.
+     store atomic unordered    *any*          *any*      *Same as non-atomic*.
+     atomicrmw    unordered    *any*          *any*      *Same as monotonic
+                                                         atomic*.
+     **Monotonic Atomic**
+     ------------------------------------------------------------------------------------
+     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load
+                               - wavefront    - generic
+     load atomic  monotonic    - workgroup    - global   1. buffer/global/flat_load
+                                              - generic     glc=1
+
+                                                           - If CU wavefront execution
+                                                             mode, omit glc=1.
+
+     load atomic  monotonic    - singlethread - local    1. ds_load
+                               - wavefront
+                               - workgroup
+     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load
+                               - system       - generic     glc=1 dlc=1
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     store atomic monotonic    - singlethread - local    1. ds_store
+                               - wavefront
+                               - workgroup
+     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     atomicrmw    monotonic    - singlethread - local    1. ds_atomic
+                               - wavefront
+                               - workgroup
+     **Acquire Atomic**
+     ------------------------------------------------------------------------------------
+     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load
+                               - wavefront    - local
+                                              - generic
+     load atomic  acquire      - workgroup    - global   1. buffer/global_load glc=1
+
+                                                           - If CU wavefront execution
+                                                             mode, omit glc=1.
+
+                                                         2. s_waitcnt vmcnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Must happen before
+                                                             the following buffer_gl0_inv
+                                                             and before any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     load atomic  acquire      - workgroup    - local    1. ds_load
+                                                         2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             the following buffer_gl0_inv
+                                                             and before any following
+                                                             global/generic load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the local load
+                                                             atomic value being
+                                                             acquired.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - If OpenCL, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     load atomic  acquire      - workgroup    - generic  1. flat_load glc=1
+
+                                                           - If CU wavefront execution
+                                                             mode, omit glc=1.
+
+                                                         2. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0).
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv and any
+                                                             following global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than a local load
+                                                             atomic value being
+                                                             acquired.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     load atomic  acquire      - agent        - global   1. buffer/global_load
+                               - system                     glc=1 dlc=1
+                                                         2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the load
+                                                             has completed
+                                                             before invalidating
+                                                             the caches.
+
+                                                         3. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale global data.
+
+     load atomic  acquire      - agent        - generic  1. flat_load glc=1 dlc=1
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the flat_load
+                                                             has completed
+                                                             before invalidating
+                                                             the caches.
+
+                                                         3. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic
+                                                         2. s_waitcnt vm/vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             the following buffer_gl0_inv
+                                                             and before any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acquire      - workgroup    - local    1. ds_atomic
+                                                         2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the local
+                                                             atomicrmw value
+                                                             being acquired.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If OpenCL, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acquire      - workgroup    - generic  1. flat_atomic
+                                                         2. s_waitcnt lgkmcnt(0) &
+                                                            vm/vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vm/vscnt(0).
+                                                           - If OpenCL, omit lgkmcnt(0).
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than a local
+                                                             atomicrmw value
+                                                             being acquired.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic
+                               - system                  2. s_waitcnt vm/vscnt(0)
+
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             caches.
+
+                                                         3. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - agent        - generic  1. flat_atomic
+                               - system                  2. s_waitcnt vm/vscnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             caches.
+
+                                                         3. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acquire      - singlethread *none*     *none*
+                               - wavefront
+     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, the waits
+                                                             must conservatively
+                                                             always be generated.
+                                                             If the fence had an
+                                                             address space, it
+                                                             would be set to the
+                                                             address space of the
+                                                             OpenCL fence flag, or
+                                                             to generic if both
+                                                             local and global
+                                                             flags are specified.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load
+                                                             atomic/
+                                                             atomicrmw-with-return-value
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             atomicrmw-no-return-value
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures that the
+                                                             fence-paired-atomic
+                                                             has completed
+                                                             before invalidating
+                                                             the
+                                                             cache. Therefore
+                                                             any following
+                                                             locations read must
+                                                             be no older than
+                                                             the value read by
+                                                             the
+                                                             fence-paired-atomic.
+
+                                                         2. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     fence        acquire      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, the waits
+                                                             must conservatively
+                                                             always be generated
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load
+                                                             atomic/
+                                                             atomicrmw-with-return-value
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             atomicrmw-no-return-value
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl*_inv.
+                                                           - Ensures that the
+                                                             fence-paired-atomic
+                                                             has completed
+                                                             before invalidating
+                                                             the
+                                                             caches. Therefore
+                                                             any following
+                                                             locations read must
+                                                             be no older than
+                                                             the value read by
+                                                             the
+                                                             fence-paired-atomic.
+
+                                                         2. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before any
+                                                             following global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     **Release Atomic**
+     ------------------------------------------------------------------------------------
+     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store
+                               - wavefront    - local
+                                              - generic
+     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0) &
+                                              - generic     vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     store atomic release      - workgroup    - local    1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - If OpenCL, omit.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and s_waitcnt
+                                                             vscnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             global memory
+                                                             operations have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. ds_store
+     store atomic release      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt vscnt(0)
+                                                             and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0) &
+                                              - generic     vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL, omit lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global/flat_atomic
+     atomicrmw    release      - workgroup    - local    1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - If OpenCL, omit.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and s_waitcnt
+                                                             vscnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             global memory
+                                                             operations have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. ds_atomic
+     atomicrmw    release      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global and local
+                                                             have completed
+                                                             before performing
+                                                             the atomicrmw that
+                                                             is being released.
+
+                                                         2. buffer/global/flat_atomic
+     fence        release      - singlethread *none*     *none*
+                               - wavefront
+     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, the waits
+                                                             must conservatively
+                                                             always be generated.
+                                                             If the fence had an
+                                                             address space, it
+                                                             would be set to the
+                                                             address space of the
+                                                             OpenCL fence flag, or
+                                                             to generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store atomic/
+                                                             atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     fence        release      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, the waits
+                                                             must conservatively
+                                                             always be generated.
+                                                             If the fence had an
+                                                             address space, it
+                                                             would be set to the
+                                                             address space of the
+                                                             OpenCL fence flag, or
+                                                             to generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     **Acquire-Release Atomic**
+     ------------------------------------------------------------------------------------
+     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0), and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+                                                         3. s_waitcnt vm/vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             atomicrmw value
+                                                             being acquired.
+
+                                                         4. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acq_rel      - workgroup    - local    1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - If OpenCL, omit.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and s_waitcnt
+                                                             vscnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             global memory
+                                                             operations have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. ds_atomic
+                                                         3. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the local
+                                                             atomicrmw value being
+                                                             acquired.
+
+                                                         4. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - If OpenCL, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL, omit lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store
+                                                             atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL, omit lgkmcnt(0).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             atomicrmw value being
+                                                             acquired.
+
+                                                         4. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+                                                         3. s_waitcnt vm/vscnt(0)
+
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             caches.
+
+                                                         4. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0), and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt vm/vscnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Use vmcnt(0) if atomic with
+                                                             return and vscnt(0) if
+                                                             atomic with no-return.
+                                                           - Must happen before
+                                                             following
+                                                             buffer_gl*_inv.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             caches.
+
+                                                         4. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acq_rel      - singlethread *none*     *none*
+                               - wavefront
+     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0) &
+                                                            vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, the waits
+                                                             must conservatively
+                                                             always be generated
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store atomic/
+                                                             atomicrmw.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             have
+                                                             completed before
+                                                             performing any
+                                                             following global
+                                                             memory operations.
+                                                           - Ensures that the
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             acquire-fence-paired-atomic)
+                                                             has completed
+                                                             before following
+                                                             global memory
+                                                             operations. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             local/generic store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             release-fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl0_inv.
+                                                           - Ensures that the
+                                                             acquire-fence-paired-atomic
+                                                             has completed
+                                                             before invalidating
+                                                             the
+                                                             cache. Therefore
+                                                             any following
+                                                             locations read must
+                                                             be no older than
+                                                             the value read by
+                                                             the
+                                                             acquire-fence-paired-atomic.
+
+                                                         3. buffer_gl0_inv
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale data.
+
+     fence        acq_rel      - agent        *none*     1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                           - If OpenCL and
+                                                             address space is
+                                                             local, omit
+                                                             vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence, need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/load
+                                                             atomic/
+                                                             atomicrmw-with-return-value.
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             store/store atomic/
+                                                             atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_gl*_inv.
+                                                           - Ensures that the
+                                                             preceding
+                                                             global/local/generic
+                                                             load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             acquire-fence-paired-atomic)
+                                                             has completed
+                                                             before invalidating
+                                                             the caches. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             global/local/generic
+                                                             store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             release-fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+                                                         2. buffer_gl0_inv;
+                                                            buffer_gl1_inv
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+
+     **Sequential Consistent Atomic**
+     ------------------------------------------------------------------------------------
+     load atomic  seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    load atomic acquire,
+                                              - generic  except must generate
+                                                         all instructions even
+                                                         for OpenCL.*
+     load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkmcnt(0) &
+                                              - generic     vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit vmcnt(0) and
+                                                             vscnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0), and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt lgkmcnt(0) must
+                                                             happen after
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             lgkmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/
+                                                             atomicrmw-with-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic store
+                                                             atomic/
+                                                             atomicrmw-no-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vscnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequentially
+                                                             consistent global/local
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load. (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             s_waitcnt of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             completing out of
+                                                             order. The s_waitcnt
+                                                             could be placed after
+                                                             the seq_cst store or
+                                                             before the seq_cst
+                                                             load. We
+                                                             choose the load to
+                                                             make the s_waitcnt be
+                                                             as late as possible
+                                                             so that the store
+                                                             may have already
+                                                             completed.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire,
+                                                            except must generate
+                                                            all instructions even
+                                                            for OpenCL.*
+     load atomic  seq_cst      - workgroup    - local
+
+                                                         1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                           - If CU wavefront execution
+                                                             mode, omit.
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and s_waitcnt
+                                                             vscnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/
+                                                             atomicrmw-with-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic store
+                                                             atomic/
+                                                             atomicrmw-no-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vscnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequentially
+                                                             consistent global
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load. (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             s_waitcnt of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             completing out of
+                                                             order. The s_waitcnt
+                                                             could be placed after
+                                                             the seq_cst store or
+                                                             before the seq_cst
+                                                             load. We
+                                                             choose the load to
+                                                             make the s_waitcnt be
+                                                             as late as possible
+                                                             so that the store
+                                                             may have already
+                                                             completed.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire,
+                                                            except must generate
+                                                            all instructions even
+                                                            for OpenCL.*
+
+     load atomic  seq_cst      - agent        - global   1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0) & vscnt(0)
+
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0), s_waitcnt
+                                                             vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             local load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             lgkmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/
+                                                             atomicrmw-with-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - s_waitcnt vscnt(0)
+                                                             must happen after
+                                                             preceding
+                                                             global/generic store
+                                                             atomic/
+                                                             atomicrmw-no-return-value
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vscnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequentially
+                                                             consistent global
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load. (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             s_waitcnt of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             completing out of
+                                                             order. The s_waitcnt
+                                                             could be placed after
+                                                             the seq_cst store or
+                                                             before the seq_cst
+                                                             load. We choose the
+                                                             load to make the
+                                                             s_waitcnt be as late
+                                                             as possible so that
+                                                             the store may have
+                                                             already completed.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire,
+                                                            except must generate
+                                                            all instructions even
+                                                            for OpenCL.*
+     store atomic seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    store atomic release,
+                               - workgroup    - generic  except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    atomicrmw acq_rel,
+                               - workgroup    - generic  except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     fence        seq_cst      - singlethread *none*     *Same as corresponding
+                               - wavefront               fence acq_rel,
+                               - workgroup               except must generate
+                               - agent                   all instructions even
+                               - system                  for OpenCL.*
+     ============ ============ ============== ========== ================================
 
 Trap Handler ABI
 ~~~~~~~~~~~~~~~~
@@ -6738,7 +7684,7 @@ On entry to a function:
     after the last local allocation.
 
 9.  All other registers are unspecified.
-10. Any necessary ``waitcnt`` has been performed to ensure memory is available
+10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
     to the function.
 
 On exit from a function:
@@ -6778,7 +7724,7 @@ On exit from a function:
 2.  The PC is set to the RA provided on entry.
 3.  MODE register: *TBD*.
 4.  All other registers are clobbered.
-5.  Any necessary ``waitcnt`` has been performed to ensure memory accessed by
+5.  Any necessary ``s_waitcnt`` has been performed to ensure memory accessed by
     function is available to the caller.
 
 .. TODO::
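
The seq_cst load atomic row in the table above explains why a wait is placed before a sequentially consistent load: without it, a seq_cst store followed by a seq_cst load could complete out of order. The language-level guarantee this protects can be illustrated with the classic "store buffering" litmus test. The sketch below is plain C++ using ``std::atomic``, not AMDGPU code, and the helper names ``run_store_buffering_once`` and ``outcome_allowed_by_seq_cst`` are hypothetical, introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage check shown after the block
#include <thread>
#include <utility>

// Hypothetical helper (not part of LLVM): runs one round of the
// store-buffering litmus test and returns the values each thread loaded.
std::pair<int, int> run_store_buffering_once() {
  std::atomic<int> x{0};
  std::atomic<int> y{0};
  int r1 = -1;
  int r2 = -1;

  std::thread t1([&] {
    x.store(1, std::memory_order_seq_cst);   // seq_cst store ...
    r1 = y.load(std::memory_order_seq_cst);  // ... followed by a seq_cst load
  });
  std::thread t2([&] {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
  });
  t1.join();
  t2.join();
  return {r1, r2};
}

// Under sequential consistency at least one thread must observe the other
// thread's store, so (0, 0) is the only forbidden outcome.
bool outcome_allowed_by_seq_cst(std::pair<int, int> r) {
  return !(r.first == 0 && r.second == 0);
}
```

A caller could check ``assert(outcome_allowed_by_seq_cst(run_store_buffering_once()));`` in a loop. If both operations were only release/acquire, the outcome ``(0, 0)`` would be permitted, which is the reordering the table's note says the extra ``s_waitcnt`` prevents.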


        

