[llvm] 580f99b - [NFC][AMDGPU] Resize Memory Model columns in AMDGPUUsage.rst

Scott Linder via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 29 16:10:31 PDT 2020


Author: Scott Linder
Date: 2020-10-29T23:07:03Z
New Revision: 580f99bcff31f4536aeba3d625139e16ba9c7b64

URL: https://github.com/llvm/llvm-project/commit/580f99bcff31f4536aeba3d625139e16ba9c7b64
DIFF: https://github.com/llvm/llvm-project/commit/580f99bcff31f4536aeba3d625139e16ba9c7b64.diff

LOG: [NFC][AMDGPU] Resize Memory Model columns in AMDGPUUsage.rst

Make all of the "AMDGPU Machine Code GFX*" columns in the Memory Model
table a consistent width of 32 characters.

Best viewed with something like --word-diff

Differential Revision: https://reviews.llvm.org/D89977
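
As a rough illustration of why the log suggests a word-level view (for
example `git show --word-diff` on this revision): most rows in the diff
below change only in column padding, so a comparison that ignores
whitespace reports them as unchanged, while the column rules do change.
The sketch below approximates that with a whitespace split. It is not
git's implementation; the helper name `words_equal` and the rule widths
(31 and 34 before, 32 and 32 after) are assumptions read off the diff.

```python
# Sketch only: approximates a word-level comparison by splitting on
# whitespace. This is not how git implements --word-diff.

def words_equal(old_line: str, new_line: str) -> bool:
    """Return True when both lines hold the same whitespace-separated words."""
    return old_line.split() == new_line.split()

# A removed/added row pair from the diff, with the leading -/+ marker dropped.
old_row = ("load         *none*       *none*         - local    "
           "1. ds_load                      1. ds_load")
new_row = ("load         *none*       *none*         - local    "
           "1. ds_load                       1. ds_load")

# The last two column rules before and after (widths assumed from the diff).
old_rule = "=" * 31 + " " + "=" * 34
new_rule = "=" * 32 + " " + "=" * 32

# The row text differs, but only in padding between the two columns.
row_is_whitespace_only = old_row != new_row and words_equal(old_row, new_row)
# The rules differ as words, so a word-level diff would still show them.
rule_is_whitespace_only = words_equal(old_rule, new_rule)

print(row_is_whitespace_only)
print(rule_is_whitespace_only)
```

Under these assumptions, a word-level view should highlight the rule and
separator lines and leave the realigned rows unmarked.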

Added: 
    

Modified: 
    llvm/docs/AMDGPUUsage.rst

Removed: 
    


################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 5a06a013da52..366405805655 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4392,397 +4392,397 @@ agents.
   .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX10
      :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx10-table
 
-     ============ ============ ============== ========== =============================== ==================================
-     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code             AMDGPU Machine Code
-                  Ordering     Sync Scope     Address    GFX6-9                          GFX10
+     ============ ============ ============== ========== ================================ ================================
+     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code              AMDGPU Machine Code
+                  Ordering     Sync Scope     Address    GFX6-9                           GFX10
                                               Space
-     ============ ============ ============== ========== =============================== ==================================
+     ============ ============ ============== ========== ================================ ================================
      **Non-Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     load         *none*       *none*         - global   - !volatile & !nontemporal      - !volatile & !nontemporal
+     ---------------------------------------------------------------------------------------------------------------------
+     load         *none*       *none*         - global   - !volatile & !nontemporal       - !volatile & !nontemporal
                                               - generic
-                                              - private    1. buffer/global/flat_load      1. buffer/global/flat_load
+                                              - private    1. buffer/global/flat_load       1. buffer/global/flat_load
                                               - constant
-                                                         - volatile & !nontemporal       - volatile & !nontemporal
+                                                         - volatile & !nontemporal        - volatile & !nontemporal
 
-                                                           1. buffer/global/flat_load      1. buffer/global/flat_load
-                                                              glc=1                           glc=1 dlc=1
+                                                           1. buffer/global/flat_load       1. buffer/global/flat_load
+                                                              glc=1                            glc=1 dlc=1
 
-                                                         - nontemporal                   - nontemporal
+                                                         - nontemporal                    - nontemporal
 
-                                                           1. buffer/global/flat_load      1. buffer/global/flat_load
-                                                              glc=1 slc=1                     slc=1
+                                                           1. buffer/global/flat_load       1. buffer/global/flat_load
+                                                              glc=1 slc=1                      slc=1
 
-     load         *none*       *none*         - local    1. ds_load                      1. ds_load
-     store        *none*       *none*         - global   - !nontemporal                  - !nontemporal
+     load         *none*       *none*         - local    1. ds_load                       1. ds_load
+     store        *none*       *none*         - global   - !nontemporal                   - !nontemporal
                                               - generic
-                                              - private    1. buffer/global/flat_store     1. buffer/global/flat_store
+                                              - private    1. buffer/global/flat_store      1. buffer/global/flat_store
                                               - constant
-                                                         - nontemporal                   - nontemporal
+                                                         - nontemporal                    - nontemporal
 
-                                                           1. buffer/global/flat_store      1. buffer/global/flat_store
-                                                              glc=1 slc=1                      slc=1
+                                                           1. buffer/global/flat_store       1. buffer/global/flat_store
+                                                              glc=1 slc=1                       slc=1
 
-     store        *none*       *none*         - local    1. ds_store                     1. ds_store
+     store        *none*       *none*         - local    1. ds_store                      1. ds_store
      **Unordered Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     load atomic  unordered    *any*          *any*      *Same as non-atomic*.           *Same as non-atomic*.
-     store atomic unordered    *any*          *any*      *Same as non-atomic*.           *Same as non-atomic*.
-     atomicrmw    unordered    *any*          *any*      *Same as monotonic              *Same as monotonic
-                                                         atomic*.                        atomic*.
+     ---------------------------------------------------------------------------------------------------------------------
+     load atomic  unordered    *any*          *any*      *Same as non-atomic*.            *Same as non-atomic*.
+     store atomic unordered    *any*          *any*      *Same as non-atomic*.            *Same as non-atomic*.
+     atomicrmw    unordered    *any*          *any*      *Same as monotonic               *Same as monotonic
+                                                         atomic*.                         atomic*.
      **Monotonic Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load      1. buffer/global/flat_load
+     ---------------------------------------------------------------------------------------------------------------------
+     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load       1. buffer/global/flat_load
                                - wavefront    - generic
-     load atomic  monotonic    - workgroup    - global   1. buffer/global/flat_load      1. buffer/global/flat_load
-                                              - generic                                     glc=1
+     load atomic  monotonic    - workgroup    - global   1. buffer/global/flat_load       1. buffer/global/flat_load
+                                              - generic                                      glc=1
 
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit glc=1.
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit glc=1.
 
-     load atomic  monotonic    - singlethread - local    1. ds_load                      1. ds_load
+     load atomic  monotonic    - singlethread - local    1. ds_load                       1. ds_load
                                - wavefront
                                - workgroup
-     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load      1. buffer/global/flat_load
-                               - system       - generic     glc=1                           glc=1 dlc=1
-     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store     1. buffer/global/flat_store
+     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load       1. buffer/global/flat_load
+                               - system       - generic     glc=1                            glc=1 dlc=1
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store      1. buffer/global/flat_store
                                - wavefront    - generic
                                - workgroup
                                - agent
                                - system
-     store atomic monotonic    - singlethread - local    1. ds_store                     1. ds_store
+     store atomic monotonic    - singlethread - local    1. ds_store                      1. ds_store
                                - wavefront
                                - workgroup
-     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic    1. buffer/global/flat_atomic
+     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic     1. buffer/global/flat_atomic
                                - wavefront    - generic
                                - workgroup
                                - agent
                                - system
-     atomicrmw    monotonic    - singlethread - local    1. ds_atomic                    1. ds_atomic
+     atomicrmw    monotonic    - singlethread - local    1. ds_atomic                     1. ds_atomic
                                - wavefront
                                - workgroup
      **Acquire Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load   1. buffer/global/ds/flat_load
+     ---------------------------------------------------------------------------------------------------------------------
+     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load    1. buffer/global/ds/flat_load
                                - wavefront    - local
                                               - generic
-     load atomic  acquire      - workgroup    - global   1. buffer/global_load           1. buffer/global_load glc=1
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit glc=1.
-
-                                                                                         2. s_waitcnt vmcnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Must happen before
-                                                                                             the following buffer_gl0_inv
-                                                                                             and before any following
-                                                                                             global/generic
-                                                                                             load/load
-                                                                                             atomic/store/store
-                                                                                             atomic/atomicrmw.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     load atomic  acquire      - workgroup    - local    1. ds_load                      1. ds_load
-                                                         2. s_waitcnt lgkmcnt(0)         2. s_waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.              - If OpenCL, omit.
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following buffer_gl0_inv
-                                                             global/generic                  and before any following
-                                                             load/load                       global/generic load/load
-                                                             atomic/store/store              atomic/store/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the load             older than the load
-                                                             atomic value being              atomic value being
-                                                             acquired.                       acquired.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - If OpenCL, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     load atomic  acquire      - workgroup    - generic  1. flat_load                    1. flat_load glc=1
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit glc=1.
-
-                                                         2. s_waitcnt lgkmcnt(0)         2. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             lgkmcnt(0).
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following
-                                                             global/generic                  buffer_gl0_inv and any
-                                                             load/load                       following global/generic
-                                                             atomic/store/store              load/load
-                                                             atomic/atomicrmw.               atomic/store/store
-                                                                                             atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the load             older than the load
-                                                             atomic value being              atomic value being
-                                                             acquired.                       acquired.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     load atomic  acquire      - agent        - global   1. buffer/global_load           1. buffer/global_load
-                               - system                     glc=1                           glc=1 dlc=1
-                                                         2. s_waitcnt vmcnt(0)           2. s_waitcnt vmcnt(0)
-
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures the load              - Ensures the load
-                                                             has completed                   has completed
-                                                             before invalidating             before invalidating
-                                                             the cache.                      the caches.
-
-                                                         3. buffer_wbinvl1_vol           3. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following                       following
-                                                             loads will not see              loads will not see
-                                                             stale global data.              stale global data.
-
-     load atomic  acquire      - agent        - generic  1. flat_load glc=1              1. flat_load glc=1 dlc=1
-                               - system                  2. s_waitcnt vmcnt(0) &         2. s_waitcnt vmcnt(0) &
-                                                            lgkmcnt(0)                      lgkmcnt(0)
-
-                                                           - If OpenCL omit                - If OpenCL omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_invl.
-                                                           - Ensures the flat_load         - Ensures the flat_load
-                                                             has completed                   has completed
-                                                             before invalidating             before invalidating
-                                                             the cache.                      the caches.
-
-                                                         3. buffer_wbinvl1_vol           3. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
-
-     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
+     load atomic  acquire      - workgroup    - global   1. buffer/global_load            1. buffer/global_load glc=1
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit glc=1.
+
+                                                                                          2. s_waitcnt vmcnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Must happen before
+                                                                                              the following buffer_gl0_inv
+                                                                                              and before any following
+                                                                                              global/generic
+                                                                                              load/load
+                                                                                              atomic/store/store
+                                                                                              atomic/atomicrmw.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     load atomic  acquire      - workgroup    - local    1. ds_load                       1. ds_load
+                                                         2. s_waitcnt lgkmcnt(0)          2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.               - If OpenCL, omit.
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following buffer_gl0_inv
+                                                             global/generic                   and before any following
+                                                             load/load                        global/generic load/load
+                                                             atomic/store/store               atomic/store/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the load              older than the load
+                                                             atomic value being               atomic value being
+                                                             acquired.                        acquired.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - If OpenCL, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     load atomic  acquire      - workgroup    - generic  1. flat_load                     1. flat_load glc=1
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit glc=1.
+
+                                                         2. s_waitcnt lgkmcnt(0)          2. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              lgkmcnt(0).
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following
+                                                             global/generic                   buffer_gl0_inv and any
+                                                             load/load                        following global/generic
+                                                             atomic/store/store               load/load
+                                                             atomic/atomicrmw.                atomic/store/store
+                                                                                              atomic/atomicrmw.
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the load              older than the load
+                                                             atomic value being               atomic value being
+                                                             acquired.                        acquired.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     load atomic  acquire      - agent        - global   1. buffer/global_load            1. buffer/global_load
+                               - system                     glc=1                            glc=1 dlc=1
+                                                         2. s_waitcnt vmcnt(0)            2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures the load               - Ensures the load
+                                                             has completed                    has completed
+                                                             before invalidating              before invalidating
+                                                             the cache.                       the caches.
+
+                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following                        following
+                                                             loads will not see               loads will not see
+                                                             stale global data.               stale global data.
+
+     load atomic  acquire      - agent        - generic  1. flat_load glc=1               1. flat_load glc=1 dlc=1
+                               - system                  2. s_waitcnt vmcnt(0) &          2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)                       lgkmcnt(0)
+
+                                                           - If OpenCL omit                 - If OpenCL omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_invl.
+                                                           - Ensures the flat_load          - Ensures the flat_load
+                                                             has completed                    has completed
+                                                             before invalidating              before invalidating
+                                                             the cache.                       the caches.
+
+                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
+
+     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic         1. buffer/global_atomic
-                                                                                         2. s_waitcnt vm/vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                                                           - Must happen before
-                                                                                             the following buffer_gl0_inv
-                                                                                             and before any following
-                                                                                             global/generic
-                                                                                             load/load
-                                                                                             atomic/store/store
-                                                                                             atomic/atomicrmw.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acquire      - workgroup    - local    1. ds_atomic                    1. ds_atomic
-                                                         2. waitcnt lgkmcnt(0)           2. waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.              - If OpenCL, omit.
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following
-                                                             global/generic                  buffer_gl0_inv.
+     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic          1. buffer/global_atomic
+                                                                                          2. s_waitcnt vm/vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                                                            - Must happen before
+                                                                                              the following buffer_gl0_inv
+                                                                                              and before any following
+                                                                                              global/generic
+                                                                                              load/load
+                                                                                              atomic/store/store
+                                                                                              atomic/atomicrmw.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acquire      - workgroup    - local    1. ds_atomic                     1. ds_atomic
+                                                         2. waitcnt lgkmcnt(0)            2. waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.               - If OpenCL, omit.
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following
+                                                             global/generic                   buffer_gl0_inv.
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the                  older than the
-                                                             atomicrmw value                 atomicrmw value
-                                                             being acquired.                 being acquired.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If OpenCL omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acquire      - workgroup    - generic  1. flat_atomic                  1. flat_atomic
-                                                         2. waitcnt lgkmcnt(0)           2. waitcnt lgkmcnt(0) &
-                                                                                            vm/vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vm/vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             waitcnt lgkmcnt(0).
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following
-                                                             global/generic                  buffer_gl0_inv.
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the                   older than the
+                                                             atomicrmw value                  atomicrmw value
+                                                             being acquired.                  being acquired.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If OpenCL omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acquire      - workgroup    - generic  1. flat_atomic                   1. flat_atomic
+                                                         2. waitcnt lgkmcnt(0)            2. waitcnt lgkmcnt(0) &
+                                                                                             vm/vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vm/vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              waitcnt lgkmcnt(0).
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following
+                                                             global/generic                   buffer_gl0_inv.
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the                  older than the
-                                                             atomicrmw value                 atomicrmw value
-                                                             being acquired.                 being acquired.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic         1. buffer/global_atomic
-                               - system                  2. s_waitcnt vmcnt(0)           2. s_waitcnt vm/vscnt(0)
-
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                                                             waitcnt lgkmcnt(0).
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures the                   - Ensures the
-                                                             atomicrmw has                   atomicrmw has
-                                                             completed before                completed before
-                                                             invalidating the                invalidating the
-                                                             cache.                          caches.
-
-                                                         3. buffer_wbinvl1_vol           3. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
-
-     atomicrmw    acquire      - agent        - generic  1. flat_atomic                  1. flat_atomic
-                               - system                  2. s_waitcnt vmcnt(0) &         2. s_waitcnt vm/vscnt(0) &
-                                                            lgkmcnt(0)                      lgkmcnt(0)
-
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures the                   - Ensures the
-                                                             atomicrmw has                   atomicrmw has
-                                                             completed before                completed before
-                                                             invalidating the                invalidating the
-                                                             cache.                          caches.
-
-                                                         3. buffer_wbinvl1_vol           3. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
-
-     fence        acquire      - singlethread *none*     *none*                          *none*
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the                   older than the
+                                                             atomicrmw value                  atomicrmw value
+                                                             being acquired.                  being acquired.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic          1. buffer/global_atomic
+                               - system                  2. s_waitcnt vmcnt(0)            2. s_waitcnt vm/vscnt(0)
+
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                                                              waitcnt lgkmcnt(0).
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures the                    - Ensures the
+                                                             atomicrmw has                    atomicrmw has
+                                                             completed before                 completed before
+                                                             invalidating the                 invalidating the
+                                                             cache.                           caches.
+
+                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
+
+     atomicrmw    acquire      - agent        - generic  1. flat_atomic                   1. flat_atomic
+                               - system                  2. s_waitcnt vmcnt(0) &          2. s_waitcnt vm/vscnt(0) &
+                                                            lgkmcnt(0)                       lgkmcnt(0)
+
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures the                    - Ensures the
+                                                             atomicrmw has                    atomicrmw has
+                                                             completed before                 completed before
+                                                             invalidating the                 invalidating the
+                                                             cache.                           caches.
+
+                                                         3. buffer_wbinvl1_vol            3. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
+
+     fence        acquire      - singlethread *none*     *none*                           *none*
                                - wavefront
-     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit.              not generic, omit
-                                                                                             lgkmcnt(0).
-                                                                                           - If OpenCL and
-                                                                                             address space is
-                                                                                             local, omit
-                                                                                             vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM           - However, since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate. If             always generate. If
-                                                             fence had an                    fence had an
-                                                             address space then              address space then
-                                                             set to address                  set to address
-                                                             space of OpenCL                 space of OpenCL
-                                                             fence flag, or to               fence flag, or to
-                                                             generic if both                 generic if both
-                                                             local and global                local and global
-                                                             flags are                       flags are
-                                                             specified.                      specified.
+     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit.               not generic, omit
+                                                                                              lgkmcnt(0).
+                                                                                            - If OpenCL and
+                                                                                              address space is
+                                                                                              local, omit
+                                                                                              vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM            - However, since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate. If              always generate. If
+                                                             fence had an                     fence had an
+                                                             address space then               address space then
+                                                             set to address                   set to address
+                                                             space of OpenCL                  space of OpenCL
+                                                             fence flag, or to                fence flag, or to
+                                                             generic if both                  generic if both
+                                                             local and global                 local and global
+                                                             flags are                        flags are
+                                                             specified.                       specified.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic load
@@ -4806,96 +4806,96 @@ agents.
                                                              older than the
                                                              value read by the
                                                              fence-paired-atomic.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             atomicrmw-no-return-value
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic load
-                                                                                             atomic/atomicrmw
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             buffer_gl0_inv.
-                                                                                           - Ensures that the
-                                                                                             fence-paired atomic
-                                                                                             has completed
-                                                                                             before invalidating
-                                                                                             the
-                                                                                             cache. Therefore
-                                                                                             any following
-                                                                                             locations read must
-                                                                                             be no older than
-                                                                                             the value read by
-                                                                                             the
-                                                                                             fence-paired-atomic.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     fence        acquire      - agent        *none*     1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit               not generic, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                                                           - If OpenCL and
-                                                                                             address space is
-                                                                                             local, omit
-                                                                                             vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM           - However, since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate                 always generate
-                                                             (see comment for                (see comment for
-                                                             previous fence).                previous fence).
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              atomicrmw-no-return-value
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic load
+                                                                                              atomic/atomicrmw
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              buffer_gl0_inv.
+                                                                                            - Ensures that the
+                                                                                              fence-paired atomic
+                                                                                              has completed
+                                                                                              before invalidating
+                                                                                              the
+                                                                                              cache. Therefore
+                                                                                              any following
+                                                                                              locations read must
+                                                                                              be no older than
+                                                                                              the value read by
+                                                                                              the
+                                                                                              fence-paired-atomic.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     fence        acquire      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit                not generic, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                                                            - If OpenCL and
+                                                                                              address space is
+                                                                                              local, omit
+                                                                                              vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM            - However, since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate                  always generate
+                                                             (see comment for                 (see comment for
+                                                             previous fence).                 previous fence).
                                                            - Could be split into
                                                              separate s_waitcnt
                                                              vmcnt(0) and
@@ -4944,288 +4944,288 @@ agents.
                                                              the value read by
                                                              the
                                                              fence-paired-atomic.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             atomicrmw-no-return-value
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic load
-                                                                                             atomic/atomicrmw
-                                                                                             with an equal or
-                                                                                             wider sync scope
-                                                                                             and memory ordering
-                                                                                             stronger than
-                                                                                             unordered (this is
-                                                                                             termed the
-                                                                                             fence-paired-atomic).
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             buffer_gl*_inv.
-                                                                                           - Ensures that the
-                                                                                             fence-paired atomic
-                                                                                             has completed
-                                                                                             before invalidating
-                                                                                             the
-                                                                                             caches. Therefore
-                                                                                             any following
-                                                                                             locations read must
-                                                                                             be no older than
-                                                                                             the value read by
-                                                                                             the
-                                                                                             fence-paired-atomic.
-
-                                                         2. buffer_wbinvl1_vol           2. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before any        - Must happen before any
-                                                             following global/generic        following global/generic
-                                                             load/load                       load/load
-                                                             atomic/store/store              atomic/store/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              atomicrmw-no-return-value
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic load
+                                                                                              atomic/atomicrmw
+                                                                                              with an equal or
+                                                                                              wider sync scope
+                                                                                              and memory ordering
+                                                                                              stronger than
+                                                                                              unordered (this is
+                                                                                              termed the
+                                                                                              fence-paired-atomic).
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              buffer_gl*_inv.
+                                                                                            - Ensures that the
+                                                                                              fence-paired atomic
+                                                                                              has completed
+                                                                                              before invalidating
+                                                                                              the
+                                                                                              caches. Therefore
+                                                                                              any following
+                                                                                              locations read must
+                                                                                              be no older than
+                                                                                              the value read by
+                                                                                              the
+                                                                                              fence-paired-atomic.
+
+                                                         2. buffer_wbinvl1_vol            2. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before any         - Must happen before any
+                                                             following global/generic         following global/generic
+                                                             load/load                        load/load
+                                                             atomic/store/store               atomic/store/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
 
      **Release Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store  1. buffer/global/ds/flat_store
+     ---------------------------------------------------------------------------------------------------------------------
+     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store   1. buffer/global/ds/flat_store
                                - wavefront    - local
                                               - generic
-     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             lgkmcnt(0).
+     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              lgkmcnt(0).
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             store.                          store.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             store that is being             store that is being
-                                                             released.                       released.
-
-                                                         2. buffer/global_store          2. buffer/global_store
-     store atomic release      - workgroup    - local                                    1. s_waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - If OpenCL, omit.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0) and s_waitcnt
-                                                                                             vscnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             store.
-                                                                                           - Ensures that all
-                                                                                             global memory
-                                                                                             operations have
-                                                                                             completed before
-                                                                                             performing the
-                                                                                             store that is being
-                                                                                             released.
-
-                                                         1. ds_store                     2. ds_store
-     store atomic release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             lgkmcnt(0).
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             store.                           store.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             store that is being              store that is being
+                                                             released.                        released.
+
+                                                         2. buffer/global_store           2. buffer/global_store
+     store atomic release      - workgroup    - local                                     1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - If OpenCL, omit.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0) and s_waitcnt
+                                                                                              vscnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              store.
+                                                                                            - Ensures that all
+                                                                                              global memory
+                                                                                              operations have
+                                                                                              completed before
+                                                                                              performing the
+                                                                                              store that is being
+                                                                                              released.
+
+                                                         1. ds_store                      2. ds_store
+     store atomic release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              lgkmcnt(0).
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             store.                          store.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             store that is being             store that is being
-                                                             released.                       released.
-
-                                                         2. flat_store                   2. flat_store
-     store atomic release      - agent        - global   1. s_waitcnt lgkmcnt(0) &         1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                          vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt vscnt(0)
-                                                             s_waitcnt                       and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load
-                                                             atomic/store                    atomic/
-                                                             atomic/atomicrmw.               atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             store.                          store.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to memory have                  to memory have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             store that is being             store that is being
-                                                             released.                       released.
-
-                                                         2. buffer/global/flat_store     2. buffer/global/flat_store
-     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             store.                           store.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             store that is being              store that is being
+                                                             released.                        released.
+
+                                                         2. flat_store                    2. flat_store
+     store atomic release      - agent        - global   1. s_waitcnt lgkmcnt(0) &          1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)                           vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt vscnt(0)
+                                                             s_waitcnt                        and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load
+                                                             atomic/store                     atomic/
+                                                             atomic/atomicrmw.                atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             store.                           store.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to memory have                   to memory have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             store that is being              store that is being
+                                                             released.                        released.
+
+                                                         2. buffer/global/flat_store      2. buffer/global/flat_store
+     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
+     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
 
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
                                                            - If OpenCL, omit.
 
                                                            - Must happen after
@@ -5234,1312 +5234,1312 @@ agents.
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. buffer/global_atomic         2. buffer/global_atomic
-     atomicrmw    release      - workgroup    - local                                    1. waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - If OpenCL, omit.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0) and s_waitcnt
-                                                                                             vscnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             store.
-                                                                                           - Ensures that all
-                                                                                             global memory
-                                                                                             operations have
-                                                                                             completed before
-                                                                                             performing the
-                                                                                             store that is being
-                                                                                             released.
-
-                                                         1. ds_atomic                    2. ds_atomic
-     atomicrmw    release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             waitcnt lgkmcnt(0).
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. buffer/global_atomic          2. buffer/global_atomic
+     atomicrmw    release      - workgroup    - local                                     1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - If OpenCL, omit.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0) and s_waitcnt
+                                                                                              vscnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              atomicrmw.
+                                                                                            - Ensures that all
+                                                                                              global memory
+                                                                                              operations have
+                                                                                              completed before
+                                                                                              performing the
+                                                                                              atomicrmw that is
+                                                                                              being released.
+
+                                                         1. ds_atomic                     2. ds_atomic
+     atomicrmw    release      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              s_waitcnt lgkmcnt(0).
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. flat_atomic                  2. flat_atomic
-     atomicrmw    release      - agent        - global   1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                         vmcnt(0) & vscnt(0)
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. flat_atomic                   2. flat_atomic
+     atomicrmw    release      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)                          vmcnt(0) & vscnt(0)
 
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt
-                                                             s_waitcnt                       vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load atomic/
-                                                             atomic/store                    atomicrmw-with-return-value.
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
+                                                             s_waitcnt                        vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load atomic/
+                                                             atomic/store                     atomicrmw-with-return-value.
                                                              atomic/atomicrmw.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to global and local             to global and local
-                                                             have completed                  have completed
-                                                             before performing               before performing
-                                                             the atomicrmw that              the atomicrmw that
-                                                             is being released.              is being released.
-
-                                                         2. buffer/global/flat_atomic    2. buffer/global/flat_atomic
-     fence        release      - singlethread *none*     *none*                          *none*
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to global and local              to global and local
+                                                             have completed                   have completed
+                                                             before performing                before performing
+                                                             the atomicrmw that               the atomicrmw that
+                                                             is being released.               is being released.
+
+                                                         2. buffer/global/flat_atomic     2. buffer/global/flat_atomic
+     fence        release      - singlethread *none*     *none*                           *none*
                                - wavefront
-     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit.              not generic, omit
-                                                                                             lgkmcnt(0).
-                                                                                           - If OpenCL and
-                                                                                             address space is
-                                                                                             local, omit
-                                                                                             vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM           - However, since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate. If             always generate. If
-                                                             fence had an                    fence had an
-                                                             address space then              address space then
-                                                             set to address                  set to address
-                                                             space of OpenCL                 space of OpenCL
-                                                             fence flag, or to               fence flag, or to
-                                                             generic if both                 generic if both
-                                                             local and global                local and global
-                                                             flags are                       flags are
-                                                             specified.                      specified.
+     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit.               not generic, omit
+                                                                                              lgkmcnt(0).
+                                                                                            - If OpenCL and
+                                                                                              address space is
+                                                                                              local, omit
+                                                                                              vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM            - However, since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate. If              always generate. If
+                                                             fence had an                     fence had an
+                                                             address space then               address space then
+                                                             set to address                   set to address
+                                                             space of OpenCL                  space of OpenCL
+                                                             fence flag, or to                fence flag, or to
+                                                             generic if both                  generic if both
+                                                             local and global                 local and global
+                                                             flags are                        flags are
+                                                             specified.                       specified.
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store atomic/
-                                                                                             atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             any following store             any following store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             fence-paired-atomic).           fence-paired-atomic).
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             following                       following
-                                                             fence-paired-atomic.            fence-paired-atomic.
-
-     fence        release      - agent        *none*     1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit               not generic, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             local, omit                     local, omit
-                                                             vmcnt(0).                       vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM           - However, since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate. If             always generate. If
-                                                             fence had an                    fence had an
-                                                             address space then              address space then
-                                                             set to address                  set to address
-                                                             space of OpenCL                 space of OpenCL
-                                                             fence flag, or to               fence flag, or to
-                                                             generic if both                 generic if both
-                                                             local and global                local and global
-                                                             flags are                       flags are
-                                                             specified.                      specified.
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt
-                                                             s_waitcnt                       vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load atomic/
-                                                             atomic/store                    atomicrmw-with-return-value.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store atomic/
+                                                                                              atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             any following store              any following store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             fence-paired-atomic).            fence-paired-atomic).
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             following                        following
+                                                             fence-paired-atomic.             fence-paired-atomic.
+
+     fence        release      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit                not generic, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             local, omit                      local, omit
+                                                             vmcnt(0).                        vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM            - However, since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate. If              always generate. If
+                                                             fence had an                     fence had an
+                                                             address space then               address space then
+                                                             set to address                   set to address
+                                                             space of OpenCL                  space of OpenCL
+                                                             fence flag, or to                fence flag, or to
+                                                             generic if both                  generic if both
+                                                             local and global                 local and global
+                                                             flags are                        flags are
+                                                             specified.                       specified.
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
+                                                             s_waitcnt                        vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load atomic/
+                                                             atomic/store                     atomicrmw-with-return-value.
                                                              atomic/atomicrmw.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             any following store             any following store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             fence-paired-atomic).           fence-paired-atomic).
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             have                            have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             following                       following
-                                                             fence-paired-atomic.            fence-paired-atomic.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             any following store              any following store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             fence-paired-atomic).            fence-paired-atomic).
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             have                             have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             following                        following
+                                                             fence-paired-atomic.             fence-paired-atomic.
 
      **Acquire-Release Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
+     ---------------------------------------------------------------------------------------------------------------------
+     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic  1. buffer/global/ds/flat_atomic
                                - wavefront    - local
                                               - generic
-     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             s_waitcnt lgkmcnt(0).
-                                                           - Must happen after             - Must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. buffer/global_atomic         2. buffer/global_atomic
-                                                                                         3. s_waitcnt vm/vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vm/vscnt(0).
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                                                             waitcnt lgkmcnt(0).
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             buffer_gl0_inv.
-                                                                                           - Ensures any
-                                                                                             following global
-                                                                                             data read is no
-                                                                                             older than the
-                                                                                             atomicrmw value
-                                                                                             being acquired.
-
-                                                                                         4. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acq_rel      - workgroup    - local                                    1. waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - If OpenCL, omit.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0) and s_waitcnt
-                                                                                             vscnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             store.
-                                                                                           - Ensures that all
-                                                                                             global memory
-                                                                                             operations have
-                                                                                             completed before
-                                                                                             performing the
-                                                                                             store that is being
-                                                                                             released.
-
-                                                         1. ds_atomic                    2. ds_atomic
-                                                         2. s_waitcnt lgkmcnt(0)         3. s_waitcnt lgkmcnt(0)
-
-                                                           - If OpenCL, omit.              - If OpenCL, omit.
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following
-                                                             global/generic                  buffer_gl0_inv.
+     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              s_waitcnt lgkmcnt(0).
+                                                           - Must happen after              - Must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. buffer/global_atomic          2. buffer/global_atomic
+                                                                                          3. s_waitcnt vm/vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vm/vscnt(0).
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                                                              waitcnt lgkmcnt(0).
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              buffer_gl0_inv.
+                                                                                            - Ensures any
+                                                                                              following global
+                                                                                              data read is no
+                                                                                              older than the
+                                                                                              atomicrmw value
+                                                                                              being acquired.
+
+                                                                                          4. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acq_rel      - workgroup    - local                                     1. waitcnt vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - If OpenCL, omit.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0) and s_waitcnt
+                                                                                              vscnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              store.
+                                                                                            - Ensures that all
+                                                                                              global memory
+                                                                                              operations have
+                                                                                              completed before
+                                                                                              performing the
+                                                                                              store that is being
+                                                                                              released.
+
+                                                         1. ds_atomic                     2. ds_atomic
+                                                         2. s_waitcnt lgkmcnt(0)          3. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit.               - If OpenCL, omit.
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following
+                                                             global/generic                   buffer_gl0_inv.
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the load             older than the load
-                                                             atomic value being              atomic value being
-                                                             acquired.                       acquired.
-
-                                                                                         4. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - If OpenCL omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             waitcnt lgkmcnt(0).
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the load              older than the load
+                                                             atomic value being               atomic value being
+                                                             acquired.                        acquired.
+
+                                                                                          4. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - If OpenCL omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              waitcnt lgkmcnt(0).
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/store/load
                                                              atomic/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store
-                                                                                             atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. flat_atomic                  2. flat_atomic
-                                                         3. s_waitcnt lgkmcnt(0)         3. s_waitcnt lgkmcnt(0) &
-                                                                                            vm/vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vm/vscnt(0).
-                                                           - If OpenCL, omit.              - If OpenCL, omit
-                                                                                             waitcnt lgkmcnt(0).
-                                                           - Must happen before            - Must happen before
-                                                             any following                   the following
-                                                             global/generic                  buffer_gl0_inv.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store
+                                                                                              atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. flat_atomic                   2. flat_atomic
+                                                         3. s_waitcnt lgkmcnt(0)          3. s_waitcnt lgkmcnt(0) &
+                                                                                             vm/vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vm/vscnt(0).
+                                                           - If OpenCL, omit.               - If OpenCL, omit
+                                                                                              waitcnt lgkmcnt(0).
+                                                           - Must happen before             - Must happen before
+                                                             any following                    the following
+                                                             global/generic                   buffer_gl0_inv.
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                           - Ensures any                   - Ensures any
-                                                             following global                following global
-                                                             data read is no                 data read is no
-                                                             older than the load             older than the load
-                                                             atomic value being              atomic value being
-                                                             acquired.                       acquired.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt
-                                                             s_waitcnt                       vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load atomic/
-                                                             atomic/store                    atomicrmw-with-return-value.
+                                                           - Ensures any                    - Ensures any
+                                                             following global                 following global
+                                                             data read is no                  data read is no
+                                                             older than the load              older than the load
+                                                             atomic value being               atomic value being
+                                                             acquired.                        acquired.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
+                                                             s_waitcnt                        vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load atomic/
+                                                             atomic/store                     atomicrmw-with-return-value.
                                                              atomic/atomicrmw.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to global have                  to global have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. buffer/global_atomic         2. buffer/global_atomic
-                                                         3. s_waitcnt vmcnt(0)           3. s_waitcnt vm/vscnt(0)
-
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                                                             waitcnt lgkmcnt(0).
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures the                   - Ensures the
-                                                             atomicrmw has                   atomicrmw has
-                                                             completed before                completed before
-                                                             invalidating the                invalidating the
-                                                             cache.                          caches.
-
-                                                         4. buffer_wbinvl1_vol           4. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
-
-     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt
-                                                             s_waitcnt                       vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load atomic
-                                                             atomic/store                    atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to global have                   to global have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. buffer/global_atomic          2. buffer/global_atomic
+                                                         3. s_waitcnt vmcnt(0)            3. s_waitcnt vm/vscnt(0)
+
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                                                              waitcnt lgkmcnt(0).
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures the                    - Ensures the
+                                                             atomicrmw has                    atomicrmw has
+                                                             completed before                 completed before
+                                                             invalidating the                 invalidating the
+                                                             cache.                           caches.
+
+                                                         4. buffer_wbinvl1_vol            4. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
+
+     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
+                                                             s_waitcnt                        vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load atomic
+                                                             atomic/store                     atomicrmw-with-return-value.
                                                              atomic/atomicrmw.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             atomicrmw.                      atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to global have                  have
-                                                             completed before                completed before
-                                                             performing the                  performing the
-                                                             atomicrmw that is               atomicrmw that is
-                                                             being released.                 being released.
-
-                                                         2. flat_atomic                  2. flat_atomic
-                                                         3. s_waitcnt vmcnt(0) &         3. s_waitcnt vm/vscnt(0) &
-                                                            lgkmcnt(0)                      lgkmcnt(0)
-
-                                                           - If OpenCL, omit               - If OpenCL, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                                                           - Use vmcnt(0) if atomic with
-                                                                                             return and vscnt(0) if
-                                                                                             atomic with no-return.
-                                                           - Must happen before            - Must happen before
-                                                             following                       following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures the                   - Ensures the
-                                                             atomicrmw has                   atomicrmw has
-                                                             completed before                completed before
-                                                             invalidating the                invalidating the
-                                                             cache.                          caches.
-
-                                                         4. buffer_wbinvl1_vol           4. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data.                    global data.
-
-     fence        acq_rel      - singlethread *none*     *none*                          *none*
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             atomicrmw.                       atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to global have                   have
+                                                             completed before                 completed before
+                                                             performing the                   performing the
+                                                             atomicrmw that is                atomicrmw that is
+                                                             being released.                  being released.
+
+                                                         2. flat_atomic                   2. flat_atomic
+                                                         3. s_waitcnt vmcnt(0) &          3. s_waitcnt vm/vscnt(0) &
+                                                            lgkmcnt(0)                       lgkmcnt(0)
+
+                                                           - If OpenCL, omit                - If OpenCL, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                                                            - Use vmcnt(0) if atomic with
+                                                                                              return and vscnt(0) if
+                                                                                              atomic with no-return.
+                                                           - Must happen before             - Must happen before
+                                                             following                        following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures the                    - Ensures the
+                                                             atomicrmw has                    atomicrmw has
+                                                             completed before                 completed before
+                                                             invalidating the                 invalidating the
+                                                             cache.                           caches.
+
+                                                         4. buffer_wbinvl1_vol            4. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data.                     global data.
+
+     fence        acq_rel      - singlethread *none*     *none*                           *none*
                                - wavefront
-     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                                                                            vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit.              not generic, omit
-                                                                                             lgkmcnt(0).
-                                                                                           - If OpenCL and
-                                                                                             address space is
-                                                                                             local, omit
-                                                                                             vmcnt(0) and vscnt(0).
-                                                           - However,                      - However,
-                                                             since LLVM                      since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate                 always generate
-                                                             (see comment for                (see comment for
-                                                             previous fence).                previous fence).
+     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                                                                             vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit.               not generic, omit
+                                                                                              lgkmcnt(0).
+                                                                                            - If OpenCL and
+                                                                                              address space is
+                                                                                              local, omit
+                                                                                              vmcnt(0) and vscnt(0).
+                                                           - However,                       - However,
+                                                             since LLVM                       since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate                  always generate
+                                                             (see comment for                 (see comment for
+                                                             previous fence).                 previous fence).
                                                            - Must happen after
                                                              any preceding
                                                              local/generic
                                                              load/load
                                                              atomic/store/store
                                                              atomic/atomicrmw.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - s_waitcnt vmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             load/load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                                                           - s_waitcnt lgkmcnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             local/generic
-                                                                                             load/store/load
-                                                                                             atomic/store atomic/
-                                                                                             atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/store/store              atomic/store/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that all              - Ensures that all
-                                                             memory operations               memory operations
-                                                             to local have                   have
-                                                             completed before                completed before
-                                                             performing any                  performing any
-                                                             following global                following global
-                                                             memory operations.              memory operations.
-                                                           - Ensures that the              - Ensures that the
-                                                             preceding                       preceding
-                                                             local/generic load              local/generic load
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             acquire-fence-paired-atomic     acquire-fence-paired-atomic
-                                                             ) has completed                 ) has completed
-                                                             before following                before following
-                                                             global memory                   global memory
-                                                             operations. This                operations. This
-                                                             satisfies the                   satisfies the
-                                                             requirements of                 requirements of
-                                                             acquire.                        acquire.
-                                                           - Ensures that all              - Ensures that all
-                                                             previous memory                 previous memory
-                                                             operations have                 operations have
-                                                             completed before a              completed before a
-                                                             following                       following
-                                                             local/generic store             local/generic store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             release-fence-paired-atomic     release-fence-paired-atomic
-                                                             ). This satisfies the           ). This satisfies the
-                                                             requirements of                 requirements of
-                                                             release.                        release.
-                                                                                           - Must happen before
-                                                                                             the following
-                                                                                             buffer_gl0_inv.
-                                                                                           - Ensures that the
-                                                                                             acquire-fence-paired
-                                                                                             atomic has completed
-                                                                                             before invalidating
-                                                                                             the
-                                                                                             cache. Therefore
-                                                                                             any following
-                                                                                             locations read must
-                                                                                             be no older than
-                                                                                             the value read by
-                                                                                             the
-                                                                                             acquire-fence-paired-atomic.
-
-                                                                                         3. buffer_gl0_inv
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Ensures that
-                                                                                             following
-                                                                                             loads will not see
-                                                                                             stale data.
-
-     fence        acq_rel      - agent        *none*     1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system                     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - If OpenCL and                 - If OpenCL and
-                                                             address space is                address space is
-                                                             not generic, omit               not generic, omit
-                                                             lgkmcnt(0).                     lgkmcnt(0).
-                                                                                           - If OpenCL and
-                                                                                             address space is
-                                                                                             local, omit
-                                                                                             vmcnt(0) and vscnt(0).
-                                                           - However, since LLVM           - However, since LLVM
-                                                             currently has no                currently has no
-                                                             address space on                address space on
-                                                             the fence need to               the fence need to
-                                                             conservatively                  conservatively
-                                                             always generate                 always generate
-                                                             (see comment for                (see comment for
-                                                             previous fence).                previous fence).
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0) and                    vmcnt(0), s_waitcnt
-                                                             s_waitcnt                       vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - s_waitcnt vmcnt(0)            - s_waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             global/generic                  global/generic
-                                                             load/store/load                 load/load
-                                                             atomic/store                    atomic/
-                                                             atomic/atomicrmw.               atomicrmw-with-return-value.
-                                                                                           - s_waitcnt vscnt(0)
-                                                                                             must happen after
-                                                                                             any preceding
-                                                                                             global/generic
-                                                                                             store/store atomic/
-                                                                                             atomicrmw-no-return-value.
-                                                           - s_waitcnt lgkmcnt(0)          - s_waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             any preceding                   any preceding
-                                                             local/generic                   local/generic
-                                                             load/store/load                 load/store/load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Must happen before            - Must happen before
-                                                             the following                   the following
-                                                             buffer_wbinvl1_vol.             buffer_gl*_inv.
-                                                           - Ensures that the              - Ensures that the
-                                                             preceding                       preceding
-                                                             global/local/generic            global/local/generic
-                                                             load                            load
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             acquire-fence-paired-atomic     acquire-fence-paired-atomic
-                                                             ) has completed                 ) has completed
-                                                             before invalidating             before invalidating
-                                                             the cache. This                 the caches. This
-                                                             satisfies the                   satisfies the
-                                                             requirements of                 requirements of
-                                                             acquire.                        acquire.
-                                                           - Ensures that all              - Ensures that all
-                                                             previous memory                 previous memory
-                                                             operations have                 operations have
-                                                             completed before a              completed before a
-                                                             following                       following
-                                                             global/local/generic            global/local/generic
-                                                             store                           store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with an equal or                with an equal or
-                                                             wider sync scope                wider sync scope
-                                                             and memory ordering             and memory ordering
-                                                             stronger than                   stronger than
-                                                             unordered (this is              unordered (this is
-                                                             termed the                      termed the
-                                                             release-fence-paired-atomic     release-fence-paired-atomic
-                                                             ). This satisfies the           ). This satisfies the
-                                                             requirements of                 requirements of
-                                                             release.                        release.
-
-                                                         2. buffer_wbinvl1_vol           2. buffer_gl0_inv;
-                                                                                            buffer_gl1_inv
-
-                                                           - Must happen before            - Must happen before
-                                                             any following                   any following
-                                                             global/generic                  global/generic
-                                                             load/load                       load/load
-                                                             atomic/store/store              atomic/store/store
-                                                             atomic/atomicrmw.               atomic/atomicrmw.
-                                                           - Ensures that                  - Ensures that
-                                                             following loads                 following loads
-                                                             will not see stale              will not see stale
-                                                             global data. This               global data. This
-                                                             satisfies the                   satisfies the
-                                                             requirements of                 requirements of
-                                                             acquire.                        acquire.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - s_waitcnt vmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              load/load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                                                            - s_waitcnt lgkmcnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              local/generic
+                                                                                              load/store/load
+                                                                                              atomic/store atomic/
+                                                                                              atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/store/store               atomic/store/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that all               - Ensures that all
+                                                             memory operations                memory operations
+                                                             to local have                    have
+                                                             completed before                 completed before
+                                                             performing any                   performing any
+                                                             following global                 following global
+                                                             memory operations.               memory operations.
+                                                           - Ensures that the               - Ensures that the
+                                                             preceding                        preceding
+                                                             local/generic load               local/generic load
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             acquire-fence-paired-atomic      acquire-fence-paired-atomic
+                                                             ) has completed                  ) has completed
+                                                             before following                 before following
+                                                             global memory                    global memory
+                                                             operations. This                 operations. This
+                                                             satisfies the                    satisfies the
+                                                             requirements of                  requirements of
+                                                             acquire.                         acquire.
+                                                           - Ensures that all               - Ensures that all
+                                                             previous memory                  previous memory
+                                                             operations have                  operations have
+                                                             completed before a               completed before a
+                                                             following                        following
+                                                             local/generic store              local/generic store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             release-fence-paired-atomic      release-fence-paired-atomic
+                                                             ). This satisfies the            ). This satisfies the
+                                                             requirements of                  requirements of
+                                                             release.                         release.
+                                                                                            - Must happen before
+                                                                                              the following
+                                                                                              buffer_gl0_inv.
+                                                                                            - Ensures that the
+                                                                                              acquire-fence-paired
+                                                                                              atomic has completed
+                                                                                              before invalidating
+                                                                                              the
+                                                                                              cache. Therefore
+                                                                                              any following
+                                                                                              locations read must
+                                                                                              be no older than
+                                                                                              the value read by
+                                                                                              the
+                                                                                              acquire-fence-paired-atomic.
+
+                                                                                          3. buffer_gl0_inv
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Ensures that
+                                                                                              following
+                                                                                              loads will not see
+                                                                                              stale data.
+
+     fence        acq_rel      - agent        *none*     1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system                     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - If OpenCL and                  - If OpenCL and
+                                                             address space is                 address space is
+                                                             not generic, omit                not generic, omit
+                                                             lgkmcnt(0).                      lgkmcnt(0).
+                                                                                            - If OpenCL and
+                                                                                              address space is
+                                                                                              local, omit
+                                                                                              vmcnt(0) and vscnt(0).
+                                                           - However, since LLVM            - However, since LLVM
+                                                             currently has no                 currently has no
+                                                             address space on                 address space on
+                                                             the fence need to                the fence need to
+                                                             conservatively                   conservatively
+                                                             always generate                  always generate
+                                                             (see comment for                 (see comment for
+                                                             previous fence).                 previous fence).
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0) and                     vmcnt(0), s_waitcnt
+                                                             s_waitcnt                        vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - s_waitcnt vmcnt(0)             - s_waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             global/generic                   global/generic
+                                                             load/store/load                  load/load
+                                                             atomic/store                     atomic/
+                                                             atomic/atomicrmw.                atomicrmw-with-return-value.
+                                                                                            - s_waitcnt vscnt(0)
+                                                                                              must happen after
+                                                                                              any preceding
+                                                                                              global/generic
+                                                                                              store/store atomic/
+                                                                                              atomicrmw-no-return-value.
+                                                           - s_waitcnt lgkmcnt(0)           - s_waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             any preceding                    any preceding
+                                                             local/generic                    local/generic
+                                                             load/store/load                  load/store/load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Must happen before             - Must happen before
+                                                             the following                    the following
+                                                             buffer_wbinvl1_vol.              buffer_gl*_inv.
+                                                           - Ensures that the               - Ensures that the
+                                                             preceding                        preceding
+                                                             global/local/generic             global/local/generic
+                                                             load                             load
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             acquire-fence-paired-atomic      acquire-fence-paired-atomic
+                                                             ) has completed                  ) has completed
+                                                             before invalidating              before invalidating
+                                                             the cache. This                  the caches. This
+                                                             satisfies the                    satisfies the
+                                                             requirements of                  requirements of
+                                                             acquire.                         acquire.
+                                                           - Ensures that all               - Ensures that all
+                                                             previous memory                  previous memory
+                                                             operations have                  operations have
+                                                             completed before a               completed before a
+                                                             following                        following
+                                                             global/local/generic             global/local/generic
+                                                             store                            store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with an equal or                 with an equal or
+                                                             wider sync scope                 wider sync scope
+                                                             and memory ordering              and memory ordering
+                                                             stronger than                    stronger than
+                                                             unordered (this is               unordered (this is
+                                                             termed the                       termed the
+                                                             release-fence-paired-atomic      release-fence-paired-atomic
+                                                             ). This satisfies the            ). This satisfies the
+                                                             requirements of                  requirements of
+                                                             release.                         release.
+
+                                                         2. buffer_wbinvl1_vol            2. buffer_gl0_inv;
+                                                                                             buffer_gl1_inv
+
+                                                           - Must happen before             - Must happen before
+                                                             any following                    any following
+                                                             global/generic                   global/generic
+                                                             load/load                        load/load
+                                                             atomic/store/store               atomic/store/store
+                                                             atomic/atomicrmw.                atomic/atomicrmw.
+                                                           - Ensures that                   - Ensures that
+                                                             following loads                  following loads
+                                                             will not see stale               will not see stale
+                                                             global data. This                global data. This
+                                                             satisfies the                    satisfies the
+                                                             requirements of                  requirements of
+                                                             acquire.                         acquire.
 
      **Sequential Consistent Atomic**
-     ----------------------------------------------------------------------------------------------------------------------
-     load atomic  seq_cst      - singlethread - global   *Same as corresponding          *Same as corresponding
-                               - wavefront    - local    load atomic acquire,            load atomic acquire,
-                                              - generic  except must generated           except must generated
-                                                         all instructions even           all instructions even
-                                                         for OpenCL.*                    for OpenCL.*
-     load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkmcnt(0)         1. s_waitcnt lgkmcnt(0) &
-                                              - generic                                     vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit vmcnt(0) and
-                                                                                             vscnt(0).
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0), s_waitcnt
-                                                                                             vscnt(0) and s_waitcnt
-                                                                                             lgkmcnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                           - Must                          - waitcnt lgkmcnt(0) must
-                                                             happen after                    happen after
-                                                             preceding                       preceding
-                                                             global/generic load             local load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with memory                     with memory
-                                                             ordering of seq_cst             ordering of seq_cst
-                                                             and with equal or               and with equal or
-                                                             wider sync scope.               wider sync scope.
-                                                             (Note that seq_cst              (Note that seq_cst
-                                                             fences have their               fences have their
-                                                             own s_waitcnt                   own s_waitcnt
-                                                             lgkmcnt(0) and so do            lgkmcnt(0) and so do
-                                                             not need to be                  not need to be
-                                                             considered.)                    considered.)
-                                                                                           - waitcnt vmcnt(0)
-                                                                                             Must happen after
-                                                                                             preceding
-                                                                                             global/generic load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value
-                                                                                             with memory
-                                                                                             ordering of seq_cst
-                                                                                             and with equal or
-                                                                                             wider sync scope.
-                                                                                             (Note that seq_cst
-                                                                                             fences have their
-                                                                                             own s_waitcnt
-                                                                                             vmcnt(0) and so do
-                                                                                             not need to be
-                                                                                             considered.)
-                                                                                           - waitcnt vscnt(0)
-                                                                                             Must happen after
-                                                                                             preceding
-                                                                                             global/generic store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value
-                                                                                             with memory
-                                                                                             ordering of seq_cst
-                                                                                             and with equal or
-                                                                                             wider sync scope.
-                                                                                             (Note that seq_cst
-                                                                                             fences have their
-                                                                                             own s_waitcnt
-                                                                                             vscnt(0) and so do
-                                                                                             not need to be
-                                                                                             considered.)
-                                                           - Ensures any                   - Ensures any
-                                                             preceding                       preceding
-                                                             sequential                      sequential
-                                                             consistent local                consistent global/local
-                                                             memory instructions             memory instructions
-                                                             have completed                  have completed
-                                                             before executing                before executing
-                                                             this sequentially               this sequentially
-                                                             consistent                      consistent
-                                                             instruction. This               instruction. This
-                                                             prevents reordering             prevents reordering
-                                                             a seq_cst store                 a seq_cst store
-                                                             followed by a                   followed by a
-                                                             seq_cst load. (Note             seq_cst load. (Note
-                                                             that seq_cst is                 that seq_cst is
-                                                             stronger than                   stronger than
-                                                             acquire/release as              acquire/release as
-                                                             the reordering of               the reordering of
-                                                             load acquire                    load acquire
-                                                             followed by a store             followed by a store
-                                                             release is                      release is
-                                                             prevented by the                prevented by the
-                                                             waitcnt of                      waitcnt of
-                                                             the release, but                the release, but
-                                                             there is nothing                there is nothing
-                                                             preventing a store              preventing a store
-                                                             release followed by             release followed by
-                                                             load acquire from               load acquire from
-                                                             completing out of               completing out of
-                                                             order. The waitcnt              order. The waitcnt
-                                                             could be placed after           could be placed after
-                                                             seq_store or before             seq_store or before
-                                                             the seq_load. We                the seq_load. We
-                                                             choose the load to              choose the load to
-                                                             make the waitcnt be             make the waitcnt be
-                                                             as late as possible             as late as possible
-                                                             so that the store               so that the store
-                                                             may have already                may have already
-                                                             completed.)                     completed.)
-
-                                                         2. *Following                   2. *Following
-                                                            instructions same as            instructions same as
-                                                            corresponding load              corresponding load
-                                                            atomic acquire,                 atomic acquire,
-                                                            except must generated           except must generated
-                                                            all instructions even           all instructions even
-                                                            for OpenCL.*                    for OpenCL.*
+     ---------------------------------------------------------------------------------------------------------------------
+     load atomic  seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
+                               - wavefront    - local    load atomic acquire,             load atomic acquire,
+                                              - generic  except must generated            except must generated
+                                                         all instructions even            all instructions even
+                                                         for OpenCL.*                     for OpenCL.*
+     load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkmcnt(0)          1. s_waitcnt lgkmcnt(0) &
+                                              - generic                                      vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit vmcnt(0) and
+                                                                                              vscnt(0).
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0), s_waitcnt
+                                                                                              vscnt(0) and s_waitcnt
+                                                                                              lgkmcnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                           - Must                           - waitcnt lgkmcnt(0) must
+                                                             happen after                     happen after
+                                                             preceding                        preceding
+                                                             global/generic load              local load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with memory                      with memory
+                                                             ordering of seq_cst              ordering of seq_cst
+                                                             and with equal or                and with equal or
+                                                             wider sync scope.                wider sync scope.
+                                                             (Note that seq_cst               (Note that seq_cst
+                                                             fences have their                fences have their
+                                                             own s_waitcnt                    own s_waitcnt
+                                                             lgkmcnt(0) and so do             lgkmcnt(0) and so do
+                                                             not need to be                   not need to be
+                                                             considered.)                     considered.)
+                                                                                            - waitcnt vmcnt(0)
+                                                                                              Must happen after
+                                                                                              preceding
+                                                                                              global/generic load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value
+                                                                                              with memory
+                                                                                              ordering of seq_cst
+                                                                                              and with equal or
+                                                                                              wider sync scope.
+                                                                                              (Note that seq_cst
+                                                                                              fences have their
+                                                                                              own s_waitcnt
+                                                                                              vmcnt(0) and so do
+                                                                                              not need to be
+                                                                                              considered.)
+                                                                                            - waitcnt vscnt(0)
+                                                                                              Must happen after
+                                                                                              preceding
+                                                                                              global/generic store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value
+                                                                                              with memory
+                                                                                              ordering of seq_cst
+                                                                                              and with equal or
+                                                                                              wider sync scope.
+                                                                                              (Note that seq_cst
+                                                                                              fences have their
+                                                                                              own s_waitcnt
+                                                                                              vscnt(0) and so do
+                                                                                              not need to be
+                                                                                              considered.)
+                                                           - Ensures any                    - Ensures any
+                                                             preceding                        preceding
+                                                             sequential                       sequential
+                                                             consistent local                 consistent global/local
+                                                             memory instructions              memory instructions
+                                                             have completed                   have completed
+                                                             before executing                 before executing
+                                                             this sequentially                this sequentially
+                                                             consistent                       consistent
+                                                             instruction. This                instruction. This
+                                                             prevents reordering              prevents reordering
+                                                             a seq_cst store                  a seq_cst store
+                                                             followed by a                    followed by a
+                                                             seq_cst load. (Note              seq_cst load. (Note
+                                                             that seq_cst is                  that seq_cst is
+                                                             stronger than                    stronger than
+                                                             acquire/release as               acquire/release as
+                                                             the reordering of                the reordering of
+                                                             load acquire                     load acquire
+                                                             followed by a store              followed by a store
+                                                             release is                       release is
+                                                             prevented by the                 prevented by the
+                                                             waitcnt of                       waitcnt of
+                                                             the release, but                 the release, but
+                                                             there is nothing                 there is nothing
+                                                             preventing a store               preventing a store
+                                                             release followed by              release followed by
+                                                             load acquire from                load acquire from
+                                                             completing out of                completing out of
+                                                             order. The waitcnt               order. The waitcnt
+                                                             could be placed after            could be placed after
+                                                             seq_store or before              seq_store or before
+                                                             the seq_load. We                 the seq_load. We
+                                                             choose the load to               choose the load to
+                                                             make the waitcnt be              make the waitcnt be
+                                                             as late as possible              as late as possible
+                                                             so that the store                so that the store
+                                                             may have already                 may have already
+                                                             completed.)                      completed.)
+
+                                                         2. *Following                    2. *Following
+                                                            instructions same as             instructions same as
+                                                            corresponding load               corresponding load
+                                                            atomic acquire,                  atomic acquire,
+                                                            except must generated            except must generated
+                                                            all instructions even            all instructions even
+                                                            for OpenCL.*                     for OpenCL.*
      load atomic  seq_cst      - workgroup    - local    *Same as corresponding
                                                          load atomic acquire,
                                                          except must generated
                                                          all instructions even
                                                          for OpenCL.*
 
-                                                                                         1. s_waitcnt vmcnt(0) & vscnt(0)
-
-                                                                                           - If CU wavefront execution
-                                                                                             mode, omit.
-                                                                                           - Could be split into
-                                                                                             separate s_waitcnt
-                                                                                             vmcnt(0) and s_waitcnt
-                                                                                             vscnt(0) to allow
-                                                                                             them to be
-                                                                                             independently moved
-                                                                                             according to the
-                                                                                             following rules.
-                                                                                           - waitcnt vmcnt(0)
-                                                                                             Must happen after
-                                                                                             preceding
-                                                                                             global/generic load
-                                                                                             atomic/
-                                                                                             atomicrmw-with-return-value
-                                                                                             with memory
-                                                                                             ordering of seq_cst
-                                                                                             and with equal or
-                                                                                             wider sync scope.
-                                                                                             (Note that seq_cst
-                                                                                             fences have their
-                                                                                             own s_waitcnt
-                                                                                             vmcnt(0) and so do
-                                                                                             not need to be
-                                                                                             considered.)
-                                                                                           - waitcnt vscnt(0)
-                                                                                             Must happen after
-                                                                                             preceding
-                                                                                             global/generic store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value
-                                                                                             with memory
-                                                                                             ordering of seq_cst
-                                                                                             and with equal or
-                                                                                             wider sync scope.
-                                                                                             (Note that seq_cst
-                                                                                             fences have their
-                                                                                             own s_waitcnt
-                                                                                             vscnt(0) and so do
-                                                                                             not need to be
-                                                                                             considered.)
-                                                                                           - Ensures any
-                                                                                             preceding
-                                                                                             sequential
-                                                                                             consistent global
-                                                                                             memory instructions
-                                                                                             have completed
-                                                                                             before executing
-                                                                                             this sequentially
-                                                                                             consistent
-                                                                                             instruction. This
-                                                                                             prevents reordering
-                                                                                             a seq_cst store
-                                                                                             followed by a
-                                                                                             seq_cst load. (Note
-                                                                                             that seq_cst is
-                                                                                             stronger than
-                                                                                             acquire/release as
-                                                                                             the reordering of
-                                                                                             load acquire
-                                                                                             followed by a store
-                                                                                             release is
-                                                                                             prevented by the
-                                                                                             waitcnt of
-                                                                                             the release, but
-                                                                                             there is nothing
-                                                                                             preventing a store
-                                                                                             release followed by
-                                                                                             load acquire from
-                                                                                             completing out of
-                                                                                             order. The waitcnt
-                                                                                             could be placed after
-                                                                                             seq_store or before
-                                                                                             the seq_load. We
-                                                                                             choose the load to
-                                                                                             make the waitcnt be
-                                                                                             as late as possible
-                                                                                             so that the store
-                                                                                             may have already
-                                                                                             completed.)
-
-                                                                                         2. *Following
-                                                                                            instructions same as
-                                                                                            corresponding load
-                                                                                            atomic acquire,
-                                                                                            except must generated
-                                                                                            all instructions even
-                                                                                            for OpenCL.*
-
-     load atomic  seq_cst      - agent        - global   1. s_waitcnt lgkmcnt(0) &       1. s_waitcnt lgkmcnt(0) &
-                               - system       - generic     vmcnt(0)                        vmcnt(0) & vscnt(0)
-
-                                                           - Could be split into           - Could be split into
-                                                             separate s_waitcnt              separate s_waitcnt
-                                                             vmcnt(0)                        vmcnt(0), s_waitcnt
-                                                             and s_waitcnt                   vscnt(0) and s_waitcnt
-                                                             lgkmcnt(0) to allow             lgkmcnt(0) to allow
-                                                             them to be                      them to be
-                                                             independently moved             independently moved
-                                                             according to the                according to the
-                                                             following rules.                following rules.
-                                                           - waitcnt lgkmcnt(0)            - waitcnt lgkmcnt(0)
-                                                             must happen after               must happen after
-                                                             preceding                       preceding
-                                                             global/generic load             local load
-                                                             atomic/store                    atomic/store
-                                                             atomic/atomicrmw                atomic/atomicrmw
-                                                             with memory                     with memory
-                                                             ordering of seq_cst             ordering of seq_cst
-                                                             and with equal or               and with equal or
-                                                             wider sync scope.               wider sync scope.
-                                                             (Note that seq_cst              (Note that seq_cst
-                                                             fences have their               fences have their
-                                                             own s_waitcnt                   own s_waitcnt
-                                                             lgkmcnt(0) and so do            lgkmcnt(0) and so do
-                                                             not need to be                  not need to be
-                                                             considered.)                    considered.)
-                                                           - waitcnt vmcnt(0)              - waitcnt vmcnt(0)
-                                                             must happen after               must happen after
-                                                             preceding                       preceding
-                                                             global/generic load             global/generic load
-                                                             atomic/store                    atomic/
-                                                             atomic/atomicrmw                atomicrmw-with-return-value
-                                                             with memory                     with memory
-                                                             ordering of seq_cst             ordering of seq_cst
-                                                             and with equal or               and with equal or
-                                                             wider sync scope.               wider sync scope.
-                                                             (Note that seq_cst              (Note that seq_cst
-                                                             fences have their               fences have their
-                                                             own s_waitcnt                   own s_waitcnt
-                                                             vmcnt(0) and so do              vmcnt(0) and so do
-                                                             not need to be                  not need to be
-                                                             considered.)                    considered.)
-                                                                                           - waitcnt vscnt(0)
-                                                                                             Must happen after
-                                                                                             preceding
-                                                                                             global/generic store
-                                                                                             atomic/
-                                                                                             atomicrmw-no-return-value
-                                                                                             with memory
-                                                                                             ordering of seq_cst
-                                                                                             and with equal or
-                                                                                             wider sync scope.
-                                                                                             (Note that seq_cst
-                                                                                             fences have their
-                                                                                             own s_waitcnt
-                                                                                             vscnt(0) and so do
-                                                                                             not need to be
-                                                                                             considered.)
-                                                           - Ensures any                   - Ensures any
-                                                             preceding                       preceding
-                                                             sequentially                    sequentially
-                                                             consistent global               consistent global
-                                                             memory instructions             memory instructions
-                                                             have completed                  have completed
-                                                             before executing                before executing
-                                                             this sequentially               this sequentially
-                                                             consistent                      consistent
-                                                             instruction. This               instruction. This
-                                                             prevents reordering             prevents reordering
-                                                             a seq_cst store                 a seq_cst store
-                                                             followed by a                   followed by a
-                                                             seq_cst load. (Note             seq_cst load. (Note
-                                                             that seq_cst is                 that seq_cst is
-                                                             stronger than                   stronger than
-                                                             acquire/release as              acquire/release as
-                                                             the reordering of               the reordering of
-                                                             load acquire                    load acquire
-                                                             followed by a store             followed by a store
-                                                             release is                      release is
-                                                             prevented by the                prevented by the
-                                                             waitcnt of                      waitcnt of
-                                                             the release, but                the release, but
-                                                             there is nothing                there is nothing
-                                                             preventing a store              preventing a store
-                                                             release followed by             release followed by
-                                                             load acquire from               load acquire from
-                                                             completing out of               completing out of
-                                                             order. The waitcnt              order. The waitcnt
-                                                             could be placed after           could be placed after
-                                                             seq_store or before             seq_store or before
-                                                             the seq_load. We                the seq_load. We
-                                                             choose the load to              choose the load to
-                                                             make the waitcnt be             make the waitcnt be
-                                                             as late as possible             as late as possible
-                                                             so that the store               so that the store
-                                                             may have already                may have already
-                                                             completed.)                     completed.)
-
-                                                         2. *Following                   2. *Following
-                                                            instructions same as            instructions same as
-                                                            corresponding load              corresponding load
-                                                            atomic acquire,                 atomic acquire,
-                                                            except must generate            except must generate
-                                                            all instructions even           all instructions even
-                                                            for OpenCL.*                    for OpenCL.*
-     store atomic seq_cst      - singlethread - global   *Same as corresponding          *Same as corresponding
-                               - wavefront    - local    store atomic release,           store atomic release,
-                               - workgroup    - generic  except must generate            except must generate
-                                                         all instructions even           all instructions even
-                                                         for OpenCL.*                    for OpenCL.*
-     store atomic seq_cst      - agent        - global   *Same as corresponding          *Same as corresponding
-                               - system       - generic  store atomic release,           store atomic release,
-                                                         except must generate            except must generate
-                                                         all instructions even           all instructions even
-                                                         for OpenCL.*                    for OpenCL.*
-     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding          *Same as corresponding
-                               - wavefront    - local    atomicrmw acq_rel,              atomicrmw acq_rel,
-                               - workgroup    - generic  except must generate            except must generate
-                                                         all instructions even           all instructions even
-                                                         for OpenCL.*                    for OpenCL.*
-     atomicrmw    seq_cst      - agent        - global   *Same as corresponding          *Same as corresponding
-                               - system       - generic  atomicrmw acq_rel,              atomicrmw acq_rel,
-                                                         except must generate            except must generate
-                                                         all instructions even           all instructions even
-                                                         for OpenCL.*                    for OpenCL.*
-     fence        seq_cst      - singlethread *none*     *Same as corresponding          *Same as corresponding
-                               - wavefront               fence acq_rel,                  fence acq_rel,
-                               - workgroup               except must generate            except must generate
-                               - agent                   all instructions even           all instructions even
-                               - system                  for OpenCL.*                    for OpenCL.*
-     ============ ============ ============== ========== =============================== ==================================
+                                                                                          1. s_waitcnt vmcnt(0) & vscnt(0)
+
+                                                                                            - If CU wavefront execution
+                                                                                              mode, omit.
+                                                                                            - Could be split into
+                                                                                              separate s_waitcnt
+                                                                                              vmcnt(0) and s_waitcnt
+                                                                                              vscnt(0) to allow
+                                                                                              them to be
+                                                                                              independently moved
+                                                                                              according to the
+                                                                                              following rules.
+                                                                                            - waitcnt vmcnt(0)
+                                                                                              Must happen after
+                                                                                              preceding
+                                                                                              global/generic load
+                                                                                              atomic/
+                                                                                              atomicrmw-with-return-value
+                                                                                              with memory
+                                                                                              ordering of seq_cst
+                                                                                              and with equal or
+                                                                                              wider sync scope.
+                                                                                              (Note that seq_cst
+                                                                                              fences have their
+                                                                                              own s_waitcnt
+                                                                                              vmcnt(0) and so do
+                                                                                              not need to be
+                                                                                              considered.)
+                                                                                            - waitcnt vscnt(0)
+                                                                                              Must happen after
+                                                                                              preceding
+                                                                                              global/generic store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value
+                                                                                              with memory
+                                                                                              ordering of seq_cst
+                                                                                              and with equal or
+                                                                                              wider sync scope.
+                                                                                              (Note that seq_cst
+                                                                                              fences have their
+                                                                                              own s_waitcnt
+                                                                                              vscnt(0) and so do
+                                                                                              not need to be
+                                                                                              considered.)
+                                                                                            - Ensures any
+                                                                                              preceding
+                                                                                              sequentially
+                                                                                              consistent global
+                                                                                              memory instructions
+                                                                                              have completed
+                                                                                              before executing
+                                                                                              this sequentially
+                                                                                              consistent
+                                                                                              instruction. This
+                                                                                              prevents reordering
+                                                                                              a seq_cst store
+                                                                                              followed by a
+                                                                                              seq_cst load. (Note
+                                                                                              that seq_cst is
+                                                                                              stronger than
+                                                                                              acquire/release as
+                                                                                              the reordering of
+                                                                                              load acquire
+                                                                                              followed by a store
+                                                                                              release is
+                                                                                              prevented by the
+                                                                                              waitcnt of
+                                                                                              the release, but
+                                                                                              there is nothing
+                                                                                              preventing a store
+                                                                                              release followed by
+                                                                                              load acquire from
+                                                                                              completing out of
+                                                                                              order. The waitcnt
+                                                                                              could be placed after
+                                                                                              seq_store or before
+                                                                                              the seq_load. We
+                                                                                              choose the load to
+                                                                                              make the waitcnt be
+                                                                                              as late as possible
+                                                                                              so that the store
+                                                                                              may have already
+                                                                                              completed.)
+
+                                                                                          2. *Following
+                                                                                             instructions same as
+                                                                                             corresponding load
+                                                                                             atomic acquire,
+                                                                                             except must generate
+                                                                                             all instructions even
+                                                                                             for OpenCL.*
+
+     load atomic  seq_cst      - agent        - global   1. s_waitcnt lgkmcnt(0) &        1. s_waitcnt lgkmcnt(0) &
+                               - system       - generic     vmcnt(0)                         vmcnt(0) & vscnt(0)
+
+                                                           - Could be split into            - Could be split into
+                                                             separate s_waitcnt               separate s_waitcnt
+                                                             vmcnt(0)                         vmcnt(0), s_waitcnt
+                                                             and s_waitcnt                    vscnt(0) and s_waitcnt
+                                                             lgkmcnt(0) to allow              lgkmcnt(0) to allow
+                                                             them to be                       them to be
+                                                             independently moved              independently moved
+                                                             according to the                 according to the
+                                                             following rules.                 following rules.
+                                                           - waitcnt lgkmcnt(0)             - waitcnt lgkmcnt(0)
+                                                             must happen after                must happen after
+                                                             preceding                        preceding
+                                                             global/generic load              local load
+                                                             atomic/store                     atomic/store
+                                                             atomic/atomicrmw                 atomic/atomicrmw
+                                                             with memory                      with memory
+                                                             ordering of seq_cst              ordering of seq_cst
+                                                             and with equal or                and with equal or
+                                                             wider sync scope.                wider sync scope.
+                                                             (Note that seq_cst               (Note that seq_cst
+                                                             fences have their                fences have their
+                                                             own s_waitcnt                    own s_waitcnt
+                                                             lgkmcnt(0) and so do             lgkmcnt(0) and so do
+                                                             not need to be                   not need to be
+                                                             considered.)                     considered.)
+                                                           - waitcnt vmcnt(0)               - waitcnt vmcnt(0)
+                                                             must happen after                must happen after
+                                                             preceding                        preceding
+                                                             global/generic load              global/generic load
+                                                             atomic/store                     atomic/
+                                                             atomic/atomicrmw                 atomicrmw-with-return-value
+                                                             with memory                      with memory
+                                                             ordering of seq_cst              ordering of seq_cst
+                                                             and with equal or                and with equal or
+                                                             wider sync scope.                wider sync scope.
+                                                             (Note that seq_cst               (Note that seq_cst
+                                                             fences have their                fences have their
+                                                             own s_waitcnt                    own s_waitcnt
+                                                             vmcnt(0) and so do               vmcnt(0) and so do
+                                                             not need to be                   not need to be
+                                                             considered.)                     considered.)
+                                                                                            - waitcnt vscnt(0)
+                                                                                              Must happen after
+                                                                                              preceding
+                                                                                              global/generic store
+                                                                                              atomic/
+                                                                                              atomicrmw-no-return-value
+                                                                                              with memory
+                                                                                              ordering of seq_cst
+                                                                                              and with equal or
+                                                                                              wider sync scope.
+                                                                                              (Note that seq_cst
+                                                                                              fences have their
+                                                                                              own s_waitcnt
+                                                                                              vscnt(0) and so do
+                                                                                              not need to be
+                                                                                              considered.)
+                                                           - Ensures any                    - Ensures any
+                                                             preceding                        preceding
+                                                             sequentially                     sequentially
+                                                             consistent global                consistent global
+                                                             memory instructions              memory instructions
+                                                             have completed                   have completed
+                                                             before executing                 before executing
+                                                             this sequentially                this sequentially
+                                                             consistent                       consistent
+                                                             instruction. This                instruction. This
+                                                             prevents reordering              prevents reordering
+                                                             a seq_cst store                  a seq_cst store
+                                                             followed by a                    followed by a
+                                                             seq_cst load. (Note              seq_cst load. (Note
+                                                             that seq_cst is                  that seq_cst is
+                                                             stronger than                    stronger than
+                                                             acquire/release as               acquire/release as
+                                                             the reordering of                the reordering of
+                                                             load acquire                     load acquire
+                                                             followed by a store              followed by a store
+                                                             release is                       release is
+                                                             prevented by the                 prevented by the
+                                                             waitcnt of                       waitcnt of
+                                                             the release, but                 the release, but
+                                                             there is nothing                 there is nothing
+                                                             preventing a store               preventing a store
+                                                             release followed by              release followed by
+                                                             load acquire from                load acquire from
+                                                             completing out of                completing out of
+                                                             order. The waitcnt               order. The waitcnt
+                                                             could be placed after            could be placed after
+                                                             seq_store or before              seq_store or before
+                                                             the seq_load. We                 the seq_load. We
+                                                             choose the load to               choose the load to
+                                                             make the waitcnt be              make the waitcnt be
+                                                             as late as possible              as late as possible
+                                                             so that the store                so that the store
+                                                             may have already                 may have already
+                                                             completed.)                      completed.)
+
+                                                         2. *Following                    2. *Following
+                                                            instructions same as             instructions same as
+                                                            corresponding load               corresponding load
+                                                            atomic acquire,                  atomic acquire,
+                                                            except must generate             except must generate
+                                                            all instructions even            all instructions even
+                                                            for OpenCL.*                     for OpenCL.*
+     store atomic seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
+                               - wavefront    - local    store atomic release,            store atomic release,
+                               - workgroup    - generic  except must generate             except must generate
+                                                         all instructions even            all instructions even
+                                                         for OpenCL.*                     for OpenCL.*
+     store atomic seq_cst      - agent        - global   *Same as corresponding           *Same as corresponding
+                               - system       - generic  store atomic release,            store atomic release,
+                                                         except must generate             except must generate
+                                                         all instructions even            all instructions even
+                                                         for OpenCL.*                     for OpenCL.*
+     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding           *Same as corresponding
+                               - wavefront    - local    atomicrmw acq_rel,               atomicrmw acq_rel,
+                               - workgroup    - generic  except must generate             except must generate
+                                                         all instructions even            all instructions even
+                                                         for OpenCL.*                     for OpenCL.*
+     atomicrmw    seq_cst      - agent        - global   *Same as corresponding           *Same as corresponding
+                               - system       - generic  atomicrmw acq_rel,               atomicrmw acq_rel,
+                                                         except must generate             except must generate
+                                                         all instructions even            all instructions even
+                                                         for OpenCL.*                     for OpenCL.*
+     fence        seq_cst      - singlethread *none*     *Same as corresponding           *Same as corresponding
+                               - wavefront               fence acq_rel,                   fence acq_rel,
+                               - workgroup               except must generate             except must generate
+                               - agent                   all instructions even            all instructions even
+                               - system                  for OpenCL.*                     for OpenCL.*
+     ============ ============ ============== ========== ================================ ================================
 
 The memory order also adds the single thread optimization constraints defined in
 table


        

