[llvm] 3a37d08 - [AMDGPU] Correct gfx940 memory model documentation.

Wed Mar 16 12:02:42 PDT 2022

Author: Stanislav Mekhanoshin
Date: 2022-03-16T11:59:40-07:00
New Revision: 3a37d08b3521e90f0f0619dce74afb1527bba58a

URL: https://github.com/llvm/llvm-project/commit/3a37d08b3521e90f0f0619dce74afb1527bba58a
DIFF: https://github.com/llvm/llvm-project/commit/3a37d08b3521e90f0f0619dce74afb1527bba58a.diff

LOG: [AMDGPU] Correct gfx940 memory model documentation.

Differential Revision: https://reviews.llvm.org/D121397

Added: 
    

Modified: 
    llvm/docs/AMDGPUUsage.rst

Removed: 
    


################################################################################
diff  --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 539cda827a8d6..ae567eab59360 100644

--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -8712,12 +8712,17 @@ For GFX940:
     work-group since they execute on the same CU. The exception is when in
     tgsplit execution mode as wavefronts of the same work-group can be in
     
diff erent CUs and so a ``buffer_inv sc0`` is required which will invalidate
-    the L1 cache is in tgsplit mode.
+    the L1 cache.
 
-  * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+  * A ``buffer_inv sc0`` is required to invalidate the L1 cache for coherence
     between wavefronts executing in 
diff erent work-groups as they may be
     executing on 
diff erent CUs.
 
+  * Atomic read-modify-write instructions implicitly bypass the L1 cache.
+    Therefore, they do not use the sc0 bit for coherence and instead use it to
+    indicate if the instruction returns the original value being updated. They
+    do use sc1 to indicate system or agent scope coherence.
+
 * The scalar memory operations access a scalar L1 cache shared by all wavefronts
   on a group of CUs. The scalar and vector L1 caches are not coherent. However,
   scalar operations are used in a restricted way so do not impact the memory
@@ -8891,8 +8896,6 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                               - generic     sc0=1 sc1=1
      store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
                                - wavefront    - generic
-     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
-                               - wavefront    - generic
      store atomic monotonic    - workgroup    - global   1. buffer/global/flat_store
                                               - generic     sc0=1
      store atomic monotonic    - agent        - global   1. buffer/global/flat_store
@@ -9639,7 +9642,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                              store that is being
                                                              released.
 
-                                                         3. buffer/global/flat_store  sc1=1
+                                                         3. buffer/global/flat_store sc1=1
      store atomic release      - system       - global   1. buffer_wbl2 sc0=1 sc1=1
                                               - generic
                                                            - Must happen before
@@ -9694,7 +9697,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                              store that is being
                                                              released.
 
-                                                         2. buffer/global/flat_store
+                                                         3. buffer/global/flat_store
                                                             sc0=1 sc1=1
      atomicrmw    release      - singlethread - global   1. buffer/global/flat_atomic
                                - wavefront    - generic
@@ -10878,7 +10881,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
      ------------------------------------------------------------------------------------
      load atomic  seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    load atomic acquire,
-                                              - generic  except must generated
+                                              - generic  except must generate
                                                          all instructions even
                                                          for OpenCL.*
      load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkm/vmcnt(0)
@@ -10963,7 +10966,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                             instructions same as
                                                             corresponding load
                                                             atomic acquire,
-                                                            except must generated
+                                                            except must generate
                                                             all instructions even
                                                             for OpenCL.*
      load atomic  seq_cst      - workgroup    - local    *If TgSplit execution mode,
@@ -10972,7 +10975,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
 
                                                          *Same as corresponding
                                                          load atomic acquire,
-                                                         except must generated
+                                                         except must generate
                                                          all instructions even
                                                          for OpenCL.*
 
@@ -11066,22 +11069,22 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
                                                             instructions same as
                                                             corresponding load
                                                             atomic acquire,
-                                                            except must generated
+                                                            except must generate
                                                             all instructions even
                                                             for OpenCL.*
      store atomic seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    store atomic release,
-                               - workgroup    - generic  except must generated
+                               - workgroup    - generic  except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    atomicrmw acq_rel,
-                               - workgroup    - generic  except must generated
+                               - workgroup    - generic  except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      fence        seq_cst      - singlethread *none*     *Same as corresponding
                                - wavefront               fence acq_rel,
-                               - workgroup               except must generated
+                               - workgroup               except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      ============ ============ ============== ========== ================================