[llvm] 3a37d08 - [AMDGPU] Correct gfx940 memory model documentation.
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 16 12:02:42 PDT 2022
Author: Stanislav Mekhanoshin
Date: 2022-03-16T11:59:40-07:00
New Revision: 3a37d08b3521e90f0f0619dce74afb1527bba58a
URL: https://github.com/llvm/llvm-project/commit/3a37d08b3521e90f0f0619dce74afb1527bba58a
DIFF: https://github.com/llvm/llvm-project/commit/3a37d08b3521e90f0f0619dce74afb1527bba58a.diff
LOG: [AMDGPU] Correct gfx940 memory model documentation.
Differential Revision: https://reviews.llvm.org/D121397
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 539cda827a8d6..ae567eab59360 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -8712,12 +8712,17 @@ For GFX940:
work-group since they execute on the same CU. The exception is when in
tgsplit execution mode as wavefronts of the same work-group can be in
different CUs and so a ``buffer_inv sc0`` is required which will invalidate
- the L1 cache is in tgsplit mode.
+ the L1 cache.
- * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+ * A ``buffer_inv sc0`` is required to invalidate the L1 cache for coherence
between wavefronts executing in different work-groups as they may be
executing on different CUs.
+ * Atomic read-modify-write instructions implicitly bypass the L1 cache.
+ Therefore, they do not use the sc0 bit for coherence and instead use it to
+ indicate if the instruction returns the original value being updated. They
+ do use sc1 to indicate system or agent scope coherence.
+
* The scalar memory operations access a scalar L1 cache shared by all wavefronts
on a group of CUs. The scalar and vector L1 caches are not coherent. However,
scalar operations are used in a restricted way so do not impact the memory
@@ -8891,8 +8896,6 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
- generic sc0=1 sc1=1
store atomic monotonic - singlethread - global 1. buffer/global/flat_store
- wavefront - generic
- store atomic monotonic - singlethread - global 1. buffer/global/flat_store
- - wavefront - generic
store atomic monotonic - workgroup - global 1. buffer/global/flat_store
- generic sc0=1
store atomic monotonic - agent - global 1. buffer/global/flat_store
@@ -9639,7 +9642,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
store that is being
released.
- 3. buffer/global/flat_store sc1=1
+ 3. buffer/global/flat_store sc1=1
store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
- generic
- Must happen before
@@ -9694,7 +9697,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
store that is being
released.
- 2. buffer/global/flat_store
+ 3. buffer/global/flat_store
sc0=1 sc1=1
atomicrmw release - singlethread - global 1. buffer/global/flat_atomic
- wavefront - generic
@@ -10878,7 +10881,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
------------------------------------------------------------------------------------
load atomic seq_cst - singlethread - global *Same as corresponding
- wavefront - local load atomic acquire,
- - generic except must generated
+ - generic except must generate
all instructions even
for OpenCL.*
load atomic seq_cst - workgroup - global 1. s_waitcnt lgkm/vmcnt(0)
@@ -10963,7 +10966,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
instructions same as
corresponding load
atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
load atomic seq_cst - workgroup - local *If TgSplit execution mode,
@@ -10972,7 +10975,7 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
*Same as corresponding
load atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
@@ -11066,22 +11069,22 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-table`.
instructions same as
corresponding load
atomic acquire,
- except must generated
+ except must generate
all instructions even
for OpenCL.*
store atomic seq_cst - singlethread - global *Same as corresponding
- wavefront - local store atomic release,
- - workgroup - generic except must generated
+ - workgroup - generic except must generate
- agent all instructions even
- system for OpenCL.*
atomicrmw seq_cst - singlethread - global *Same as corresponding
- wavefront - local atomicrmw acq_rel,
- - workgroup - generic except must generated
+ - workgroup - generic except must generate
- agent all instructions even
- system for OpenCL.*
fence seq_cst - singlethread *none* *Same as corresponding
- wavefront fence acq_rel,
- - workgroup except must generated
+ - workgroup except must generate
- agent all instructions even
- system for OpenCL.*
============ ============ ============== ========== ================================