[llvm] fb37943 - [AMDGPU] Update Memory Model in AMDGPUUsage.rst
Scott Linder via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 29 16:10:28 PDT 2020
Author: Scott Linder
Date: 2020-10-29T23:07:03Z
New Revision: fb37943cc8be4d5eb1cf7adf4f0b99afcd3f2904
URL: https://github.com/llvm/llvm-project/commit/fb37943cc8be4d5eb1cf7adf4f0b99afcd3f2904
DIFF: https://github.com/llvm/llvm-project/commit/fb37943cc8be4d5eb1cf7adf4f0b99afcd3f2904.diff
LOG: [AMDGPU] Update Memory Model in AMDGPUUsage.rst
Mostly NFC, but some changes are "bug fixes" rather than just e.g.
formatting changes or typo corrections.
- Fix typo "competing" -> "completing".
- Document why waitcnt is added to loads and not stores for
sequentially consistent ordering.
- Lowercase some mentions of `buffer_gl{0,1}_inv`.
- Make mentions of `*cnt(0)` consistently include the `(0)` count.
- Remove some mentions of instructions for incorrect address spaces. For
example, remove mention of `flat_load` from
`load atomic acquire workgroup global`.
- Re-flow some text so that all the target columns fit in a
32-character wide column. This makes a future NFC patch that makes both
columns 32 characters wide more straightforward.
Modified cherry-pick of patch by Tony Tye
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D89596
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 2770a7432244..5a06a013da52 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4264,9 +4264,9 @@ For GFX10:
* The vector memory operations access a vector L0 cache. There is a single L0
cache per CU. Each SIMD of a CU accesses the same L0 cache. Therefore, no
special action is required for coherence between the lanes of a single
- wavefront. However, a ``BUFFER_GL0_INV`` is required for coherence between
+ wavefront. However, a ``buffer_gl0_inv`` is required for coherence between
wavefronts executing in the same work-group as they may be executing on SIMDs
- of different CUs that access different L0s. A ``BUFFER_GL0_INV`` is also
+ of different CUs that access different L0s. A ``buffer_gl0_inv`` is also
required for coherence between wavefronts executing in different work-groups
as they may be executing on different WGPs.
* The scalar memory operations access a scalar L0 cache shared by all wavefronts
@@ -4275,7 +4275,7 @@ For GFX10:
:ref:`amdgpu-amdhsa-memory-spaces`.
* The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on
the same SA. Therefore, no special action is required for coherence between
- the wavefronts of a single work-group. However, a ``BUFFER_GL1_INV`` is
+ the wavefronts of a single work-group. However, a ``buffer_gl1_inv`` is
required for coherence between wavefronts executing in different work-groups
as they may be executing on different SAs that access different L1s.
* The L1 caches have independent quadrants to service disjoint ranges of virtual
@@ -4437,7 +4437,8 @@ agents.
load atomic monotonic - workgroup - global 1. buffer/global/flat_load 1. buffer/global/flat_load
- generic glc=1
- - If CU wavefront execution mode, omit glc=1.
+ - If CU wavefront execution
+ mode, omit glc=1.
load atomic monotonic - singlethread - local 1. ds_load 1. ds_load
- wavefront
@@ -4465,13 +4466,15 @@ agents.
load atomic acquire - singlethread - global 1. buffer/global/ds/flat_load 1. buffer/global/ds/flat_load
- wavefront - local
- generic
- load atomic acquire - workgroup - global 1. buffer/global/flat_load 1. buffer/global_load glc=1
+ load atomic acquire - workgroup - global 1. buffer/global_load 1. buffer/global_load glc=1
- - If CU wavefront execution mode, omit glc=1.
+ - If CU wavefront execution
+ mode, omit glc=1.
2. s_waitcnt vmcnt(0)
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Must happen before
the following buffer_gl0_inv
and before any following
@@ -4482,7 +4485,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -4507,7 +4511,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- If OpenCL, omit.
- Ensures that
following
@@ -4516,12 +4521,14 @@ agents.
load atomic acquire - workgroup - generic 1. flat_load 1. flat_load glc=1
- - If CU wavefront execution mode, omit glc=1.
+ - If CU wavefront execution
+ mode, omit glc=1.
2. s_waitcnt lgkmcnt(0) 2. s_waitcnt lgkmcnt(0) &
vmcnt(0)
- - If CU wavefront execution mode, omit vmcnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0).
- If OpenCL, omit. - If OpenCL, omit
lgkmcnt(0).
- Must happen before - Must happen before
@@ -4540,13 +4547,14 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
stale data.
- load atomic acquire - agent - global 1. buffer/global/flat_load 1. buffer/global_load
+ load atomic acquire - agent - global 1. buffer/global_load 1. buffer/global_load
- system glc=1 glc=1 dlc=1
2. s_waitcnt vmcnt(0) 2. s_waitcnt vmcnt(0)
@@ -4601,13 +4609,14 @@ agents.
atomicrmw acquire - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
- wavefront - local
- generic
- atomicrmw acquire - workgroup - global 1. buffer/global/flat_atomic 1. buffer/global_atomic
+ atomicrmw acquire - workgroup - global 1. buffer/global_atomic 1. buffer/global_atomic
2. s_waitcnt vm/vscnt(0)
- - If CU wavefront execution mode, omit.
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - If CU wavefront execution
+ mode, omit.
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
- Must happen before
the following buffer_gl0_inv
and before any following
@@ -4618,7 +4627,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -4653,13 +4663,13 @@ agents.
2. waitcnt lgkmcnt(0) 2. waitcnt lgkmcnt(0) &
vm/vscnt(0)
- - If CU wavefront execution mode, omit vm/vscnt.
+ - If CU wavefront execution
+ mode, omit vm/vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
- waitcnt lgkmcnt(0)..
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
waitcnt lgkmcnt(0).
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
- Must happen before - Must happen before
any following the following
global/generic buffer_gl0_inv.
@@ -4675,18 +4685,19 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
stale data.
- atomicrmw acquire - agent - global 1. buffer/global/flat_atomic 1. buffer/global_atomic
+ atomicrmw acquire - agent - global 1. buffer/global_atomic 1. buffer/global_atomic
- system 2. s_waitcnt vmcnt(0) 2. s_waitcnt vm/vscnt(0)
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
waitcnt lgkmcnt(0).
- Must happen before - Must happen before
following following
@@ -4716,9 +4727,9 @@ agents.
- If OpenCL, omit - If OpenCL, omit
lgkmcnt(0). lgkmcnt(0).
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
- Must happen before - Must happen before
following following
buffer_wbinvl1_vol. buffer_gl*_inv.
@@ -4746,8 +4757,9 @@ agents.
fence acquire - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL and - If OpenCL and
address space is address space is
not generic, omit. not generic, omit
@@ -4858,7 +4870,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -5014,8 +5027,9 @@ agents.
store atomic release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
lgkmcnt(0).
- Must happen after
@@ -5064,10 +5078,11 @@ agents.
store that is being store that is being
released. released.
- 2. buffer/global/flat_store 2. buffer/global_store
+ 2. buffer/global_store 2. buffer/global_store
store atomic release - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- If OpenCL, omit.
- Could be split into
separate s_waitcnt
@@ -5104,8 +5119,9 @@ agents.
store atomic release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
lgkmcnt(0).
- Must happen after
@@ -5139,8 +5155,10 @@ agents.
- s_waitcnt lgkmcnt(0)
must happen after
any preceding
- local/generic load/store/load
- atomic/store atomic/atomicrmw.
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
- Must happen before - Must happen before
the following the following
store. store.
@@ -5198,15 +5216,16 @@ agents.
store that is being store that is being
released. released.
- 2. buffer/global/ds/flat_store 2. buffer/global/ds/flat_store
+ 2. buffer/global/flat_store 2. buffer/global/flat_store
atomicrmw release - singlethread - global 1. buffer/global/ds/flat_atomic 1. buffer/global/ds/flat_atomic
- wavefront - local
- generic
atomicrmw release - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit.
- Must happen after
@@ -5255,10 +5274,11 @@ agents.
atomicrmw that is atomicrmw that is
being released. being released.
- 2. buffer/global/flat_atomic 2. buffer/global_atomic
+ 2. buffer/global_atomic 2. buffer/global_atomic
atomicrmw release - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- If OpenCL, omit.
- Could be split into
separate s_waitcnt
@@ -5295,8 +5315,9 @@ agents.
atomicrmw release - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
waitcnt lgkmcnt(0).
- Must happen after
@@ -5330,8 +5351,10 @@ agents.
- s_waitcnt lgkmcnt(0)
must happen after
any preceding
- local/generic load/store/load
- atomic/store atomic/atomicrmw.
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
- Must happen before - Must happen before
the following the following
atomicrmw. atomicrmw.
@@ -5389,14 +5412,15 @@ agents.
the atomicrmw that the atomicrmw that
is being released. is being released.
- 2. buffer/global/ds/flat_atomic 2. buffer/global/ds/flat_atomic
+ 2. buffer/global/flat_atomic 2. buffer/global/flat_atomic
fence release - singlethread *none* *none* *none*
- wavefront
fence release - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL and - If OpenCL and
address space is address space is
not generic, omit. not generic, omit
@@ -5554,8 +5578,9 @@ agents.
atomicrmw acq_rel - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
s_waitcnt lgkmcnt(0).
- Must happen after - Must happen after
@@ -5589,8 +5614,10 @@ agents.
- s_waitcnt lgkmcnt(0)
must happen after
any preceding
- local/generic load/store/load
- atomic/store atomic/atomicrmw.
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
- Must happen before - Must happen before
the following the following
atomicrmw. atomicrmw.
@@ -5602,13 +5629,14 @@ agents.
atomicrmw that is atomicrmw that is
being released. being released.
- 2. buffer/global/flat_atomic 2. buffer/global_atomic
+ 2. buffer/global_atomic 2. buffer/global_atomic
3. s_waitcnt vm/vscnt(0)
- - If CU wavefront execution mode, omit vm/vscnt.
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - If CU wavefront execution
+ mode, omit vm/vscnt(0).
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
waitcnt lgkmcnt(0).
- Must happen before
the following
@@ -5622,7 +5650,8 @@ agents.
4. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -5630,7 +5659,8 @@ agents.
atomicrmw acq_rel - workgroup - local 1. waitcnt vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- If OpenCL, omit.
- Could be split into
separate s_waitcnt
@@ -5682,7 +5712,8 @@ agents.
4. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- If OpenCL omit.
- Ensures that
following
@@ -5692,8 +5723,9 @@ agents.
atomicrmw acq_rel - workgroup - generic 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
waitcnt lgkmcnt(0).
- Must happen after
@@ -5727,8 +5759,10 @@ agents.
- s_waitcnt lgkmcnt(0)
must happen after
any preceding
- local/generic load/store/load
- atomic/store atomic/atomicrmw.
+ local/generic
+ load/store/load
+ atomic/store
+ atomic/atomicrmw.
- Must happen before - Must happen before
the following the following
atomicrmw. atomicrmw.
@@ -5744,7 +5778,8 @@ agents.
3. s_waitcnt lgkmcnt(0) 3. s_waitcnt lgkmcnt(0) &
vm/vscnt(0)
- - If CU wavefront execution mode, omit vm/vscnt.
+ - If CU wavefront execution
+ mode, omit vm/vscnt(0).
- If OpenCL, omit. - If OpenCL, omit
waitcnt lgkmcnt(0).
- Must happen before - Must happen before
@@ -5762,7 +5797,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -5813,12 +5849,12 @@ agents.
atomicrmw that is atomicrmw that is
being released. being released.
- 2. buffer/global/flat_atomic 2. buffer/global_atomic
+ 2. buffer/global_atomic 2. buffer/global_atomic
3. s_waitcnt vmcnt(0) 3. s_waitcnt vm/vscnt(0)
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
waitcnt lgkmcnt(0).
- Must happen before - Must happen before
following following
@@ -5893,9 +5929,9 @@ agents.
- If OpenCL, omit - If OpenCL, omit
lgkmcnt(0). lgkmcnt(0).
- - Use vmcnt if atomic with
- return and vscnt if atomic
- with no-return.
+ - Use vmcnt(0) if atomic with
+ return and vscnt(0) if
+ atomic with no-return.
- Must happen before - Must happen before
following following
buffer_wbinvl1_vol. buffer_gl*_inv.
@@ -5923,8 +5959,9 @@ agents.
fence acq_rel - workgroup *none* 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- If OpenCL and - If OpenCL and
address space is address space is
not generic, omit. not generic, omit
@@ -6043,7 +6080,8 @@ agents.
3. buffer_gl0_inv
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Ensures that
following
loads will not see
@@ -6164,8 +6202,9 @@ agents.
load atomic seq_cst - workgroup - global 1. s_waitcnt lgkmcnt(0) 1. s_waitcnt lgkmcnt(0) &
- generic vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit vmcnt and
- vscnt.
+ - If CU wavefront execution
+ mode, omit vmcnt(0) and
+ vscnt(0).
- Could be split into
separate s_waitcnt
vmcnt(0), s_waitcnt
@@ -6251,8 +6290,17 @@ agents.
preventing a store preventing a store
release followed by release followed by
load acquire from load acquire from
- competing out of competing out of
- order.) order.)
+ completing out of completing out of
+ order. The waitcnt order. The waitcnt
+ could be placed after could be placed after
+ seq_store or before seq_store or before
+ the seq_load. We the seq_load. We
+ choose the load to choose the load to
+ make the waitcnt be make the waitcnt be
+ as late as possible as late as possible
+ so that the store so that the store
+ may have already may have already
+ completed.) completed.)
2. *Following 2. *Following
instructions same as instructions same as
@@ -6269,7 +6317,8 @@ agents.
1. s_waitcnt vmcnt(0) & vscnt(0)
- - If CU wavefront execution mode, omit.
+ - If CU wavefront execution
+ mode, omit.
- Could be split into
separate s_waitcnt
vmcnt(0) and s_waitcnt
@@ -6338,8 +6387,17 @@ agents.
preventing a store
release followed by
load acquire from
- competing out of
- order.)
+ completing out of
+ order. The waitcnt
+ could be placed after
+ seq_store or before
+ the seq_load. We
+ choose the load to
+ make the waitcnt be
+ as late as possible
+ so that the store
+ may have already
+ completed.)
2. *Following
instructions same as
@@ -6437,8 +6495,17 @@ agents.
preventing a store preventing a store
release followed by release followed by
load acquire from load acquire from
- competing out of competing out of
- order.) order.)
+ completing out of completing out of
+ order. The waitcnt order. The waitcnt
+ could be placed after could be placed after
+ seq_store or before seq_store or before
+ the seq_load. We the seq_load. We
+ choose the load to choose the load to
+ make the waitcnt be make the waitcnt be
+ as late as possible as late as possible
+ so that the store so that the store
+ may have already may have already
+ completed.) completed.)
2. *Following 2. *Following
instructions same as instructions same as