[PATCH] D121242: [AMDGPU] gfx940 memory model
Scott Linder via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 9 14:01:37 PST 2022
scott.linder added a comment.
I noted some small nits, but otherwise LGTM. I won't accept, as there are others who should probably review.
================
Comment at: llvm/docs/AMDGPUUsage.rst:8715
+ different CUs and so a ``buffer_inv sc0`` is required which will invalidate
+ the L1 cache is in tgsplit mode.
+
----------------
I think this can be dropped, as the sentence starts with the `tgsplit` condition.
================
Comment at: llvm/docs/AMDGPUUsage.rst:8717
+
+ * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+ between wavefronts executing in different work-groups as they may be
----------------
Is this the correct scope? This seems like the same case as the last bullet, which states `sc0`. The table below also seems to indicate that `sc0` will invalidate L1, not `sc1`.
================
Comment at: llvm/docs/AMDGPUUsage.rst:8893-8895
+ - wavefront - generic
+ store atomic monotonic - singlethread - global 1. buffer/global/flat_store
+ - wavefront - generic
----------------
Duplicate clause, seems like it can be deleted
================
Comment at: llvm/docs/AMDGPUUsage.rst:8912
+ atomicrmw monotonic - system - global 1. buffer/global/flat_atomic
+ - generic sc1=1
+ atomicrmw monotonic - singlethread - local *If TgSplit execution mode,
----------------
It might be useful to note the "re-use" of the sc0 bit in atomircrmw instructions somewhere, maybe above in the long-form overview portion. To give context to the possible confusion, my original comment here was:
I'm trying to piece together an understanding of the memory model changes, so bear with me, but why is this not `sc0=1 sc1=1`? I guess my misunderstanding boils down to the difference in the `sc0`/`sc1` bits between the `_load`/`_store` instructions and the `_atomic` instructions.
================
Comment at: llvm/docs/AMDGPUUsage.rst:9642
+
+ 3. buffer/global/flat_store sc1=1
+ store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1
----------------
Can collapse these two spaces
================
Comment at: llvm/docs/AMDGPUUsage.rst:9697
+
+ 2. buffer/global/flat_store
+ sc0=1 sc1=1
----------------
3
================
Comment at: llvm/docs/AMDGPUUsage.rst:10881
+ - wavefront - local load atomic acquire,
+ - generic except must generated
+ all instructions even
----------------
Should this be `must generate`? It occurs in a few places
================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1543
+ case SIAtomicScope::SINGLETHREAD:
+ // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+ // to indicate system or agent scope. The SC0 bit is used to indicate if
----------------
I think including a paragraph along these lines in `AMDGPUUsage.rst` would cover the confusion I originally had
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D121242/new/
https://reviews.llvm.org/D121242
More information about the llvm-commits
mailing list