[PATCH] D121242: [AMDGPU] gfx940 memory model

Wed Mar 9 15:21:54 PST 2022

t-tye added inline comments.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8717
+
+  * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+    between wavefronts executing in different work-groups as they may be
----------------
scott.linder wrote:
> Is this the correct scope? This seems like the same case as the last bullet, which states `sc0`. The table below also seems to indicate that `sc0` will invalidate L1, not `sc1`.
Yes, this should be `sc0`.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8912
+     atomicrmw    monotonic    - system       - global   1. buffer/global/flat_atomic
+                                              - generic     sc1=1
+     atomicrmw    monotonic    - singlethread - local    *If TgSplit execution mode,
----------------
scott.linder wrote:
> It might be useful to note the "re-use" of the sc0 bit in atomicrmw instructions somewhere, maybe above in the long-form overview portion. To give context to the possible confusion, my original comment here was:
> 
> I'm trying to piece together an understanding of the memory model changes, so bear with me, but why is this not `sc0=1 sc1=1`? I guess my misunderstanding boils down to the difference in the `sc0`/`sc1` bits between the `_load`/`_store` instructions and the `_atomic` instructions.
Good suggestion. Adding a bullet to the preamble text to explain this would help.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121242/new/

https://reviews.llvm.org/D121242