[PATCH] D121242: [AMDGPU] gfx940 memory model

Wed Mar 9 14:01:37 PST 2022

scott.linder added a comment.

I noted some small nits, but otherwise LGTM. I won't accept, as there are others who should probably review.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8715
+    different CUs and so a ``buffer_inv sc0`` is required which will invalidate
+    the L1 cache is in tgsplit mode.
+
----------------
I think this can be dropped, as the sentence starts with the `tgsplit` condition.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8717
+
+  * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+    between wavefronts executing in different work-groups as they may be
----------------
Is this the correct scope? This seems like the same case as the last bullet, which states `sc0`. The table below also seems to indicate that `sc0` will invalidate L1, not `sc1`.

================
Comment at: llvm/docs/AMDGPUUsage.rst:8893-8895
+                               - wavefront    - generic
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
+                               - wavefront    - generic
----------------
Duplicate clause, seems like it can be deleted

================
Comment at: llvm/docs/AMDGPUUsage.rst:8912
+     atomicrmw    monotonic    - system       - global   1. buffer/global/flat_atomic
+                                              - generic     sc1=1
+     atomicrmw    monotonic    - singlethread - local    *If TgSplit execution mode,
----------------
It might be useful to note the "re-use" of the sc0 bit in atomircrmw instructions somewhere, maybe above in the long-form overview portion. To give context to the possible confusion, my original comment here was:

I'm trying to piece together an understanding of the memory model changes, so bear with me, but why is this not `sc0=1 sc1=1`? I guess my misunderstanding boils down to the difference in the `sc0`/`sc1` bits between the `_load`/`_store` instructions and the `_atomic` instructions.

================
Comment at: llvm/docs/AMDGPUUsage.rst:9642
+
+                                                         3. buffer/global/flat_store  sc1=1
+     store atomic release      - system       - global   1. buffer_wbl2 sc0=1 sc1=1
----------------
Can collapse these two spaces

================
Comment at: llvm/docs/AMDGPUUsage.rst:9697
+
+                                                         2. buffer/global/flat_store
+                                                            sc0=1 sc1=1
----------------
3

================
Comment at: llvm/docs/AMDGPUUsage.rst:10881
+                               - wavefront    - local    load atomic acquire,
+                                              - generic  except must generated
+                                                         all instructions even
----------------
Should this be `must generate`? It occurs in a few places

================
Comment at: llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp:1543
+    case SIAtomicScope::SINGLETHREAD:
+      // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+      // to indicate system or agent scope. The SC0 bit is used to indicate if
----------------
I think including a paragraph along these lines in `AMDGPUUsage.rst` would cover the confusion I originally had

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121242/new/

https://reviews.llvm.org/D121242