[llvm] [AMDGPU] Set glc/slc on volatile/nontemporal SMEM loads (PR #77443)

Thu Jan 11 15:39:10 PST 2024

================
@@ -5813,6 +5813,18 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
                                                               be reordered by
                                                               hardware.
 
+     load         *none*       *none*         - constant - !volatile & !nontemporal
+
+                                                           1. s_load/s_buffer_load
+
+                                                         - !volatile & nontemporal
+
+                                                           1. s_load/s_buffer_load glc=1 slc=1
+
+                                                         - volatile
+
+                                                           1. s_load/s_buffer_load glc=1
----------------
t-tye wrote:

We need an `s_waitcnt lgkmcnt(0)` here for the same reason. It waits for the preceding scalar load to complete before continuing, which ensures that each volatile memory access is complete before moving on to the next one. There is no need to wait for VMEM, as any previous volatile VMEM access will have been followed by its own `s_waitcnt vmcnt(0)`.
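
As a rough sketch of what that would look like in the table, assuming the entry follows the same layout as the existing volatile VMEM entries (the exact wording here is an assumption, not text taken from the patch):

```
- volatile

  1. s_load/s_buffer_load glc=1
  2. s_waitcnt lgkmcnt(0)

    - Must happen before any following
      volatile load/store.
    - Ensures that volatile operations
      to different addresses will not
      be reordered by hardware.
```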

https://github.com/llvm/llvm-project/pull/77443