[llvm] [AMDGPU] Set glc/slc on volatile/nontemporal SMEM loads (PR #77443)
Tony Tye via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 11 15:39:10 PST 2024
================
@@ -5813,6 +5813,18 @@ in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
be reordered by
hardware.
+ load *none* *none* - constant - !volatile & !nontemporal
+
+ 1. s_load/s_buffer_load
+
+ - !volatile & nontemporal
+
+ 1. s_load/s_buffer_load glc=1 slc=1
+
+ - volatile
+
+ 1. s_load/s_buffer_load glc=1
----------------
t-tye wrote:
We need a waitcnt lgkmcnt(0) here for the same reason: it waits for the preceding scalar load to complete before continuing, which ensures each volatile memory access is complete before moving on to the next one. There is no need to wait for VMEM, as any previous VMEM access will have been followed by its own waitcnt vmcnt(0).
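As a rough illustration of the suggestion, two back-to-back volatile scalar loads might be sequenced as in the sketch below. It assumes gfx9-style assembler syntax; the register numbers and offsets are hypothetical and chosen only for the example, not taken from the patch.

```
; Hypothetical registers/offsets; sketch of the volatile SMEM sequence only.
s_load_dword s0, s[4:5], 0x0 glc    ; first volatile scalar load (glc=1)
s_waitcnt lgkmcnt(0)                ; wait for it to complete before continuing
s_load_dword s1, s[4:5], 0x4 glc    ; next volatile scalar load (glc=1)
s_waitcnt lgkmcnt(0)                ; again wait on the scalar-memory counter only
```

No vmcnt wait appears here because, as noted, any earlier volatile VMEM access would already carry its own wait.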
https://github.com/llvm/llvm-project/pull/77443