[llvm] [AMDGPU] Enable volatile and non-temporal for loads to LDS (PR #153244)
Robert Imschweiler via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 14 01:38:03 PDT 2025
================
@@ -218,3 +218,172 @@ main_body:
ret void
}
+define amdgpu_ps void @global_load_lds_dword_volatile(ptr addrspace(1) nocapture %gptr, ptr addrspace(3) inreg %lptr) {
+; GFX90A-LABEL: global_load_lds_dword_volatile:
+; GFX90A: ; %bb.0: ; %main_body
+; GFX90A-NEXT: s_mov_b32 m0, s0
+; GFX90A-NEXT: s_nop 0
+; GFX90A-NEXT: global_load_dword v[0:1], off lds
+; GFX90A-NEXT: s_waitcnt vmcnt(0)
+; GFX90A-NEXT: global_load_dword v[0:1], off offset:256 lds
+; GFX90A-NEXT: global_load_dword v[0:1], off offset:512 lds
+; GFX90A-NEXT: s_endpgm
+;
+; GFX942-LABEL: global_load_lds_dword_volatile:
+; GFX942: ; %bb.0: ; %main_body
+; GFX942-NEXT: s_mov_b32 m0, s0
+; GFX942-NEXT: s_nop 0
+; GFX942-NEXT: global_load_lds_dword v[0:1], off sc0 sc1
+; GFX942-NEXT: s_waitcnt vmcnt(0)
+; GFX942-NEXT: global_load_lds_dword v[0:1], off offset:256
+; GFX942-NEXT: global_load_lds_dword v[0:1], off offset:512
+; GFX942-NEXT: s_endpgm
+;
+; GFX10-LABEL: global_load_lds_dword_volatile:
+; GFX10: ; %bb.0: ; %main_body
+; GFX10-NEXT: s_mov_b32 m0, s0
+; GFX10-NEXT: global_load_dword v[0:1], off lds
----------------
ro-i wrote:
I think the waitcnt is missing on this one? Also, shouldn't the glc bit be set for volatile loads on GFX10 and GFX90A? (For GFX10, it's a bit unclear from [the docs](https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table) whether dlc should also be set for volatile loads, but I guess the comment "If GFX10, omit dlc=1" applies to that case as well.)
https://github.com/llvm/llvm-project/pull/153244