[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

Fri May 2 14:59:10 PDT 2025

================
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm :
 // GFX9 Intrinsics
 //===----------------------------------------------------------------------===//
 
+/// This is a general-purpose intrinsic for all operations that take a pointer
+/// a base location in LDS, and a data size and use it to perform a gather to LDS.
+/// This allows abstracting over both global pointers (address space 1) and
+/// the buffer-resource-wrapper pointers (address space 7 and 9).
+/// TODO: add support for address space 5 and scratch_load_lds.
+class AMDGPULoadToLDS :
+  Intrinsic <
+    [],
+    [llvm_anyptr_ty,                    // Base pointer to load from. Varies per lane.
+     LLVMQualPointerType<3>,            // LDS base pointer to store to. Must be wave-uniform.
+     llvm_i32_ty,                       // Data byte size: 1/2/4 (/12/16 for gfx950)
+     llvm_i32_ty,                       // imm offset (applied to both input and LDS address)
----------------
krzysz00 wrote:

Oh, thanks for finding the context! `git blame` failed me. So ... we're having the discussion from that thread again, and therefore I'd like to appeal to precedent in the short term (regarding the immoffset parameter) in the interests of making some sort of progress.

If we ever fix the immoffset issue, upgrading by setting the immoffset to a constant 0 and adding the old offset to both pointers should be fine? But that would require a sufficiently robust pattern match, and I'm not sure we're convinced we have one.
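
To make the upgrade idea concrete, here is a minimal sketch of the address arithmetic, assuming the imm offset is simply added to both the source and the LDS address, as the parameter comment in the patch says. `LaneAddrs`, `effectiveAddrs`, and `sameAddrs` are hypothetical names for illustration, not LLVM API, and the model ignores overflow and any encoding limits on the offset.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM API): the two addresses one lane touches.
struct LaneAddrs {
  std::uint64_t src; // address the lane reads from (global or buffer)
  std::uint32_t lds; // LDS address the data is written to
};

// Assumption taken from the parameter comment in the patch:
// the imm offset is applied to both the input and the LDS address.
constexpr LaneAddrs effectiveAddrs(std::uint64_t srcBase,
                                   std::uint32_t ldsBase,
                                   std::uint32_t immOffset) {
  return {srcBase + immOffset, ldsBase + immOffset};
}

constexpr bool sameAddrs(LaneAddrs a, LaneAddrs b) {
  return a.src == b.src && a.lds == b.lds;
}

// The upgrade under discussion: fold the old offset into both pointers
// and pass a constant 0 as the imm offset. In this model the two forms
// touch the same addresses.
static_assert(sameAddrs(effectiveAddrs(0x1000, 0x40, 16),
                        effectiveAddrs(0x1000 + 16, 0x40 + 16, 0)),
              "folding the offset into both pointers preserves the addresses");
```

The hard part is the reverse direction: recovering a nonzero imm offset during selection means recognizing that the same constant was added to both pointers, which is the pattern match in question.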

https://github.com/llvm/llvm-project/pull/137425