[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

Thu May 22 07:48:03 PDT 2025

JonChesterfield wrote:

I think we could do with an additional overload here.

Currently a lot of code (notably CK, i.e. Composable Kernel, but probably elsewhere too) uses the v4i32 versions of the LDS intrinsics. As I understand it, this patch instead takes an addrspace(7) buffer fat pointer, which is the 128-bit buffer resource plus a 32-bit offset. Callers could therefore convert the v4i32 into an addrspace(7) pointer and then call this.

The backend docs don't make it very clear how users are supposed to wire this up. Possibly: bitcast the v4i32 to an i128, turn that into an addrspace(8) pointer (the 128-bit buffer resource), then addrspacecast to addrspace(7) to append an extra 32 bits of zero offset, and pass the result on to this builtin? Whatever the proper sequence turns out to be, adding an overload that takes a v4i32 and does the conversion itself would likely improve adoption of the new builtin.
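To make the size bookkeeping concrete, here is a host-side C model of the data involved, a sketch rather than anything in LLVM or clang. The type and function names (`buffer_rsrc`, `buffer_fat_ptr`, `fat_ptr_from_v4i32`) are hypothetical. It only assumes that an addrspace(8) resource is 128 bits (the same four dwords as the v4i32) and that an addrspace(7) fat pointer is that resource plus a 32-bit offset. It does not show the actual IR casts or the builtin's signature.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the 128-bit buffer resource, i.e. the v4i32 / addrspace(8) value. */
typedef struct {
    uint32_t words[4];
} buffer_rsrc;

/* Hypothetical model: an addrspace(7) fat pointer = 128-bit resource + 32-bit offset. */
typedef struct {
    buffer_rsrc rsrc;
    uint32_t offset;
} buffer_fat_ptr;

/* Models the v4i32 -> addrspace(8) step: reinterpret four dwords as one resource. */
static buffer_rsrc rsrc_from_v4i32(const uint32_t v[4]) {
    buffer_rsrc r;
    memcpy(r.words, v, sizeof r.words);
    return r;
}

/* Models the addrspace(8) -> addrspace(7) step: append a zero offset. */
static buffer_fat_ptr fat_ptr_from_rsrc(buffer_rsrc r) {
    buffer_fat_ptr p;
    p.rsrc = r;
    p.offset = 0;
    return p;
}

/* What a v4i32 overload might do internally before forwarding to the builtin. */
static buffer_fat_ptr fat_ptr_from_v4i32(const uint32_t v[4]) {
    return fat_ptr_from_rsrc(rsrc_from_v4i32(v));
}
```

The point of the overload is that callers holding a v4i32 would never have to spell out the middle step themselves.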

https://github.com/llvm/llvm-project/pull/137425