[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)
Krzysztof Drewniak via cfe-commits
cfe-commits at lists.llvm.org
Fri May 2 11:12:51 PDT 2025
================
@@ -2641,6 +2641,28 @@ def int_amdgcn_perm :
// GFX9 Intrinsics
//===----------------------------------------------------------------------===//
+/// This is a general-purpose intrinsic for all operations that take a pointer,
+/// a base location in LDS, and a data size and use them to perform a gather to LDS.
+/// This allows abstracting over both global pointers (address space 1) and
+/// the buffer-resource-wrapper pointers (address space 7 and 9).
+/// TODO: add support for address space 5 and scratch_load_lds.
+class AMDGPULoadToLDS :
+ Intrinsic <
+ [],
+ [llvm_anyptr_ty, // Base pointer to load from. Varies per lane.
+ LLVMQualPointerType<3>, // LDS base pointer to store to. Must be wave-uniform.
+ llvm_i32_ty, // Data byte size: 1/2/4 (/12/16 for gfx950)
+ llvm_i32_ty, // imm offset (applied to both input and LDS address)
----------------
krzysz00 wrote:
... Oh. Consider the case where you have global p + N and LDS q + N. The LDS combiner can then rewrite the LDS address to (q' + O) + N, aka q' + (O + N).
Then the two pointers won't have the same constant offset anymore (N vs. O + N), so it's unclear whether you can slide it onto the instruction immediate.
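
To make the concern concrete, here is a minimal sketch of the offset arithmetic. It is not LLVM code: `Addr`, `canShareImmediate`, and `rebaseLds` are hypothetical names, and the model assumes only what the intrinsic's comment states, namely that the imm offset is applied to both the input and the LDS address.

```cpp
#include <cstdint>

// Hypothetical model: an address is an opaque base plus a constant byte offset.
struct Addr {
  int base;              // symbolic id for the non-constant part (p, q, q')
  std::int64_t constOff; // constant part a folder could see
};

// The shared immediate is added to both addresses, so a constant can move
// onto it only when both sides carry exactly the same constant.
bool canShareImmediate(const Addr &global, const Addr &lds) {
  return global.constOff == lds.constOff;
}

// Models a combiner rewriting the LDS base q as q' + O, which turns
// q + N into q' + (O + N).
Addr rebaseLds(const Addr &lds, int newBase, std::int64_t o) {
  return Addr{newBase, lds.constOff + o};
}
```

With global p + 16 and LDS q + 16 the check passes. After rebasing the LDS side with O = 64, the constants are 16 and 80, and the check fails unless O happens to be 0.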
https://github.com/llvm/llvm-project/pull/137425