[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)
Jay Foad via cfe-commits
cfe-commits at lists.llvm.org
Mon Apr 28 06:48:47 PDT 2025
jayfoad wrote:
> > High-level question: I don't understand why you call this a "gather" operation. What do you mean by that? Isn't it semantically just a memcpy, or a (global/buffer) load followed by a (LDS) store?
>
> The semantics of this operation (at least in the pre-gfx950 cases) are:
>
> ```
> lds_load(vector globalAddr, scalar ldsAddr) {
> lds[ldsAddr + 4 * laneId] = global[globalAddr];
> }
> ```
>
> Note that your lane-varying global address can point anywhere in memory, but the values written to LDS always go to base, base + 4 bytes, base + 8 bytes, ..., base + (wavesize - 1) * 4 bytes.
>
> From where I'm standing, this is a gather.
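A minimal C++ model of the quoted pre-gfx950 semantics makes the gather explicit. Each lane reads one dword from its own global address, while the LDS side writes to a packed, lane-indexed range starting at a single uniform base. The wave size of 64, the byte arrays standing in for global memory and LDS, and the function name `loadToLds` are assumptions for illustration. They are not part of the intrinsic or of the hardware definition.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: wave64. A wave32 model would differ only in this constant.
constexpr int kWaveSize = 64;

// Hypothetical model of the quoted semantics:
//   lds[ldsAddr + 4 * laneId] = global[globalAddr];
// `global` and `lds` are plain byte arrays standing in for the two address spaces.
void loadToLds(const std::vector<uint8_t>& global,
               const std::array<uint32_t, kWaveSize>& globalAddr,  // per-lane ("vector")
               uint32_t ldsAddr,                                    // uniform ("scalar")
               std::vector<uint8_t>& lds) {
  for (int laneId = 0; laneId < kWaveSize; ++laneId) {
    uint32_t dword;
    // Gather side: each lane's source address is arbitrary.
    std::memcpy(&dword, &global[globalAddr[laneId]], 4);
    // Packed side: lane N always lands at ldsAddr + 4 * N.
    std::memcpy(&lds[ldsAddr + 4 * laneId], &dword, 4);
  }
}
```

Even if the per-lane global addresses are reversed or scattered, the LDS destinations stay contiguous and in lane order, which is what makes this a gather rather than a plain memcpy.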
I see. The LDS part is doing "addtid" addressing. There are other instructions that do this like `DS_LOAD_ADDTID_B32` and `GLOBAL_LOAD_ADDTID_B32` but I don't think we have any codegen support for them.
I think we _could_ add codegen support just by pattern-matching the address, so `DS_LOAD_ADDTID_B32` would match something like `load ptr addrspace(3) (constant_base + tid * 4)`.
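The address match could be pictured roughly as follows. This is a toy sketch: the `Expr` tree and `matchAddTid` are hypothetical stand-ins, not LLVM's SelectionDAG or GlobalISel matchers, and the sketch only recognises the literal shape `constant_base + tid * 4` (with `tid` taken to be the lane index) in either operand order.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>

// Toy expression tree for an LDS address computation (illustrative only).
struct Expr {
  enum Kind { Const, Tid, Add, Mul } kind;
  uint32_t value = 0;               // used by Const
  std::shared_ptr<Expr> lhs, rhs;   // used by Add / Mul
};
using ExprP = std::shared_ptr<Expr>;

// Hypothetical builder helpers.
ExprP cnst(uint32_t v) { return std::make_shared<Expr>(Expr{Expr::Const, v, nullptr, nullptr}); }
ExprP tid() { return std::make_shared<Expr>(Expr{Expr::Tid, 0, nullptr, nullptr}); }
ExprP add(ExprP a, ExprP b) { return std::make_shared<Expr>(Expr{Expr::Add, 0, a, b}); }
ExprP mul(ExprP a, ExprP b) { return std::make_shared<Expr>(Expr{Expr::Mul, 0, a, b}); }

// True if e is exactly tid * 4 or 4 * tid.
static bool isTidTimes4(const ExprP& e) {
  if (e->kind != Expr::Mul) return false;
  auto isFour = [](const ExprP& x) { return x->kind == Expr::Const && x->value == 4; };
  auto isTid = [](const ExprP& x) { return x->kind == Expr::Tid; };
  return (isTid(e->lhs) && isFour(e->rhs)) || (isFour(e->lhs) && isTid(e->rhs));
}

// If addr has the shape constant_base + tid * 4, return constant_base;
// a selector could then use an addtid-style instruction with that base.
std::optional<uint32_t> matchAddTid(const ExprP& addr) {
  if (addr->kind != Expr::Add) return std::nullopt;
  if (addr->lhs->kind == Expr::Const && isTidTimes4(addr->rhs)) return addr->lhs->value;
  if (addr->rhs->kind == Expr::Const && isTidTimes4(addr->lhs)) return addr->rhs->value;
  return std::nullopt;
}
```

A real matcher would also need to handle equivalent forms such as `tid << 2` and offsets already folded elsewhere, and it would be limited by whatever base the instruction can actually encode; those details are left out here.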
Then buffer-load-to-lds could be pattern-matched as a regular (fat pointer) buffer load followed by an addtid-style LDS store, right? So no intrinsic is really _needed_?
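The proposed decomposition can be checked at the level of data movement with a small sketch: an ordinary per-lane load into a register-like array, followed by an addtid-style store, leaves LDS in the same state as the fused operation. The helper names, the wave size of 64 and the 4-byte element size are assumptions for illustration. The sketch only shows semantic equivalence; it says nothing about whether the backend could reliably recognise the pattern or whether the split form would perform the same.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int kWaveSize = 64;                  // assumption: wave64
using Wave = std::array<uint32_t, kWaveSize>;  // one 32-bit value per lane

// Step 1 (hypothetical helper): a regular per-lane load, e.g. a fat-pointer buffer load.
Wave perLaneLoad(const std::vector<uint8_t>& mem, const Wave& addr) {
  Wave v{};
  for (int lane = 0; lane < kWaveSize; ++lane) std::memcpy(&v[lane], &mem[addr[lane]], 4);
  return v;
}

// Step 2 (hypothetical helper): an addtid-style LDS store; lane N writes base + 4 * N.
void addTidStore(std::vector<uint8_t>& lds, uint32_t base, const Wave& v) {
  for (int lane = 0; lane < kWaveSize; ++lane) std::memcpy(&lds[base + 4 * lane], &v[lane], 4);
}

// Load-to-LDS expressed as the composition of the two ordinary operations.
void loadToLdsViaSplit(const std::vector<uint8_t>& mem, const Wave& addr,
                       std::vector<uint8_t>& lds, uint32_t base) {
  addTidStore(lds, base, perLaneLoad(mem, addr));
}
```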
https://github.com/llvm/llvm-project/pull/137425