[Mlir-commits] [clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)
Krzysztof Drewniak
llvmlistbot at llvm.org
Mon Apr 28 00:00:28 PDT 2025
krzysz00 wrote:
@jayfoad
> High level question: I don't understand why you call this a "gather" operation. What do you mean by that? Isn't it semantically just a memcpy, or a (global/buffer) load followed by a (LDS) store?
The semantics of this operation (at least in the pre-gfx950 cases) are:
```
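lds_load(vector globalAddr, scalar ldsAddr) {
  // Executed by each lane of the wave: globalAddr differs per lane,
  // ldsAddr is the same (wave-uniform) for every lane.
  lds[ldsAddr + 4 * laneId] = global[globalAddr];
}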
```
Note that your lane-varying global address can point anywhere in memory, but the values written to LDS always go at base, base + 4 bytes, base + 8 bytes, ..., base + (wavesize - 1) * 4 bytes.
From where I'm standing, this is a gather.
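A minimal Python model of the pre-gfx950 semantics in the pseudocode, assuming a 64-lane wave and 4-byte elements. The names `load_to_lds`, `WAVE_SIZE`, and the flat byte-array memories are hypothetical and only illustrate the access pattern; they are not part of the intrinsic's interface. The model shows that each lane reads from an arbitrary source address while the destinations are fixed and lane-contiguous, and that a plain memcpy is the special case where the source addresses happen to be contiguous too.

```python
import struct
from typing import List

WAVE_SIZE = 64  # assumption: wave64; a wave32 target would use 32
ELEM = 4        # assumption: 4-byte elements, matching "4 * laneId" in the pseudocode


def load_to_lds(global_mem: bytes, global_addrs: List[int],
                lds: bytearray, lds_base: int) -> None:
    """Hypothetical model of one wave: lane i copies ELEM bytes from its own
    global address into LDS at lds_base + ELEM * i."""
    assert len(global_addrs) == WAVE_SIZE
    for lane_id, addr in enumerate(global_addrs):
        src = global_mem[addr:addr + ELEM]   # per-lane source: can be anywhere
        dst = lds_base + ELEM * lane_id      # destination: fixed, contiguous
        lds[dst:dst + ELEM] = src


# Global memory where dword k holds the value k.
mem = b"".join(struct.pack("<I", k) for k in range(256))

# Gather: lanes read in reverse order, but LDS is still filled contiguously.
lds = bytearray(512)
reversed_addrs = [ELEM * (WAVE_SIZE - 1 - lane) for lane in range(WAVE_SIZE)]
load_to_lds(mem, reversed_addrs, lds, lds_base=16)
print(struct.unpack_from("<I", lds, 16)[0])  # lane 0 read dword 63 -> 63

# memcpy is the special case of contiguous source addresses.
lds2 = bytearray(512)
load_to_lds(mem, [400 + ELEM * lane for lane in range(WAVE_SIZE)], lds2, lds_base=0)
print(bytes(lds2[:256]) == mem[400:656])  # True
```

Because the destination pattern is fixed by the lane ID and only the source side is free, the operation covers both the "scattered reads" case and the memcpy case, which is why describing it as a gather seems more precise than describing it as a copy.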
https://github.com/llvm/llvm-project/pull/137425