[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

Krzysztof Drewniak via cfe-commits cfe-commits at lists.llvm.org
Fri May 2 14:32:17 PDT 2025


krzysz00 wrote:

Re discussion on the other PR about "why is this even an intrinsic" - since this probably shouldn't just be in @jayfoad's DMs:

The reason I disagree with "just pattern-match it" is that you can't get the scheduling you want without the guarantee that an intrinsic provides.
 
Namely, while 
```
global_load_b32 v1, v0
ds_write_addtid_b32 v1, s0
```
is obviously equivalent to
```
s_mov_b32 m0, s0
global_load_lds_b32 v0
```
if we turn that first example into
``` 
pipelined_loop: {
  global_load_b32 v2, v0
  ...
  waitcnt(lds only) + barrier
  ds_read v*, ...
  mfmas(v)
  waitcnt(lds)+s_barrier
  waitcnt(vmem) ;; and not substantially earlier please
  ds_write_addtid_b32 v2, s0
  jle pipelined_loop
}
```
for example, we really don't want that match firing, because LDS gets overwritten.
 
... *unless* we're double-buffering into LDS and so trying to do
```
pipelined_lds: {
  waitcnt(vmem,lds)+barrier
  load_lds(global1(iv), lds2)
  do_compute(lds1)
  waitcnt(vmem,lds)+barrier
  load_lds(global2(iv), lds1)
  do_compute(lds2) ;; We'd better not be waiting on LDS1 to settle at/before here
  iv += 2
}
```
where, if the pattern match for the addtid load fails, say because of waitcnt insertion, that'll cause problems for the program.
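
The difference between the two loops can be sketched with a toy host-side model in Python. This is not GPU code: `run_pipeline` and `num_buffers` are hypothetical names, and the model assumes the worst case, where a direct-to-LDS load lands as soon as it is issued.

```python
def run_pipeline(num_tiles, num_buffers):
    """Return, per iteration, the tile index the compute step observed.

    Hypothetical model: each LDS buffer just records which tile it holds.
    """
    lds = [None] * num_buffers
    lds[0] = 0  # prologue: tile 0 is already resident in buffer 0
    seen = []
    for i in range(num_tiles):
        cur = i % num_buffers        # buffer this iteration computes on
        nxt = (i + 1) % num_buffers  # buffer the next tile is loaded into
        if i + 1 < num_tiles:
            # Direct-to-LDS load of the next tile, issued before the compute.
            # Assumption: it lands immediately (worst case for the hazard).
            lds[nxt] = i + 1
        seen.append(lds[cur])        # compute reads whatever is in `cur`
    return seen


single = run_pipeline(4, 1)  # one buffer: the load clobbers live data
double = run_pipeline(4, 2)  # ping-pong: load and compute use different buffers
```

In this model, `single` comes out as `[1, 2, 3, 3]` (every compute but the last sees the wrong tile), while `double` is `[0, 1, 2, 3]`. Whether fusing the load and the LDS write is correct therefore depends on the buffering scheme, which a local pattern match cannot see.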
 
Not to mention, because we don't have an intrinsic for ds_addtid, and because there are a *lot* of ways to spell the lane ID (mbcnt, workitem.id.x with annotations, a bunch of workitem IDs mod 64, etc.), that'll be quite fragile.
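
To make the "lots of spellings" point concrete, here is a rough host-side Python model, not compiler code. It assumes wave64 and a 1-D workgroup; `mbcnt_lo`/`mbcnt_hi` are simplified stand-ins for the hardware instructions, and `lane_id_spellings` is a hypothetical helper.

```python
WAVE_SIZE = 64  # assumption: wave64 and a 1-D workgroup


def _popcount_below(mask32, nbits):
    """Count set bits among the lowest `nbits` bits of a 32-bit mask."""
    return bin(mask32 & 0xFFFFFFFF & ((1 << nbits) - 1)).count("1")


def mbcnt_lo(mask32, base, lane):
    """Model of mbcnt.lo: mask bits below `lane` within lanes 0..31, plus base."""
    return _popcount_below(mask32, min(lane, 32)) + base


def mbcnt_hi(mask32, base, lane):
    """Model of mbcnt.hi: mask bits below `lane` within lanes 32..63, plus base."""
    return _popcount_below(mask32, max(lane - 32, 0)) + base


def lane_id_spellings(workitem_id_x):
    """Four different-looking expressions that all yield the lane ID."""
    lane = workitem_id_x % WAVE_SIZE  # only used to drive the mbcnt model
    return {
        "mbcnt": mbcnt_hi(~0, mbcnt_lo(~0, 0, lane), lane),
        "mod": workitem_id_x % 64,
        "and": workitem_id_x & 63,
        "sub": workitem_id_x - 64 * (workitem_id_x // 64),
    }
```

All four entries agree for every work-item ID under these assumptions, yet each would look different in IR, so a matcher would have to recognize every one of them (and any others a frontend happens to emit).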
 
So in the context of GEMM stuff, I'd rather not leave this at "hope the compiler recognizes what we're trying to do". If the compiler can be made to recognize what we're trying to do reliably in the future, that'll be cool, but I can't be the one to write that patch, and I don't think there's infinite bandwidth among the AMDGPU crowd for this improvement either.

https://github.com/llvm/llvm-project/pull/137425

