[clang] [llvm] [mlir] [AMDGPU] Add a new amdgcn.load.to.lds intrinsic (PR #137425)

Tue Apr 29 10:15:46 PDT 2025

lialan wrote:

> > I still think we need an intrinsic here because a load + an addtid store can be scheduled much different from the asynchronous "gather to LDS" - and because we don't want this load/store to not be optimized
> 
> IMO the intrinsic should only be added as a last resort if we really can't get the pattern based codegen to work well enough.

I beg to differ in this particular case. In a downstream application, I want fine-grained control over the use of this particular instruction, so that it gets propagated down to LLVM IR without being changed or modified along the way.

Well, the actual reason: we need this instruction now. :-p

https://github.com/llvm/llvm-project/pull/137425