[llvm] [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (PR #87265)

Thu Aug 22 21:03:21 PDT 2024

b-sumner wrote:

> > The runtime doesn't split the dispatch into machine-sized chunks.  If it does have a limit, then it is probably much larger than we want to allocate for.
> 
> I thought it already had to do this if stack was enabled, to avoid going over a device-wide limit.

Yes, there is a special mode for when scratch space is low, but it would not be desirable to impose something like that on every dispatch.

https://github.com/llvm/llvm-project/pull/87265