[llvm] [AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (PR #87265)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 22 11:16:05 PDT 2024
arsenm wrote:
> OK. Suppose the launch has a million work groups. How much memory should the runtime allocate, and how will workgroup J decode what part of that memory to use?
The runtime is already bounded on how many groups it can dispatch at once; the allocation is tied to the dispatch size.
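To make the sizing argument concrete, here is a minimal sketch of one way such a scheme could look. It is not what the PR or any runtime implements: `SwLdsLayout`, `planSwLds`, `maxGroupsInFlight`, and the idea of a per-group "slot" index are all hypothetical names introduced for illustration. The assumption is that the buffer is sized by the number of groups that can be in flight at once, not by the total launch size, and that each group finds its slice from a slot index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical layout of a software-LDS backing buffer in global memory.
// All names here are illustrative assumptions, not an existing runtime API.
struct SwLdsLayout {
  std::uint64_t bytesPerGroup; // LDS bytes one workgroup needs
  std::uint64_t slots;         // number of slices in the buffer

  // Total bytes the runtime would allocate for this dispatch.
  std::uint64_t totalBytes() const { return bytesPerGroup * slots; }

  // Byte offset of the slice owned by the group currently holding `slot`.
  std::uint64_t offsetForSlot(std::uint64_t slot) const {
    assert(slot < slots && "slot index out of range");
    return slot * bytesPerGroup;
  }
};

// Size the buffer by the groups that can be in flight at once, capped by
// the launch size, so a million-group launch does not need a million slices.
inline SwLdsLayout planSwLds(std::uint64_t groupsInLaunch,
                             std::uint64_t maxGroupsInFlight,
                             std::uint64_t bytesPerGroup) {
  assert(maxGroupsInFlight > 0 && "runtime must dispatch at least one group");
  return SwLdsLayout{bytesPerGroup,
                     std::min(groupsInLaunch, maxGroupsInFlight)};
}
```

Under these assumptions, a launch of 1,000,000 groups with at most 256 in flight and 64 KiB per group would need 256 slices rather than a million. How a group obtains its slot index is left open here.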
> It can certainly be done but I'm wondering if we really need to do it now? And how much do we really need an independently working SW LDS?
I think having the trap door of pure software LDS would enable some useful experiments, such as lowering function-defined local variables without depending on any whole-program visibility. It also reduces the number of parts that need to interact directly in the compiler pipeline. With the current approach, I foresee having to fix the same bugs twice: once in the module LDS lowering, and again in the asan version of the module LDS lowering.
https://github.com/llvm/llvm-project/pull/87265