[llvm] [AMDGPU] Add AMDGPU-specific module splitting (PR #89245)

Fri May 17 00:38:12 PDT 2024

Pierre-vh wrote:

> My understanding is that `--lto-partitions` doesn't work because the AMDGPU backend needs to consume the SCC in order. I'm assuming this patch maintains that logic but splits the independent kernel calls up so they can be done in parallel?

Exactly. If a function is called directly or indirectly by a kernel, it stays in that kernel's module.
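
As a rough illustration of that rule (not the code from the patch), the sketch below computes, for each kernel, the set of functions reachable through direct or indirect calls. Each kernel together with its set would form one module. The call graph, the function names, and the `partition_by_kernel` helper are all hypothetical.

```python
from collections import deque

def partition_by_kernel(call_graph, kernels):
    """Map each kernel to the set of functions that must stay with it."""
    partitions = {}
    for kernel in kernels:
        seen = {kernel}
        worklist = deque([kernel])
        # Breadth-first walk over direct and indirect callees.
        while worklist:
            fn = worklist.popleft()
            for callee in call_graph.get(fn, ()):
                if callee not in seen:
                    seen.add(callee)
                    worklist.append(callee)
        partitions[kernel] = seen
    return partitions

# Hypothetical call graph: caller -> direct callees.
call_graph = {
    "kernel_a": ["helper", "only_a"],
    "kernel_b": ["helper"],
    "helper": ["leaf"],
}
partitions = partition_by_kernel(call_graph, ["kernel_a", "kernel_b"])
```

In this sketch a helper shared by two kernels simply appears in both sets. How the patch itself treats shared functions is not covered in this message.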

> I've thought that the true solution to resolving this would just be emitting the kernel resource usage inside of `ld.lld`. Presumably that would require emitting resource usage per-function in parallel and then the callgraph information (There's some small support for this already). Then `ld.lld` would need to traverse the callgraph to get the diameter of said graph. However I'm assuming that'd take more effort to define an actual linking ABI.

Indeed, and this is where I started looking as well (so we could just enable ThinLTO), but it's hard to get right. This solution was by far the easiest and most realistic one to do in the short term.
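
For comparison, the linker-side alternative quoted above could be sketched roughly as follows: each function carries its own resource usage, and a kernel's usage is derived by walking the call graph. This is only an illustration. The `vgprs` and `stack` fields, the combination rules (maximum for registers, caller plus deepest callee for stack), and the assumption of an acyclic call graph with no indirect calls are mine, not an existing ABI.

```python
def kernel_usage(fn, call_graph, usage, cache=None):
    """Combine per-function usage over everything reachable from fn."""
    cache = {} if cache is None else cache
    if fn in cache:
        return cache[fn]
    vgprs = usage[fn]["vgprs"]
    deepest_callee_stack = 0
    for callee in call_graph.get(fn, ()):
        callee_usage = kernel_usage(callee, call_graph, usage, cache)
        # Assumed rule: registers combine as a maximum.
        vgprs = max(vgprs, callee_usage["vgprs"])
        # Assumed rule: stack is this frame plus the deepest callee chain.
        deepest_callee_stack = max(deepest_callee_stack, callee_usage["stack"])
    result = {"vgprs": vgprs, "stack": usage[fn]["stack"] + deepest_callee_stack}
    cache[fn] = result
    return result

# Hypothetical per-function numbers and call graph.
usage = {
    "kernel": {"vgprs": 24, "stack": 0},
    "helper": {"vgprs": 32, "stack": 16},
    "leaf": {"vgprs": 8, "stack": 64},
}
call_graph = {"kernel": ["helper"], "helper": ["leaf"]}
total = kernel_usage("kernel", call_graph, usage)
```

The difficulty mentioned above lies less in this traversal than in agreeing on what the per-function records contain and how the linker is meant to consume them.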

https://github.com/llvm/llvm-project/pull/89245