[llvm] [AMDGPU] Add AMDGPU-specific module splitting (PR #89245)

Thu May 16 06:40:07 PDT 2024

https://github.com/jhuber6 commented:

My understanding is that `--lto-partitions` doesn't work because the AMDGPU backend needs to consume the SCCs in order. I'm assuming this patch maintains that logic but splits the independent kernel calls up so they can be done in parallel?

I've thought that the true solution to resolving this would just be emitting the kernel resource usage inside of `ld.lld`. Presumably that would require emitting resource usage per-function in parallel along with the callgraph information (there's some small support for this already). Then `ld.lld` would need to traverse the callgraph to get the diameter of said graph. However, I'm assuming that'd take more effort, since it means defining an actual linking ABI.
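
As a rough sketch of what that link-time traversal might look like (not what `ld.lld` or the backend actually does): assume each function comes with a record of its own register and stack usage plus its direct callees, and that the callgraph is acyclic with no indirect calls. The `FuncUsage` and `Usage` types, `kernelUsage`, and the combination rules (maximum of registers over everything reachable, stack summed along the deepest call path) are all hypothetical names and assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-function record that each partition could emit
// independently. Field names are illustrative, not an existing ABI.
struct FuncUsage {
  unsigned NumRegs;                 // registers used by this function's body
  unsigned StackBytes;              // stack used by this function's own frame
  std::vector<std::size_t> Callees; // indices of direct callees
};

// Resolved usage for a function including everything it can reach.
struct Usage {
  unsigned NumRegs = 0;
  unsigned StackBytes = 0;
};

// Memoized depth-first walk over the callgraph.
// Assumption: the graph is acyclic (no recursion, no indirect calls).
static Usage resolve(const std::vector<FuncUsage> &Funcs, std::size_t F,
                     std::vector<bool> &Done, std::vector<Usage> &Memo) {
  assert(F < Funcs.size() && "callee index out of range");
  if (Done[F])
    return Memo[F];

  Usage U;
  U.NumRegs = Funcs[F].NumRegs;
  U.StackBytes = Funcs[F].StackBytes;

  unsigned DeepestCalleeStack = 0;
  for (std::size_t C : Funcs[F].Callees) {
    Usage CU = resolve(Funcs, C, Done, Memo);
    // Registers: assumed to combine as the maximum over reachable functions.
    U.NumRegs = std::max(U.NumRegs, CU.NumRegs);
    // Stack: only the deepest callee path counts toward the total.
    DeepestCalleeStack = std::max(DeepestCalleeStack, CU.StackBytes);
  }
  U.StackBytes += DeepestCalleeStack;

  Done[F] = true;
  Memo[F] = U;
  return U;
}

// Entry point: resolve the usage of the kernel at index Kernel.
Usage kernelUsage(const std::vector<FuncUsage> &Funcs, std::size_t Kernel) {
  std::vector<bool> Done(Funcs.size(), false);
  std::vector<Usage> Memo(Funcs.size());
  return resolve(Funcs, Kernel, Done, Memo);
}
```

The memoization means each function is resolved once no matter how many kernels reach it, so the per-function records can be produced in parallel and only this final walk has to happen in one place. Recursion and indirect calls would need a separate policy that this sketch does not attempt.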

https://github.com/llvm/llvm-project/pull/89245