[llvm] [offload][SYCL] Add SYCL Module splitting. (PR #131347)

Mon May 5 10:55:40 PDT 2025

jdoerfert wrote:

> Do you have any objections?

Yes. As mentioned [before](https://github.com/llvm/llvm-project/pull/131347#issuecomment-2738719826), there is nothing "sycl" about this functionality. I want to use this, as it becomes available, for OpenMP offload as well. I can easily see C++ users on the host wanting to use this, etc.

> I am a bit skeptical about the recent FunctionCategorizer design, since other OffloadKind users would rather be interested in reusing a thin-LTO framework.

I'm not sure what you mean by this, but it is either not true or unrelated. Let me walk you through what I want for OpenMP offload on an AMDGPU:

1) 10 TUs come in.
2) We split them by kernel into N TUs, duplicating any code known to be connected to each kernel.
3) We run the middle-end in parallel on the N TUs.
4) We perform an aggressive ThinLTO step to ensure the modules are self-contained.
5) We run the backend in parallel and create N device binary modules.

In such a setup, ThinLTO and early splitting are disconnected. The ThinLTO part uses the existing infrastructure, and one could argue the splitting should too, but either way we want the splitting to happen early as well, to gain compilation parallelism on the device side. Similarly, one could imagine doing this on the host. In both cases, there is nothing "sycl-specific" about it.
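
To make step 2 concrete, here is a minimal sketch of per-kernel splitting on a toy call graph. It deliberately avoids the LLVM API: `CallGraph`, `Partition`, `reachableFrom`, and `splitByKernel` are hypothetical names for illustration, not anything in this PR, and "connected to a kernel" is assumed here to mean "transitively called from it". Nothing in it depends on the offload kind.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: the merged input is a call graph keyed by function
// name. Functions without an entry are treated as external declarations.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// One partition per kernel: the kernel plus everything reachable from it.
struct Partition {
  std::string Kernel;
  std::set<std::string> Functions;
};

// Collect Root and its transitive callees with an iterative depth-first walk.
std::set<std::string> reachableFrom(const CallGraph &CG,
                                    const std::string &Root) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // Already visited.
    auto It = CG.find(F);
    if (It == CG.end())
      continue; // External declaration, nothing to follow.
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}

// Build one partition per kernel. Helpers shared by several kernels are
// duplicated into each partition, so every partition can be handled alone.
std::vector<Partition> splitByKernel(const CallGraph &CG,
                                     const std::vector<std::string> &Kernels) {
  std::vector<Partition> Parts;
  for (const std::string &K : Kernels) {
    assert(CG.count(K) && "kernel must be defined in the call graph");
    Parts.push_back({K, reachableFrom(CG, K)});
  }
  return Parts;
}
```

Because each partition carries its own copy of the shared helpers, the partitions have no dependencies on each other at this stage, which is what allows steps 3 and 5 to run on them in parallel.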

https://github.com/llvm/llvm-project/pull/131347