[clang] [Clang][SYCL] Introduce clang-sycl-link-wrapper to link SYCL offloading device code (Part 1 of many) (PR #112245)

Tue Oct 15 17:26:19 PDT 2024

jhuber6 wrote:

> From the discourse post and everything I've found reading about the SYCL tooling, it seems to me like this should really just all be integrated into LLD and performed with the linking phase. It seems like a huge waste of IO to read objects, rip out device-specific bits, process that separately, then read the objects again in another process to link the host bits, then in a third process read the linked host and device bits and combine them...

Doing all of this in `lld` is something I floated at the inception of the offload handling, but I decided it was more trouble than it was worth. You can think of the current flow as a sort of linker plugin, where we preprocess a bunch of input files and then give a result back. In this case the result is an object with the embedded device code and the runtime calls necessary to register it. I didn't go the `lld` route for two reasons. First, it forces users to change their host linker, which people don't like. Second, we would need to go to great lengths to make the NVPTX target work, because NVIDIA's toolchain is proprietary and doesn't use `ld.lld`. I doubted that making `ld.lld` a frontend for `nvlink` would be a very popular option, but we could go that route if so inclined; it's the same thing as another tool I wrote to work around NVIDIA's linker being subpar.
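
To make the "linker plugin" shape concrete, here is a toy Python model of that flow. It is only a sketch under loud assumptions: `ObjectFile`, `extract_device_images`, `link_device`, `wrap_images`, and `preprocess_then_host_link` are hypothetical names invented for illustration, strings stand in for real object contents, and none of this reflects the actual tool's code or command line.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical stand-in for a host object with embedded device code."""
    name: str
    host_code: str
    # Maps a target name (e.g. "nvptx64") to the device code embedded for it.
    device_images: dict = field(default_factory=dict)


def extract_device_images(inputs):
    """Collect the embedded device code from every input, grouped by target."""
    by_target = {}
    for obj in inputs:
        for target, code in obj.device_images.items():
            by_target.setdefault(target, []).append(code)
    return by_target


def link_device(target, pieces):
    """Pretend device link: join all pieces for one target into one image."""
    return f"{target}:" + "+".join(pieces)


def wrap_images(linked_images):
    """Build one host object holding the images and their registration calls."""
    registration = "".join(f"register({image!r});" for image in linked_images)
    return ObjectFile("wrapper.o", host_code=registration)


def preprocess_then_host_link(inputs):
    """Preprocess the inputs, then hand everything to an unchanged host link."""
    grouped = extract_device_images(inputs)
    linked = [link_device(t, pieces) for t, pieces in sorted(grouped.items())]
    wrapper = wrap_images(linked)
    # The host linker only ever sees ordinary host objects plus the wrapper.
    return " ".join(obj.host_code for obj in [*inputs, wrapper])
```

The point of the sketch is the last function: the host link step is untouched and simply receives one extra object, which is why the user does not have to switch host linkers.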

https://github.com/llvm/llvm-project/pull/112245