[llvm] [clang] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

Wed Jan 31 10:36:03 PST 2024

jhuber6 wrote:

> Supporting such mixed mode opens an interesting set of issues we may need to consider going forward:
> 
> who/where/how runs initializers in the fully linked parts?

I'm assuming you're talking about GPU-side constructors? I don't think the CUDA runtime supports those, but OpenMP runs them when the image is loaded, so each image's constructors would be handled independently.

> Are public functions in the fully linked parts visible to the functions in partially linked parts? In the full-rdc mode they would, as if it's a plain C++ compilation. In partial they would not as the main GPU executable and the partial parts will be in separate executables.
>

This has the same semantics as a `-fno-gpu-rdc` compilation. A public `__device__` function on one side of the relocatable-link boundary will not be available to link against from the other side.
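To make that boundary concrete, here is a minimal sketch of a library that keeps its device code private and exposes only a host-level entry point. All names (`scale`, `scale_kernel`, `scale_on_gpu`) are hypothetical, and `main` sits in the same file only to keep the example self-contained; in a real project it would live in a separate part of the build.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device helper. With -fno-gpu-rdc, or after the library has
// been relocatably linked, this symbol is resolved inside the library's own
// device image and cannot be linked against from device code elsewhere.
__device__ float scale(float x) { return 2.0f * x; }

// Kernel that uses the private helper; it lives in the same device image.
__global__ void scale_kernel(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] = scale(data[i]);
}

// Host-level API: the only symbol the rest of the program is meant to use.
// Error checking is omitted to keep the sketch short.
void scale_on_gpu(float *host, int n) {
  float *dev = nullptr;
  cudaMalloc(&dev, n * sizeof(float));
  cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
  scale_kernel<<<(n + 255) / 256, 256>>>(dev, n);
  cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(dev);
}

int main() {
  float values[3] = {1.0f, 2.0f, 3.0f};
  scale_on_gpu(values, 3); // crossing the boundary through host code works
  for (float v : values)
    std::printf("%f\n", v);
  return 0;
}
```

Code outside the library that instead declared `extern __device__ float scale(float);` and called it from its own kernel would be expected to fail at device link time, since the definition is not visible across the boundary.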

> This would be OK for something like CUDA where cross-TU references are usually limited to host, but would be surprising for someone who would expect C++-like behavior, which sort of the ultimate goal for offloading use case. This will eventually become a problem if/when we grow large enough subset of independent offload-enabled libraries. The top-level user will have a hard time figuring out what's visible and what is not, unless the libraries deliberately expose only host-level APIs, if/when they fully link GPU side code.

The idea is that users already get C++-like behavior with the new driver and `-fgpu-rdc` generally. But in some cases they may wish to keep GPU code "private" to a subset of the project for some other purposes. Doing a relocatable link with the offloading toolchain shows enough intent in my mind that we don't need to worry about people being confused so long as we document what it does.

https://github.com/llvm/llvm-project/pull/80066