[clang] [llvm] [LinkerWrapper] Support relocatable linking for offloading (PR #80066)

Wed Jan 31 11:29:04 PST 2024

jhuber6 wrote:

> > I'm assuming you're talking about GPU-side constructors? I don't think the CUDA runtime supports those, but OpenMP runs them when the image is loaded, so it would handle both independently.
> 
> Yes. I'm thinking of the expectations from a C++ user standpoint, and this is one of the areas where there will be observable differences. First, because there will be subsets of the code that are no longer part of the main GPU-side executable. Second, the side effects of the initializers will be different depending on whether we do link such subsets separately or not. E.g. the initializer call order will change. The global state changes in one subset will not be visible in the other. Weak symbol resolution will produce different results. Etc.

It'll definitely have an effect different from full linking, but the idea is that it would be the desired effect if someone went out of their way to do this GPU subset linking thing.
> 
> > The idea is that users already get C++-like behavior with the new driver and -fgpu-rdc generally
> 
> Yes. That will set the default expectations that things work just like in C++, which is a great thing. But the introduction of partial subset linking will break some of those "just works" assumptions, and it may be triggered by parts of the build outside of the user's control (e.g. by a third-party library).

This was one of the things I was wondering about, since we could alternatively make a new flag for this, separate from `-r`, so it's explicit. Right now I just assumed that passing `-r` through the offloading toolchain (via CUDA or whatever) was explicit enough, since anyone who wants regular `-r` behaviour could just use `clang` or `ld` normally.

> 
> Side note: we do need a good term for this kind of subset linking. "partial linking" already has established meaning and it's not a good fit here as we actually produce a fully linked GPU executable.
> 

Yeah, coming up with a name is difficult. You could just call it device linking, since it's more or less just doing the device link step ahead of time instead of deferring it until we make the final executable.

> We do need to document how it works. Documenting what does not work, or works differently is also important, IMO. We _do_ need to worry about users and their expectations.

Yes, I should probably update this with some documentation. I'm not sure where it would go, however; maybe just on the `clang-linker-wrapper` page.

https://github.com/llvm/llvm-project/pull/80066