[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

Tue Apr 12 11:59:14 PDT 2022

tra added a comment.

LGTM in principle. This will keep around the GPU code we do need.

That said, it seems to be a rather blunt hammer. I think we'll end up linking almost everything in an archive into the final executable, since most of the GPU objects will likely have a host-visible symbol (e.g. most of them would define a kernel).
Device-side linking would also be unaware of which objects were actually linked into the host executable and thus would link in more objects than necessary. We could have achieved about the same result by linking with `--whole-archive`.
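
To make the concern concrete, here is a toy model of archive member selection. It is only a sketch: the member names, the symbol names and the `select_members` helper are all made up for illustration, and it models neither the patch itself nor any real linker. It assumes the usual archive rule (a member is pulled in only if it defines a currently undefined symbol) and compares that with force-linking every member that defines a kernel.

```python
# Hypothetical device-side archive: member -> symbols it defines / needs.
# All names here are invented for illustration.
archive = {
    "a.o": {"defines": {"kernel_a", "helper_a"}, "needs": set(), "kernels": {"kernel_a"}},
    "b.o": {"defines": {"kernel_b"}, "needs": {"dev_util"}, "kernels": {"kernel_b"}},
    "c.o": {"defines": {"dev_util"}, "needs": set(), "kernels": set()},
    "d.o": {"defines": {"dev_unused"}, "needs": set(), "kernels": set()},
}


def select_members(archive, roots, forced=()):
    """Return the archive members a simplified linker would pull in.

    A member is linked if it is forced, or if it defines a symbol that is
    still undefined. Linking a member adds its own needs to the undefined set.
    """
    linked = list(forced)
    undefined = set(roots)
    for name in linked:
        undefined |= archive[name]["needs"]
    progress = True
    while progress:
        progress = False
        for name, member in archive.items():
            if name in linked:
                continue
            defined = {s for n in linked for s in archive[n]["defines"]}
            if member["defines"] & (undefined - defined):
                linked.append(name)
                undefined |= member["needs"]
                progress = True
    return sorted(linked)


# Device-side link in isolation: nothing references the kernels, so nothing is pulled.
isolated = select_members(archive, roots=set())

# Force every member that defines a host-visible kernel.
with_kernels = [name for name, m in archive.items() if m["kernels"]]
forced = select_members(archive, roots=set(), forced=with_kernels)

# What --whole-archive would give us: every member.
whole = sorted(archive)

print(isolated, forced, whole)
```

In this made-up archive, forcing the kernel-bearing members leaves out only `d.o`, the one object with `__device__` functions that nothing references, so the result is close to what `--whole-archive` produces.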

The root of the problem here is that, in isolation, GPU-side linking does not know what will really be needed by the host and thus has to link in everything, except, maybe, object files that contain only `__device__` functions.
Ideally, linking should be a two-phase process: link the CPU side first, extract the references to the GPU symbols (host-side compilation would have to be augmented to place them in a well-known location), and pass them to the GPU-side linker. The GPU-side linker would then have all the info necessary to pull in only the relevant GPU-side objects, without the compiler having to force nearly all of them to be linked in.
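
A rough sketch of what I mean by the two-phase flow, again as a toy model and not a proposal for the actual implementation. The `resolve` and `two_phase_link` helpers, the `gpu_refs` field (standing in for the "well-known location" where host objects would record the GPU symbols they reference) and all object and symbol names are hypothetical.

```python
def resolve(archive, roots):
    """Simplified archive resolution: pull a member only if it defines a
    symbol that is still undefined, then add that member's own needs."""
    linked, undefined = [], set(roots)
    progress = True
    while progress:
        progress = False
        for name, member in archive.items():
            if name in linked:
                continue
            defined = {s for n in linked for s in archive[n]["defines"]}
            if member["defines"] & (undefined - defined):
                linked.append(name)
                undefined |= member["needs"]
                progress = True
    return sorted(linked)


def two_phase_link(host_archive, host_roots, device_archive):
    # Phase 1: link the CPU side as usual.
    host_linked = resolve(host_archive, host_roots)
    # Collect the GPU symbols referenced by the host objects that were
    # actually linked (assumed to be recorded in "gpu_refs").
    gpu_roots = set()
    for name in host_linked:
        gpu_roots |= host_archive[name]["gpu_refs"]
    # Phase 2: use those references as the roots of the GPU-side link.
    return host_linked, resolve(device_archive, gpu_roots)


# Hypothetical host-side and device-side archives.
host_archive = {
    "main.o": {"defines": {"main"}, "needs": {"run_a"}, "gpu_refs": set()},
    "host_a.o": {"defines": {"run_a"}, "needs": set(), "gpu_refs": {"kernel_a"}},
    "host_b.o": {"defines": {"run_b"}, "needs": set(), "gpu_refs": {"kernel_b"}},
}
device_archive = {
    "a.o": {"defines": {"kernel_a"}, "needs": {"dev_util"}},
    "b.o": {"defines": {"kernel_b"}, "needs": set()},
    "c.o": {"defines": {"dev_util"}, "needs": set()},
}

host_linked, device_linked = two_phase_link(host_archive, {"main"}, device_archive)
print(host_linked, device_linked)
```

Here `host_b.o` never makes it into the host link, so `kernel_b` is never requested and `b.o` stays out of the device link, while `c.o` is still pulled in because `a.o` needs it.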

I realize that this would be a nontrivial change to the compilation pipeline. As a short-to-medium term solution, this patch may do, though I'd probably prefer just linking with `--whole-archive` as it would, in theory, be simpler.

https://reviews.llvm.org/D123441