[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Wed Jun 6 03:23:30 PDT 2018

Hahnfeld added a comment.

In https://reviews.llvm.org/D47394#1123044, @tra wrote:

> While I'm not completely convinced that [fatbin]->.c->[clang]->.o (with device code only)->[ld -r] -> host.o (host+device code) is ideal (things could be done with a smaller number of tool invocations), it should help to deal with -rdc compilation until we get a chance to improve support for it in Clang. We may revisit and change this portion of the pipeline when clang can incorporate -rdc GPU binaries in a way compatible with CUDA tools.

I think this should work with current trunk: Clang puts the GPU binary into a section called `__nv_relfatbin` when `-fcuda-rdc` is also passed (see https://reviews.llvm.org/D42922).
The registration functions will probably cause problems, as shown above by @gtbercea (`undefined references`...). But since we don't need them for OpenMP (we have our own registration machinery), it might be worth implementing something like `-fno-cuda-registration`. Maybe then `clang -cc1 <host> -fcuda-include-gpubinary <device> -fcuda-rdc -fno-cuda-registration` can be used to embed the device object, replacing the dance ending in `ld -r`?
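
For illustration only, here is a minimal sketch of the general idea of carrying a device blob in a named section of a host object. It assumes an ELF target and the GCC/Clang `section` attribute. The byte values are placeholders rather than a real fatbin, and the names `device_image`, `device_image_size` and `device_image_data` are hypothetical; this is not the file the toolchain actually generates.

```c
#include <stddef.h>

/* Placeholder bytes standing in for a device image; NOT a real fatbin.
 * The section name matches the one mentioned above; on non-ELF targets
 * the section naming syntax may differ (assumption: ELF). */
__attribute__((section("__nv_relfatbin"), used))
static const unsigned char device_image[] = {0xde, 0xad, 0xbe, 0xef};

/* Hypothetical accessor: size of the embedded blob in bytes. */
size_t device_image_size(void) { return sizeof(device_image); }

/* Hypothetical accessor: pointer to the first byte of the blob. */
const unsigned char *device_image_data(void) { return device_image; }
```

After compiling this to an object file, something like `objdump -h` or `readelf -S` should list the `__nv_relfatbin` section next to the usual host sections, which is the property the `ld -r` step currently provides by merging a separate device-only object into the host one.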

Repository:
  rC Clang

https://reviews.llvm.org/D47394