[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Gheorghe-Teodor Bercea via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu May 31 17:18:18 PDT 2018


gtbercea added a comment.

The error is related to the lack of device linking, just as you explained two paragraphs down. This is the error I get:

  main.o: In function `__cuda_module_ctor':
  main.cu:(.text+0x674): undefined reference to `__cudaRegisterLinkedBinary__nv_c5b75865'

You hit the nail on the head: the device linking step is the tricky bit.

The OpenMP toolchain has the advantage that it already calls NVLINK (that support was upstreamed a long time ago), and this patch doesn't change that. What this patch "fixes" (for lack of a better word) is the way in which objects are created on the device side. By adding the FATBINARY + CLANG++ steps to the device toolchain, I ensure that the existing call to NVLINK is able to "detect" the device part of individual or archived objects. That is not the case in today's compiler, where NVLINK cannot do so for archived objects (static libs).

In general, for offloading toolchains, I don't see the reliance on vendor-specific tools as a problem **if and only if** the calls to vendor-specific tools remain confined to a device-specific toolchain. This patch respects this condition. All the calls to CUDA tools in this patch are part of the OpenMP NVPTX device offloading toolchain (which is an NVPTX device-specific toolchain).

The only host-side change is the call to "ld -r", which replaces a call to the "clang-offload-bundler" tool.
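
As a rough illustration of that host-side step (this is a sketch, not the driver code from this patch), a relocatable link merges several object files into a single object that can itself be linked again later. The helper name `partial_link_command` and the file names below are hypothetical; the only thing taken from the comment above is the use of "ld -r".

```python
from typing import List


def partial_link_command(output: str, objects: List[str], linker: str = "ld") -> List[str]:
    """Build the argv for a relocatable (partial) link: ld -r -o OUTPUT OBJECTS...

    The result only describes the command; nothing is executed here.
    """
    if not objects:
        raise ValueError("need at least one input object")
    # -r asks the linker to emit a relocatable object rather than an executable.
    return [linker, "-r", "-o", output, *objects]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(partial_link_command("main.o", ["main-host.o", "main-device.o"])))
```

The exact inputs and any extra flags the Clang driver passes are not shown in this comment, so treat the argument list as an assumption.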


Repository:
  rC Clang

https://reviews.llvm.org/D47394




