[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Thu May 31 14:57:19 PDT 2018

gtbercea added a comment.

> Assuming we do proceed with back-to-CUDA approach, one thing I'd consider would be using clang's -fcuda-include-gpubinary option which CUDA uses to include GPU code into the host object. You may be able to use it to avoid compiling and partially linking .fatbin and host .o.

I tried this example (https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/). It worked with NVCC but not with clang++. I can produce the main.o, particle.o and v.o objects as relocatable (-fcuda-rdc), but the final link step fails with a missing reference error.
This leads me to believe that embedding the CUDA fatbin code in the host object comes with limitations. If I were to change the OpenMP NVPTX toolchain to do the same, I would run into similar problems.

On the other hand, the same example, ported to use OpenMP declare target regions (instead of __device__), compiles, links and runs correctly.
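
For readers unfamiliar with the port described above, here is a minimal single-file sketch of what replacing __device__ with a declare target region can look like. It is not the code I actually tested: Vec3, lengthSquared and sumLengthSquared are hypothetical stand-ins for the types and functions in the blog example. Without -fopenmp the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Everything inside the declare target region is compiled for the host and
// for the offload device. This plays the role that __device__ (or
// __host__ __device__) plays in the CUDA version.
#pragma omp declare target
struct Vec3 {  // hypothetical stand-in for the blog's vector type
  float x, y, z;
  void scale(float s) { x *= s; y *= s; z *= s; }
};

// Hypothetical helper. In a multi-file build it would live in its own
// translation unit, which is where device-side linking becomes necessary.
float lengthSquared(const Vec3 &v) {
  return v.x * v.x + v.y * v.y + v.z * v.z;
}
#pragma omp end declare target

// Scales every point by s on the device and returns the sum of the squared
// lengths of the scaled points.
float sumLengthSquared(Vec3 *pts, int n, float s) {
  assert(n >= 0 && (pts != nullptr || n == 0));
  float total = 0.0f;
  // Map the array and the reduction variable explicitly so the results are
  // copied back to the host after the target region finishes.
  #pragma omp target teams distribute parallel for \
      map(tofrom: pts[0:n]) map(tofrom: total) reduction(+: total)
  for (int i = 0; i < n; ++i) {
    pts[i].scale(s);
    total += lengthSquared(pts[i]);
  }
  return total;
}
```

With offloading enabled (for example clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda, assuming a suitable CUDA installation), calls from the target region into declare target functions defined in other objects are what the device link step has to resolve.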

In general, I feel that if we go the way you propose, the solution is truly confined to NVPTX. If we instead implement a scheme like the one in this patch, we give other toolchains a chance to fill the nvlink "gap" and eventually handle offloading in a similar manner, including support for static linking.

Repository:
  rC Clang

https://reviews.llvm.org/D47394