[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Tue May 29 11:37:23 PDT 2018

tra added a comment.

"Interoperability with other compilers" is probably a statement that's a bit too strong. At best it's kind of compatible with CUDA tools and I don't think it's feasible for other compilers. I.e. it will be useless for AMD GPUs and whatever compiler they use.

In general it sounds like you're going back to what regular CUDA compilation pipeline does:

- [clang] C++->.ptx
- [ptxas] .ptx -> .cubin
- [fatbin] .cubin -> .fatbin
- [clang] C++ + .fatbin -> host .o

On one hand I can see how being able to treat GPU-side binaries as any other host files is convenient. On the other hand, this convenience comes with the price of targeting only NVPTX. This seems contrary to OpenMP's goal of supporting many different kinds of accelerators. I'm not sure what's the consensus in the OpenMP community these days, but I vaguely recall that generic bundling/unbundling was explicitly chosen over vendor-specific encapsulation in host .o when the bundling was implemented. If the underlying reasons have changed since then it would be great to hear more details about that.

Assuming we do proceed with back-to-CUDA approach, one thing I'd consider would be using clang's -fcuda-include-gpubinary option which CUDA uses to include GPU code into the host object. You may be able to use it to avoid compiling and partially linking .fatbin and host .o.

Repository:
  rC Clang

https://reviews.llvm.org/D47394