[PATCH] D140158: [CUDA] Allow targeting NVPTX directly without a host toolchain

Thu Dec 15 15:31:10 PST 2022

tra added a comment.

In D140158#3999716 <https://reviews.llvm.org/D140158#3999716>, @jhuber6 wrote:

> I just realized the method of copying the `.o` to a `.cubin` doesn't work if the link step is done in the same compilation because it doesn't exist yet. To fix this I could either make the tool chain emit `.cubin` if we're going straight to `nvlink`, or use a symbolic link. The former is uglier, the latter will probably only work on Linux.

Does it have to be `.cubin`? Does nvlink require it?

> Also do you think I should include the CUDA headers in with this? We can always get rid of them with `nogpuinc` or similar if they're not needed. The AMDGPU compilation still links in the device libraries so I'm wondering if we should at least be consistent.

Only some CUDA headers can be used from C++, and those tend to contain host-only declarations, plus stubs or CPU implementations of GPU-side functions.
The rest rely on CUDA extensions and attributes. So a C++ compilation will, at best, give you a host-side view of the headers, which is not going to be particularly useful.
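
To illustrate the host-side view, here is a minimal sketch of the guard pattern such headers tend to use. Everything named `demo_*` / `DEMO_*` is hypothetical and not copied from the CUDA SDK; the sketch only assumes that `__CUDA__`/`__CUDACC__` are defined in a CUDA compilation and `__CUDA_ARCH__` in the device pass.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a CUDA-style header (not the real SDK contents).
// In a plain C++ compilation none of the CUDA macros are defined, so the
// attributes vanish and only the host branch below is ever seen.
#if defined(__CUDACC__) || defined(__CUDA__)
#define DEMO_HOST_DEVICE __host__ __device__
#else
#define DEMO_HOST_DEVICE
#endif

DEMO_HOST_DEVICE inline float demo_rsqrt(float x) {
#if defined(__CUDA_ARCH__)
  // Device pass of a CUDA compilation: use the GPU-side function.
  return rsqrtf(x);
#else
  // Host pass, or any plain C++ compilation: CPU implementation.
  return 1.0f / std::sqrt(x);
#endif
}

// Small self-check of the host-side fallback.
inline void demo_self_check() {
  assert(demo_rsqrt(4.0f) == 0.5f);
}
```

Compiled as C++ for NVPTX, a header written this way would still select the CPU branch, because nothing tells it that it is being built for the GPU.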

If we want to make GPU-side functions available, we'll need a pre-included wrapper with more preprocessor magic to make the CUDA headers usable. Doing that in a C++ compilation would be questionable, because a user compiling C++ would normally assume that the compiler pre-includes no headers.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140158/new/

https://reviews.llvm.org/D140158