[PATCH] D75811: [CUDA] Choose default architecture based on CUDA installation

Thu Mar 12 11:23:52 PDT 2020

tra added a comment.

In D75811#1919368 <https://reviews.llvm.org/D75811#1919368>, @tambre wrote:

> After some work on my CMake changes, Clang detection as a CUDA compiler works and I can compile CUDA code.

\o/ Nice! Having CMake support clang as a CUDA compiler out of the box would be great.

> However code using separable compilation doesn't compile. What is the Clang equivalent of NVCC's `-dc` (`--device-c`) option for this case?

Ah, `-rdc` compilation is somewhat tricky. NVCC does quite a bit of extra work under the hood that would be rather hard to implement in clang's driver, so the job falls to the build system.
Clang will generate relocatable GPU code if you pass `-fcuda-rdc`, but that's only part of the story. Something in the build still has to perform the final linking step, and there is additional initialization glue to handle.
Here's how it's implemented in bazel in Tensorflow: https://github.com/tensorflow/tensorflow/blob/ed371aa5d266222c799a7192e438cdd8c00464fe/third_party/nccl/build_defs.bzl.tpl
The file has a fairly detailed description of what needs to be done.
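
To make the compile half of this concrete, here is a rough sketch of the per-file command a build system might generate. It covers only the first step (producing objects with relocatable device code) and deliberately leaves out the final link and the initialization glue, which the build system still has to supply. The helper name `rdc_compile_command`, the default `sm_60` architecture, and the `.o` naming are illustrative assumptions, not something clang or CMake defines.

```python
from pathlib import PurePosixPath


def rdc_compile_command(source, compiler="clang", gpu_arch="sm_60"):
    """Build the argv that compiles one CUDA source file into an object
    file containing relocatable device code.

    Hypothetical helper for illustration only; the final linking step and
    the initialization glue are NOT modeled here.
    """
    # Assumption: the object file sits next to the source, with a .o suffix.
    obj = str(PurePosixPath(source).with_suffix(".o"))
    if compiler == "clang":
        # -fcuda-rdc asks clang to emit relocatable GPU code.
        return ["clang++", "-x", "cuda", f"--cuda-gpu-arch={gpu_arch}",
                "-fcuda-rdc", "-c", source, "-o", obj]
    if compiler == "nvcc":
        # -dc (--device-c) is the NVCC option asked about above.
        return ["nvcc", f"-arch={gpu_arch}", "-dc", source, "-o", obj]
    raise ValueError(f"unknown compiler: {compiler!r}")


if __name__ == "__main__":
    # Print the commands for two example translation units.
    for src in ("kernels.cu", "main.cu"):
        print(" ".join(rdc_compile_command(src)))
```

The objects produced this way are not yet usable on their own: the relocatable device code in them still has to be linked together and registered with the runtime, which is the part described in the Tensorflow file above.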

> The CMake code review for CUDA Clang support is here <https://gitlab.kitware.com/cmake/cmake/-/merge_requests/4442>.

I'll take a look.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D75811