[PATCH] D75811: [CUDA] Choose default architecture based on CUDA installation

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 9 14:35:36 PDT 2020


tra added a comment.

In D75811#1913148 <https://reviews.llvm.org/D75811#1913148>, @tambre wrote:

> > Magically changing the compiler target based on something external to the compiler is a bad idea IMO. I would expect a compilation with exactly the same compiler options to do exactly the same thing. If we magically change the default target, that will not be the case.
>
> It'd be the same behaviour as NVCC, which compiles for the lowest architecture it supports.


The difference is that NVCC is closely tied to the CUDA SDK itself, while clang is expected to work with all CUDA versions since 7.x.
There's no way to match the behavior of all NVCC versions at once. Bumping up the current default is fine. Matching a particular NVCC version based on the CUDA SDK we happen to find is, IMO, somewhat similar to -march=native. We could implement it via --cuda-gpu-arch=auto or something like that, but I do not want it to be the default.

> I'm currently implementing Clang CUDA support for CMake and lack of this behaviour compared to other languages and compilers complicates matters.
>  During compiler detection CMake compiles a simple program, which includes preprocessor stuff for embedding compiler info in the output. Then it parses that and determines the compiler vendor, version, etc.
> 
> The general assumption is that a compiler can compile a simple program for its language without us having to do compiler-specific options, flags, etc.

Bumping up the default to sm_35 would satisfy this criterion.
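
To illustrate, the general shape of such a detection TU is something like this (just a sketch; CMake's real detection source is more elaborate):

  // detect.cu -- embed compiler identification strings in the object
  // file; the build system later parses them out of the binary.
  #if defined(__clang__)
  const char *info_compiler = "INFO:compiler[Clang]";
  #elif defined(__NVCC__)
  const char *info_compiler = "INFO:compiler[NVIDIA]";
  #else
  const char *info_compiler = "INFO:compiler[unknown]";
  #endif
  // A trivial kernel so the TU exercises real CUDA compilation.
  __global__ void detect_kernel() {}
  int main() { return info_compiler[0]; }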

> If the compiler fails on this simple program, it's considered broken.

I'm not sure how applicable this criterion is to cross-compilation, which is effectively what clang does when we compile CUDA sources.
You are expected to provide the correct path to the CUDA installation and the correct set of target GPUs to compile for, and only the end user may know those. While we do hardcode a few default CUDA locations and deal with the quirks of some Linux distributions, that does not change the fact that, in general, cross-compilation needs the end user to supply additional inputs.
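
For example, pointing clang at a non-default SDK location explicitly (the path below is purely illustrative):

  // example.cu -- the SDK location is an input only the end user knows:
  //
  //   clang++ --cuda-path=/opt/cuda-10.1 -c example.cu
  __global__ void kernel() {}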

> A limited list of flags is usually cycled through to support exotic compilers, and I could do the same here, but it'd require us to invoke the compiler multiple times, and increasingly more as old architectures are deprecated.

You can use `--cuda-gpu-arch=sm_30` and that should cover all CUDA versions currently supported by clang. Maybe even `sm_50` -- I can no longer find any docs for CUDA-7.0, so I can't say whether it supported Maxwell already.
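
I.e. one fixed invocation along these lines should be enough for the detection step, assuming a usable CUDA installation is found:

  // detect.cu -- pin the oldest arch clang still supports, instead of
  // cycling through per-version flags:
  //
  //   clang++ --cuda-gpu-arch=sm_30 -c detect.cu
  __global__ void detect_kernel() {}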

> We could detect the CUDA installation ourselves and specify a list of arches for each. This seems quite unnecessary when Clang already knows the version and could select a default that at least compiles.
>  Note that this detection happens before any user CMake files are run, so we can't pass the user's preferred arch (which could also differ per file).

See above. Repeated iteration is indeed unnecessary, and a bumped-up default target should do the job.

In general, though, relying on this check without taking into account the information supplied by the user will be rather fragile.
The CUDA installation clang finds by default may not be correct or working, and clang *relies* on it in order to do anything useful with CUDA. E.g., if I have an ARM version of CUDA installed under /usr/local/cuda, where clang looks for CUDA by default, clang will happily find it but will not be able to compile anything with it. It may work fine once it's pointed to the correct CUDA location via user-specified options.

Can you elaborate on what exactly CMake attempts to establish with the test?
If it looks for a working end-to-end CUDA compilation, then it will need to rely on user input to make sure that the correct CUDA location is used.
If it wants to check whether clang is capable of CUDA compilation, then clang should be told *not* to look for CUDA (though you will need to provide a bit of glue similar to what we use for tests: https://github.com/llvm/llvm-project/blob/master/clang/test/Driver/cuda-simple.cu). Would something like that be sufficient?
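
For reference, the glue amounts to something like this (a sketch modeled on that test):

  // minimal.cu -- parse CUDA syntax with no CUDA installation at all:
  //
  //   clang++ -nocudainc -nocudalib -fsyntax-only minimal.cu
  //
  // With -nocudainc the usual CUDA headers are gone, so declare just
  // enough for the <<<...>>> launch syntax to type-check.
  extern "C" int cudaConfigureCall(int, int);
  extern "C" int __cudaPushCallConfiguration(int, int);
  __attribute__((global)) void kernel() {}
  void launch() { kernel<<<1, 1>>>(); }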

The farthest you can push clang without relying on the CUDA SDK is `--cuda-gpu-arch=sm_30 --cuda-device-only -S` -- that will verify that clang has the NVPTX back-end compiled in and can generate the PTX that would then be passed to CUDA's ptxas. If this part works, then clang is likely to work with any supported CUDA version.
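
Concretely (adding -nocudainc/-nocudalib on the assumption that no SDK is present at all):

  // ptx-check.cu -- verify the NVPTX back-end without a CUDA SDK:
  //
  //   clang++ --cuda-gpu-arch=sm_30 --cuda-device-only -S \
  //           -nocudainc -nocudalib ptx-check.cu -o ptx-check.ptx
  //
  // Success leaves NVPTX assembly in ptx-check.ptx; assembling it with
  // ptxas is the part that actually needs a CUDA installation.
  __attribute__((global)) void kernel() {}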


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75811/new/

https://reviews.llvm.org/D75811




