[PATCH] D77670: [CUDA] Add partial support for recent CUDA versions.

Fri Aug 13 13:38:52 PDT 2021

Hahnfeld added a comment.

In D77670#2944192 <https://reviews.llvm.org/D77670#2944192>, @tra wrote:

> In D77670#2943753 <https://reviews.llvm.org/D77670#2943753>, @Hahnfeld wrote:
>
>> @tra The split between `LATEST` and `LATEST_SUPPORTED` leads to very weird warning and error messages:
>
> Agreed, it's far from ideal. There's also more than one issue involved.

Unfortunately, yes...

>> clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]
>
> The good news is that we've grown support for enough clang builtins and PTX instructions to bump the "latest supported" to ~CUDA-11.3 or, maybe, even 11.4. At least, clang should be able to compile all CUDA headers in those versions.
> This should reduce the noise.

Great!

>> clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
>
> It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35, as it's likely the oldest GPU platform that's still widely available (e.g. there are tons of them on Google Cloud and AWS) and is still supported by all CUDA versions clang accepts.

+1 for at least `sm_35` - that would match recent `nvcc`s, right?

>> clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'
>
> Perhaps it's time to start considering decommissioning sm_20 support in clang and NVPTX. nvcc did that long ago and is already on the way to dropping sm_3x, too: sm_30 is no longer supported, and sm_35 has been deprecated and is expected to be gone in the next CUDA release.

+1 - given that Clang 13.x just branched, now may be an ideal moment to make this cut.

>> Clang is mentioning three different CUDA versions here: 11.4 is what I actually have installed, 11.2 is `LATEST` and therefore the one returned by `getCudaVersion` or used as the "last resort" in `CudaInstallationDetector`, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I understand that the first warning means 10.1 is the latest version that is fully supported in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (Other than just adding 11.3 and 11.4 to the enumerations, where we'll always be running behind.)
>
> I'm open to suggestions. This was the least bad compromise I managed to come up with.
>
> We could report the actually detected version, instead of the 'latest' version clang knows about. Or not report it at all as it's not particularly helpful for the end user. That would mitigate one source of confusion.
>
> As for the `latest supported`, I think we may still want to have it in some form. Clang has to deal with version-specific CUDA quirks, so a CUDA version outside of the range that clang is known to work with puts the user in uncharted waters. E.g. until recently clang worked well enough with CUDA-11.3, but only if you were compiling for the older GPUs. Attempts to compile some headers for sm_80 would fail and that *was* confusing to users who ran into that when the warning was disabled.

Yeah, the problem was that I didn't have a better suggestion either when I wrote the first comment. Maybe I do now: how about adding a "past-the-latest" value to the enum that Clang records when it detects a version more recent than any it knows about? Then we could have two warnings:

- If we have a "past-the-latest" version, tell the user that Clang does not know about this version and assumes the `LATEST` version; things might work, but there are no guarantees.
- If we have a version that is greater than the latest supported version, emit the current warning and say that support is "best-effort" (or something along those lines). In that case, both the detected version and the "assumed" supported version should make sense to the user.
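To make the two-warning proposal concrete, here is a minimal sketch in C++. Everything in it is hypothetical and for illustration only: `CudaVer`, `PAST_LATEST`, `classify`, `diagnose`, and the message wording are not Clang's actual `CudaVersion` enumeration or diagnostics, and the known-version list is trimmed so that `LATEST` is 11.2 and `LATEST_SUPPORTED` is 10.1, as in the messages quoted above. It relies on `cuda.h` encoding `CUDA_VERSION` as `major * 1000 + minor * 10`, which is how `CUDA_VERSION=11040` maps to 11.4.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed version enum (NOT Clang's real CudaVersion).
// Values are ordered so that relational comparisons follow release order.
enum class CudaVer {
  UNKNOWN,      // older than anything we know, or otherwise unrecognized
  CUDA_100,
  CUDA_101,
  CUDA_102,
  CUDA_110,
  CUDA_111,
  CUDA_112,
  PAST_LATEST,  // newer than every known version: "past-the-latest"
};

constexpr CudaVer LATEST = CudaVer::CUDA_112;            // newest version we know
constexpr CudaVer LATEST_SUPPORTED = CudaVer::CUDA_101;  // newest fully supported
constexpr const char *kLatestName = "11.2";
constexpr const char *kLatestSupportedName = "10.1";

// Map the raw CUDA_VERSION macro (major * 1000 + minor * 10) to the enum.
CudaVer classify(int cudaVersion) {
  assert(cudaVersion >= 0 && "CUDA_VERSION must be non-negative");
  switch (cudaVersion) {
  case 10000: return CudaVer::CUDA_100;
  case 10010: return CudaVer::CUDA_101;
  case 10020: return CudaVer::CUDA_102;
  case 11000: return CudaVer::CUDA_110;
  case 11010: return CudaVer::CUDA_111;
  case 11020: return CudaVer::CUDA_112;
  default: break;
  }
  // Remember that the detected version is newer than anything we know about.
  return cudaVersion > 11020 ? CudaVer::PAST_LATEST : CudaVer::UNKNOWN;
}

// Return the warning text to emit, or an empty string if no warning is needed.
std::string diagnose(int cudaVersion) {
  // Always report the version that was actually detected, e.g. "11.4".
  const std::string detected = std::to_string(cudaVersion / 1000) + "." +
                               std::to_string((cudaVersion % 1000) / 10);
  const CudaVer ver = classify(cudaVersion);
  if (ver == CudaVer::UNKNOWN)
    return "unrecognized CUDA version " + detected;
  if (ver == CudaVer::PAST_LATEST)
    // Warning 1: the version is unknown; assume LATEST without guarantees.
    return "CUDA " + detected + " is newer than any version clang knows about; "
           "assuming " + kLatestName + ", no guarantees";
  if (ver > LATEST_SUPPORTED)
    // Warning 2: known but not fully supported; support is best-effort.
    return "CUDA " + detected + " is newer than the latest fully supported version " +
           kLatestSupportedName + "; support is best-effort";
  return "";  // within the fully supported range: no warning
}
```

With this split, `diagnose(11040)` mentions only the detected 11.4 and the assumed 11.2, while `diagnose(11020)` mentions the detected 11.2 and the fully supported 10.1, so each message pairs two versions that make sense together.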

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77670