[PATCH] D114326: Update the list of CUDA versions up to 11.5

Thu Nov 25 01:13:10 PST 2021

carlosgalvezp accepted this revision.
carlosgalvezp added a comment.

LGTM but I think @tra should have the final word.

================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2",  "9.1",  "9.0",  "8.0",  "7.5",  "7.0"};
----------------
tra wrote:
> carlosgalvezp wrote:
> > mojca wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > mojca wrote:
> > > > > > carlosgalvezp wrote:
> > > > > > > mojca wrote:
> > > > > > > > tra wrote:
> > > > > > > > > tra wrote:
> > > > > > > > > > mojca wrote:
> > > > > > > > > > > kadircet wrote:
> > > > > > > > > > > > looks like the list is getting big and hard to maintain. considering that this is done only once per compiler invocation (and we check for existence of directories down in the loop anyway). what about throwing in an extra directory listing to base-directories mentioned down below and populate `Candidates` while preserving the newest-version-first order?
> > > > > > > > > > > I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.
> > > > > > > > > > > 
> > > > > > > > > > > I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new more advanced patch separately, but that's finally your call.
> > > > > > > > > > > 
> > > > > > > > > > > What's your suggestion about D.SysRoot + "Program Files/..."? At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.
> > > > > > > > > > > 
> > > > > > > > > > > Is there any special LLVM-specific/preferrable way to iterate through directories?
> > > > > > > > > > > 
> > > > > > > > > > > (What I also miss a bit in the whole process in an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)
> > > > > > > > > > > 
> > > > > > > > > > > If you provide me give some general guidelines, I'll prepare another, hopefully more future-proof patch.
> > > > > > > > > > > 
> > > > > > > > > > > (Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)
> > > > > > > > > > This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using system-provided package manager.
> > > > > > > > > > E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> > > > > > > > > > 
> > > > > > > > > > TBH, I'm tempted to limit autodetection to only that one system-default version and require user to use --cuda-path if they need something else.
> > > > > > > > > I think on windows (I mean the windows environment itself, not WSL), CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a windows-specific way to find it. 
> > > > > > > > On Windows this is certainly not the case. Unless the installation is changed manually, one always gets the new version installed into a new directory.
> > > > > > > > 
> > > > > > > > I really do need multiple versions on Windows (and the ability to pick an older one) if I want to compile a binary that works on someone else's computer (if I compile against the latest CUDA, users need "the latest" drivers that may sometimes not even be available for their machine).
> > > > > > > > 
> > > > > > > > (That said, at least when using CMake, the selection needs to be done by CMake anyway, and maybe CMake could be forced to specify the correct flag automatically.)
> > > > > > > > 
> > > > > > > > So even if the functionality gets removed from non-Windows platforms, it would be really nice to keep it for Windows.
> > > > > > > > 
> > > > > > > > Now, there are three "conflicting" feedbacks/suggestions above. I can try to improve support, but it would be really helpful to reach the consensus of what needs to be done before coding it.
> > > > > > > > one always gets the new version installed into a new directory.
> > > > > > > A similar thing happens on Linux.
> > > > > > > 
> > > > > > > > users need "the latest" drivers
> > > > > > > Since CUDA 10.2, there's some "[[ https://docs.nvidia.com/deploy/cuda-compatibility/ | compatibility mode ]]" that allows to run newer CUDA on older drivers. As long as you are not using the latest features, of course.
> > > > > > > 
> > > > > > > I'm personally all up for removing redundancy and duplication. 
> > > > > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and the NVIDIA's "official packages" for Ubuntu get installed under `/usr/local/cuda-11.x`.
> > > > > > 
> > > > > > That sounds significant enough to me to argue against the removal of versioned folders from search.
> > > > > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and the NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.
> > > > > 
> > > > > OK, that's something that will be used often enough, so still need to deal with versioned directories. :-(
> > > > > 
> > > > > > That sounds significant enough to me to argue against the removal of versioned folders from search.
> > > > > 
> > > > > I'm not against probing for multiple versions in principle.
> > > > > 
> > > > > What I'm saying is that:
> > > > > - blindly increasing the number of probed directories comes with a price and should be done cautiously.
> > > > > - it may be a good time to revisit how we detect CUDA installations and make sure it makes sense now.
> > > > > 
> > > > > There are two considerations.
> > > > > 
> > > > > First is the overhead it adds to compiler driver. While for most users running locally it's negligible, it would be noticed for the users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.
> > > > > 
> > > > > The other issue is that we should differentiate between finding the canonical location of CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously. I'd argue that in case of having multiple versions compiler has no business picking one version over another (though we could make an attempt to use the most recent one as we do now). It's up to the user to explicitly specify the one they want. When we allow to pick one version out of many, it makes it way too easy for a user to end up with a mix of the default version and the version they want, all they need to do is to forget `--cuda-path` somewhere in their build.
> > > > > 
> > > > > 
> > > > > For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.
> > > > > Since CUDA 10.2, there's some "compatibility mode" that allows to run newer CUDA on older drivers.
> > > > 
> > > > This only works with very specific versions of the drivers and those are not very common on the end-user machines. It's mostly for the datacenter use.
> > > > I think on windows (I mean the windows environment itself, not WSL), CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a windows-specific way to find it.
> > > 
> > > I see that I have
> > > ```
> > > CUDA_PATH       = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
> > > CUDA_PATH_V11_4 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
> > > CUDA_PATH_V11_3 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3
> > > ```
> > > etc. Using those directly might in fact be a good idea.
> > > 
> > > > The other issue is that we should differentiate between finding the canonical location of CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously.
> > > 
> > > Agreed. (I also wish clang would accept something like `--cuda-version=11.4` that would automatically work on Windows and Linux whenever CUDA resides under `/usr/local/cuda-x.y`)
> > > 
> > > I wanted to add another strange observation. Today I installed the latest CUDA (11.5) and clang ("14") under Ubuntu 20.04 inside WSL 2 on Windows 11 (equivalent to a regular Ubuntu, I would say). I had to explicitly provide the `-L` flag despite using `--cuda-path` (else the linker fails) which sounds relatively strange to me:
> > > ```
> > > clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart -L /usr/local/cuda-11.5/lib64 hello.c -o hello
> > > ```
> > > 
> > > > First is the overhead it adds to compiler driver. While for most users running locally it's negligible, it would be noticed for the users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.
> > > 
> > > Are you saying that clang executes this portion of the code even when it knows that it's not compiling *.cu files? Why does it need to know anything at all about directories related to CUDA? (I'm curious, but there's no real need to answer, it's outside the scope of this ticket ;)
> > > 
> > > > For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.
> > > 
> > > Do you want me to upload another patch that only keeps versions 10.0 up to 11.5 on the list, and then open a new ticket in the tracker describing what the rework should be, and potentially start working towards that goal as a separate patch?
> > > It's mostly for the datacenter use.
> > I'm not sure about that. We have recently switched to CUDA 11.4 with old 460 drivers on regular desktop machines (CUDA 11.4 comes with 470 drivers) and everything works just fine (not using CUDA 11.4 features, naturally)
> > 
> > The docs specify the minimum version required:
> > CUDA 11.x	>= 450.80.02*
> > 
> > I think the "Enterprise" thing you are referring to only applies to the "3. Forward compatibility" section, not the "2. Minor version compatibility" section.
> > 
> > >  I had to explicitly provide the -L flag 
> > Yes, this is [[ https://llvm.org/docs/CompileCudaWithLLVM.html | documented ]]
> I'm not saying that cuda_compat is exclusively for the datacenter use, just that the datacenter is the primary audience.
> 
> Regular users are not as constrained in updating the drivers to a more recent version and are much more likely to use the drivers that are *not* supported by cuda_compat.
> The end result is that in practice "upgrade your drivers" is much more likely to work for most of the end users than "use cuda_compat" -- less moving parts this way and the result is more predictable. 
> 
> > The docs specify the minimum version required: CUDA 11.x >= 450.80.02*
> 
> It's complicated. See compatibility between CUDA versions and various driver versions:
> https://docs.nvidia.com/deploy/cuda-compatibility/#use-the-right-compat-package
> 
> It also has the downside of lack of interoperability with OpenGL, which may be needed by some users.
> 
> Anyways, we've got way off topic here. 
> we've got way off topic here.

Agreed, apologies for that and thanks for the discussion!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114326/new/

https://reviews.llvm.org/D114326