[PATCH] D114326: Update the list of CUDA versions up to 11.5

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 24 10:36:48 PST 2021


tra added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2",  "9.1",  "9.0",  "8.0",  "7.5",  "7.0"};
----------------
tra wrote:
> mojca wrote:
> > kadircet wrote:
> > > Looks like the list is getting big and hard to maintain. Considering that this is done only once per compiler invocation (and we check for the existence of directories down in the loop anyway), what about throwing in an extra directory listing of the base directories mentioned down below and populating `Candidates` while preserving the newest-version-first order?
> > I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.
> > 
> > I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new more advanced patch separately, but that's finally your call.
> > 
> > What's your suggestion about D.SysRoot + "Program Files/..."? At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.
> > 
> > Is there any special LLVM-specific/preferred way to iterate through directories?
> > 
> > (What I also miss a bit in the whole process is an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)
> > 
> > If you give me some general guidelines, I'll prepare another, hopefully more future-proof patch.
> > 
> > (Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)
> This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using the system-provided package manager.
> E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> 
> TBH, I'm tempted to limit autodetection to only that one system-default version and require the user to use --cuda-path if they need something else.
I think on Windows (I mean the Windows environment itself, not WSL), the CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a Windows-specific way to find it.
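
(For illustration, a minimal sketch of what such a probe could look like; untested, and both the variable name `CUDA_PATH` and the helper name are assumptions, not something this patch or the thread pins down:)

  // Hypothetical helper: ask the environment for the default CUDA location
  // that the Windows installer is assumed to export as CUDA_PATH, e.g.
  // "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5".
  // The existing directory checks would still validate whatever it returns.
  #include "llvm/ADT/Optional.h"
  #include "llvm/Support/Process.h"

  static llvm::Optional<std::string> getDefaultCudaPathFromEnv() {
    return llvm::sys::Process::GetEnv("CUDA_PATH");
  }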


================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2",  "9.1",  "9.0",  "8.0",  "7.5",  "7.0"};
----------------
mojca wrote:
> carlosgalvezp wrote:
> > mojca wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > mojca wrote:
> > > > > > kadircet wrote:
> > > > > > > Looks like the list is getting big and hard to maintain. Considering that this is done only once per compiler invocation (and we check for the existence of directories down in the loop anyway), what about throwing in an extra directory listing of the base directories mentioned down below and populating `Candidates` while preserving the newest-version-first order?
> > > > > > I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.
> > > > > > 
> > > > > > I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new more advanced patch separately, but that's finally your call.
> > > > > > 
> > > > > > What's your suggestion about D.SysRoot + "Program Files/..."? At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.
> > > > > > 
> > > > > > Is there any special LLVM-specific/preferred way to iterate through directories?
> > > > > > 
> > > > > > (What I also miss a bit in the whole process is an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)
> > > > > > 
> > > > > > If you give me some general guidelines, I'll prepare another, hopefully more future-proof patch.
> > > > > > 
> > > > > > (Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)
> > > > > This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using the system-provided package manager.
> > > > > E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> > > > > 
> > > > > TBH, I'm tempted to limit autodetection to only that one system-default version and require the user to use --cuda-path if they need something else.
> > > > I think on Windows (I mean the Windows environment itself, not WSL), the CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a Windows-specific way to find it.
> > > On Windows this is certainly not the case. Unless the installation is changed manually, one always gets the new version installed into a new directory.
> > > 
> > > I really do need multiple versions on Windows (and the ability to pick an older one) if I want to compile a binary that works on someone else's computer (if I compile against the latest CUDA, users need "the latest" drivers that may sometimes not even be available for their machine).
> > > 
> > > (That said, at least when using CMake, the selection needs to be done by CMake anyway, and maybe CMake could be forced to specify the correct flag automatically.)
> > > 
> > > So even if the functionality gets removed from non-Windows platforms, it would be really nice to keep it for Windows.
> > > 
> > > Now, there are three "conflicting" pieces of feedback/suggestions above. I can try to improve support, but it would be really helpful to reach a consensus on what needs to be done before coding it.
> > > one always gets the new version installed into a new directory.
> > A similar thing happens on Linux.
> > 
> > > users need "the latest" drivers
> > Since CUDA 10.2, there's some "[[ https://docs.nvidia.com/deploy/cuda-compatibility/ | compatibility mode ]]" that allows running newer CUDA on older drivers. As long as you are not using the latest features, of course.
> > 
> > I'm personally all for removing redundancy and duplication.
> I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now, and NVIDIA's "official packages" for Ubuntu get installed under `/usr/local/cuda-11.x`.
> 
> That sounds significant enough to me to argue against the removal of versioned folders from search.
> I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now, and NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.

OK, that's something that will be used often enough, so we still need to deal with versioned directories. :-(

> That sounds significant enough to me to argue against the removal of versioned folders from search.

I'm not against probing for multiple versions in principle.

What I'm saying is that:
- blindly increasing the number of probed directories comes with a price and should be done cautiously.
- it may be a good time to revisit how we detect CUDA installations and make sure it makes sense now.

There are two considerations.

First is the overhead it adds to the compiler driver. While for most users running locally it's negligible, it would be noticed by users who have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider is that it will be noticed by *all* such users, even if they don't use CUDA. E.g., all C++ compilations will be probing for those directories.

The other issue is that we should differentiate between finding the canonical location of the CUDA SDK vs. picking one out of many CUDA SDK versions that may be installed simultaneously. I'd argue that when multiple versions are installed, the compiler has no business picking one version over another (though we could make an attempt to use the most recent one, as we do now). It's up to the user to explicitly specify the one they want. When we allow picking one version out of many, it makes it way too easy for a user to end up with a mix of the default version and the version they want: all they need to do is forget `--cuda-path` somewhere in their build.


For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.
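
(For reference, a rough sketch of kadircet's directory-listing idea from the comment above; untested, with a made-up helper name and without the wiring into `Candidates`:)

  // Hypothetical helper: list a base directory such as /usr/local for
  // "cuda-<version>" entries and return them newest-first, so Candidates
  // could be populated without hard-coding every released CUDA version.
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/FileSystem.h"
  #include "llvm/Support/Path.h"

  static llvm::SmallVector<std::string, 8>
  collectVersionedCudaDirs(llvm::StringRef Base) {
    llvm::SmallVector<std::string, 8> Dirs;
    std::error_code EC;
    for (llvm::sys::fs::directory_iterator It(Base, EC), End;
         It != End && !EC; It.increment(EC))
      if (llvm::sys::path::filename(It->path()).startswith("cuda-"))
        Dirs.push_back(It->path());
    // Newest first: compare_numeric() orders "cuda-11.5" ahead of "cuda-9.2".
    llvm::sort(Dirs, [](llvm::StringRef A, llvm::StringRef B) {
      return A.compare_numeric(B) > 0;
    });
    return Dirs;
  }

This would still do one readdir() per probed base directory, so the NFS concern above applies to it just as much as to the hard-coded list.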


================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2",  "9.1",  "9.0",  "8.0",  "7.5",  "7.0"};
----------------
tra wrote:
> mojca wrote:
> > carlosgalvezp wrote:
> > > mojca wrote:
> > > > tra wrote:
> > > > > tra wrote:
> > > > > > mojca wrote:
> > > > > > > kadircet wrote:
> > > > > > > > Looks like the list is getting big and hard to maintain. Considering that this is done only once per compiler invocation (and we check for the existence of directories down in the loop anyway), what about throwing in an extra directory listing of the base directories mentioned down below and populating `Candidates` while preserving the newest-version-first order?
> > > > > > > I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.
> > > > > > > 
> > > > > > > I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new more advanced patch separately, but that's finally your call.
> > > > > > > 
> > > > > > > What's your suggestion about D.SysRoot + "Program Files/..."? At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.
> > > > > > > 
> > > > > > > Is there any special LLVM-specific/preferred way to iterate through directories?
> > > > > > > 
> > > > > > > (What I also miss a bit in the whole process is an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)
> > > > > > > 
> > > > > > > If you give me some general guidelines, I'll prepare another, hopefully more future-proof patch.
> > > > > > > 
> > > > > > > (Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)
> > > > > > This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using the system-provided package manager.
> > > > > > E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> > > > > > 
> > > > > > TBH, I'm tempted to limit autodetection to only that one system-default version and require the user to use --cuda-path if they need something else.
> > > > > I think on Windows (I mean the Windows environment itself, not WSL), the CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a Windows-specific way to find it.
> > > > On Windows this is certainly not the case. Unless the installation is changed manually, one always gets the new version installed into a new directory.
> > > > 
> > > > I really do need multiple versions on Windows (and the ability to pick an older one) if I want to compile a binary that works on someone else's computer (if I compile against the latest CUDA, users need "the latest" drivers that may sometimes not even be available for their machine).
> > > > 
> > > > (That said, at least when using CMake, the selection needs to be done by CMake anyway, and maybe CMake could be forced to specify the correct flag automatically.)
> > > > 
> > > > So even if the functionality gets removed from non-Windows platforms, it would be really nice to keep it for Windows.
> > > > 
> > > > Now, there are three "conflicting" pieces of feedback/suggestions above. I can try to improve support, but it would be really helpful to reach a consensus on what needs to be done before coding it.
> > > > one always gets the new version installed into a new directory.
> > > A similar thing happens on Linux.
> > > 
> > > > users need "the latest" drivers
> > > Since CUDA 10.2, there's some "[[ https://docs.nvidia.com/deploy/cuda-compatibility/ | compatibility mode ]]" that allows running newer CUDA on older drivers. As long as you are not using the latest features, of course.
> > > 
> > > I'm personally all for removing redundancy and duplication.
> > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now, and NVIDIA's "official packages" for Ubuntu get installed under `/usr/local/cuda-11.x`.
> > 
> > That sounds significant enough to me to argue against the removal of versioned folders from search.
> > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now, and NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.
> 
> OK, that's something that will be used often enough, so we still need to deal with versioned directories. :-(
> 
> > That sounds significant enough to me to argue against the removal of versioned folders from search.
> 
> I'm not against probing for multiple versions in principle.
> 
> What I'm saying is that:
> - blindly increasing the number of probed directories comes with a price and should be done cautiously.
> - it may be a good time to revisit how we detect CUDA installations and make sure it makes sense now.
> 
> There are two considerations.
> 
> First is the overhead it adds to the compiler driver. While for most users running locally it's negligible, it would be noticed by users who have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider is that it will be noticed by *all* such users, even if they don't use CUDA. E.g., all C++ compilations will be probing for those directories.
> 
> The other issue is that we should differentiate between finding the canonical location of the CUDA SDK vs. picking one out of many CUDA SDK versions that may be installed simultaneously. I'd argue that when multiple versions are installed, the compiler has no business picking one version over another (though we could make an attempt to use the most recent one, as we do now). It's up to the user to explicitly specify the one they want. When we allow picking one version out of many, it makes it way too easy for a user to end up with a mix of the default version and the version they want: all they need to do is forget `--cuda-path` somewhere in their build.
> 
> 
> For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.
> Since CUDA 10.2, there's some "compatibility mode" that allows running newer CUDA on older drivers.

This only works with very specific driver versions, and those are not very common on end-user machines. It's mostly for datacenter use.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114326/new/

https://reviews.llvm.org/D114326


