[PATCH] D140433: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Jan 3 15:46:05 PST 2023
tra added inline comments.
================
Comment at: clang/tools/nvptx-arch/CMakeLists.txt:19
+if (NOT CUDA_FOUND OR NOT cuda-library)
+ message(STATUS "Not building nvptx-arch: cuda runtime not found")
+ return()
----------------
Nit: libcuda.so is part of the NVIDIA driver, which provides the NVIDIA driver API; it has nothing to do with the CUDA runtime.
Here it's actually not even libcuda.so itself that's not found, but its stub.
I think a sensible error here should say something like "Failed to find stubs/libcuda.so in CUDA_LIBDIR".
================
Comment at: clang/tools/nvptx-arch/CMakeLists.txt:25
+
+set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
+target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
----------------
Does it mean that the executable will have an RPATH pointing to CUDA_LIBDIR/stubs?
This should not be necessary. The stub shipped with CUDA comes as "libcuda.so" only. Its SONAME is libcuda.so.1, but there's no symlink with that name in stubs, so an RPATH pointing there will do nothing. At runtime, the dynamic linker will attempt to open libcuda.so.1, and it will only be found among the actual libraries installed by the NVIDIA driver.
================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:26
+#if !CUDA_HEADER_FOUND
+int main() { return 1; }
+#else
----------------
How do we distinguish "we didn't have CUDA at build time" reported here from "some driver API call failed with CUDA_ERROR_INVALID_VALUE=1"?
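One way to keep the two failure modes apart (a minimal sketch, not part of the patch, reusing the patch's CUDA_HEADER_FOUND macro) is to print a diagnostic in the CUDA-less build so an exit code of 1 is never silently confused with a CUresult of 1:
```
// Sketch: the CUDA-less build announces itself, so exit code 1 can no
// longer be mistaken for CUDA_ERROR_INVALID_VALUE (which is also 1).
#if !defined(CUDA_HEADER_FOUND) || !CUDA_HEADER_FOUND
#include <cstdio>
int main() {
  std::fprintf(stderr, "nvptx-arch: built without CUDA support\n");
  return 1;
}
#endif
```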
================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:34
+ const char *ErrStr = nullptr;
+ CUresult Result = cuGetErrorString(Err, &ErrStr);
+ if (Result != CUDA_SUCCESS)
----------------
One problem with this approach is that `nvptx-arch` will fail to run on a machine without NVIDIA drivers installed, because the dynamic linker will not find `libcuda.so.1`.
Ideally we want it to run on any machine and fail the way we want.
A typical way to achieve that is to dlopen("libcuda.so.1") and obtain pointers to the functions we need via `dlsym()`.
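A minimal sketch of that approach, assuming only POSIX `<dlfcn.h>` and simplifying `CUresult` to a plain int (the real tool would keep the proper driver API types):
```
// Sketch only: load the driver API at runtime instead of linking against it,
// so the tool still starts (and fails gracefully) when no driver is installed.
#include <cstdio>
#include <dlfcn.h>

// Simplified signatures; CUresult is an enum returned by value in <cuda.h>.
typedef int (*cuInit_t)(unsigned int);
typedef int (*cuDeviceGetCount_t)(int *);

int main() {
  void *Handle = dlopen("libcuda.so.1", RTLD_LAZY);
  if (!Handle) {
    std::fprintf(stderr, "Failed to load libcuda.so.1: %s\n", dlerror());
    return 1; // No NVIDIA driver on this machine; fail the way we want.
  }

  auto CuInit = reinterpret_cast<cuInit_t>(dlsym(Handle, "cuInit"));
  auto CuDeviceGetCount =
      reinterpret_cast<cuDeviceGetCount_t>(dlsym(Handle, "cuDeviceGetCount"));
  if (!CuInit || !CuDeviceGetCount || CuInit(0) != 0 /*CUDA_SUCCESS*/) {
    std::fprintf(stderr, "Failed to initialize the CUDA driver API\n");
    return 1;
  }

  int Count = 0;
  if (CuDeviceGetCount(&Count) == 0 /*CUDA_SUCCESS*/)
    std::printf("Found %d device(s)\n", Count);
  dlclose(Handle);
  return 0;
}
```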
================
Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:63
+
+ printf("sm_%d%d\n", Major, Minor);
+ }
----------------
jhuber6 wrote:
> tianshilei1992 wrote:
> > Do we want to include device number here?
> For `amdgpu-arch` and here we just have it implicitly in the order, so the n-th line is the n-th device, i.e.
> ```
> sm_70 // device 0
> sm_80 // device 1
> sm_70 // device 2
> ```
NVIDIA GPU enumeration order is more or less arbitrary. By default it's arranged roughly "fastest GPU first", but it can be rearranged in order of PCI(e) bus IDs or in an arbitrary user-specified order using `CUDA_VISIBLE_DEVICES`. Printing the compute capability in enumeration order is pretty much all the user needs. If we want to print something that uniquely identifies the device, we would need to print the device UUID, similarly to what `nvidia-smi -L` does, or the PCIe bus IDs. In other words, we can uniquely identify devices, but there's no such thing as an inherent canonical order among them.
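For reference, a hedged sketch of what printing a UUID per device could look like, assuming CUDA headers and the driver API are available at build time (`cuDeviceGetUuid` should return the same identifier that `nvidia-smi -L` reports):
```
// Sketch (not the patch): print compute capability plus a stable identifier
// for each device, similar in spirit to `nvidia-smi -L`.
#include <cstdio>
#include <cuda.h>

int main() {
  if (cuInit(0) != CUDA_SUCCESS)
    return 1;
  int Count = 0;
  if (cuDeviceGetCount(&Count) != CUDA_SUCCESS)
    return 1;
  for (int I = 0; I < Count; ++I) {
    CUdevice Dev;
    CUuuid Uuid;
    int Major = 0, Minor = 0;
    if (cuDeviceGet(&Dev, I) != CUDA_SUCCESS ||
        cuDeviceGetAttribute(&Major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
                             Dev) != CUDA_SUCCESS ||
        cuDeviceGetAttribute(&Minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
                             Dev) != CUDA_SUCCESS ||
        cuDeviceGetUuid(&Uuid, Dev) != CUDA_SUCCESS)
      return 1;
    // The enumeration index is not a stable identity; the UUID is.
    std::printf("sm_%d%d GPU-", Major, Minor);
    for (int B = 0; B < 16; ++B)
      std::printf("%02x", static_cast<unsigned char>(Uuid.bytes[B]));
    std::printf("\n");
  }
  return 0;
}
```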
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140433/new/
https://reviews.llvm.org/D140433