[Openmp-commits] [PATCH] D130371: [Libomptarget] Don't report lack of CUDA devices

Fri Jul 22 10:13:52 PDT 2022

jdenny created this revision.
jdenny added reviewers: jdoerfert, jhuber6, RaviNarayanaswamy, tianshilei1992, JonChesterfield.
jdenny added a project: OpenMP.
Herald added subscribers: kosarev, mattd, yaxunl.
Herald added a project: All.
jdenny requested review of this revision.
Herald added a subscriber: sstefan1.

Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:

  $ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
  $ ./a.out
  CUDA error: Error returned from cuInit
  CUDA error: no CUDA-capable device is detected
  Hello World: 4

This can happen when the CUDA plugin was built but every CUDA device
is currently disabled or hidden, for example because
`CUDA_VISIBLE_DEVICES` is set to the empty string.  As the example
above shows, it can happen even when the application was not compiled
for offloading to CUDA.

The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:

  if (NumberOfDevices == 0) {
    DP("There are no devices supporting CUDA.\n");
    return;
  }

The problem is that the above code is never reached: the earlier
`cuInit` call returns `CUDA_ERROR_NO_DEVICE`, so `checkResult` reports
the error on stderr and the function returns first.  This patch handles
that `cuInit` result in the same manner as the above code handles the
`NumberOfDevices == 0` case.
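
The control flow described above can be sketched in isolation.  This is
a minimal model, not the plugin's actual code: `InitOutcome`,
`initPlugin`, and the `HandleNoDevice` flag are hypothetical names
introduced for illustration, and the driver call is replaced by a
result value passed in by the caller.  Only the enumerator values and
the message strings are taken from the patch and the example output.

```cpp
#include <string>
#include <vector>

// Stand-in for the driver's result type.  The enumerator values match
// the ones listed in the patch's cuda.h; everything else is hypothetical.
enum CUresult {
  CUDA_SUCCESS = 0,
  CUDA_ERROR_INVALID_VALUE = 1,
  CUDA_ERROR_NO_DEVICE = 100,
};

// Hypothetical record of what initialization did, so the behavior can be
// inspected without a real device or driver.
struct InitOutcome {
  bool Usable;                     // plugin can offer devices
  std::vector<std::string> Stderr; // diagnostics the user always sees
  std::vector<std::string> Debug;  // messages shown only when debugging
};

// Models the initialization path.  InitResult plays the role of the value
// returned by cuInit.  HandleNoDevice = false models the old behavior,
// HandleNoDevice = true models the behavior after this patch.
InitOutcome initPlugin(CUresult InitResult, int NumberOfDevices,
                       bool HandleNoDevice) {
  InitOutcome Out{false, {}, {}};

  // New early exit: "no device" is expected, so report it only as a
  // debug message.
  if (HandleNoDevice && InitResult == CUDA_ERROR_NO_DEVICE) {
    Out.Debug.push_back("There are no devices supporting CUDA.");
    return Out;
  }

  // Any other failure is still reported to the user.
  if (InitResult != CUDA_SUCCESS) {
    Out.Stderr.push_back("CUDA error: Error returned from cuInit");
    return Out;
  }

  // Pre-existing check; unreachable when cuInit itself reports that
  // there is no device.
  if (NumberOfDevices == 0) {
    Out.Debug.push_back("There are no devices supporting CUDA.");
    return Out;
  }

  Out.Usable = true;
  return Out;
}
```

Calling `initPlugin(CUDA_ERROR_NO_DEVICE, 0, false)` leaves one entry in
`Stderr`, which corresponds to the unwanted diagnostic.  With
`HandleNoDevice` set to `true` the same call leaves `Stderr` empty and
records only a debug message.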


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130371

Files:
  openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
  openmp/libomptarget/plugins/cuda/src/rtl.cpp
  openmp/libomptarget/test/offloading/cuda_no_devices.c


Index: openmp/libomptarget/test/offloading/cuda_no_devices.c
===================================================================

--- /dev/null
+++ openmp/libomptarget/test/offloading/cuda_no_devices.c
@@ -0,0 +1,20 @@
+// The CUDA plugin used to complain on stderr when no CUDA devices were enabled,
+// and then it let the application run anyway.  Check that there's no such
+// complaint anymore, especially when the user isn't targeting CUDA.
+
+// RUN: %libomptarget-compile-generic
+// RUN: env CUDA_VISIBLE_DEVICES= \
+// RUN:   %libomptarget-run-generic 2>&1 | %fcheck-generic
+
+#include <stdio.h>
+
+// CHECK-NOT: {{.}}
+//     CHECK: Hello World: 4
+// CHECK-NOT: {{.}}
+int main() {
+  int x = 0;
+  #pragma omp target teams num_teams(2) reduction(+:x)
+  x += 2;
+  printf("Hello World: %d\n", x);
+  return 0;
+}
Index: openmp/libomptarget/plugins/cuda/src/rtl.cpp
===================================================================
--- openmp/libomptarget/plugins/cuda/src/rtl.cpp
+++ openmp/libomptarget/plugins/cuda/src/rtl.cpp
@@ -507,6 +507,10 @@
       DP("Failed to load CUDA shared library\n");
       return;
     }
+    if (Err == CUDA_ERROR_NO_DEVICE) {
+      DP("There are no devices supporting CUDA.\n");
+      return;
+    }
     if (!checkResult(Err, "Error returned from cuInit\n")) {
       return;
     }
Index: openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
===================================================================
--- openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
+++ openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h
@@ -27,6 +27,7 @@
 typedef enum cudaError_enum {
   CUDA_SUCCESS = 0,
   CUDA_ERROR_INVALID_VALUE = 1,
+  CUDA_ERROR_NO_DEVICE = 100,
   CUDA_ERROR_INVALID_HANDLE = 400,
 } CUresult;
 

