[Openmp-commits] [openmp] r366224 - [OPENMP][NVPTX]Fixed checks for cuda versions.

Tue Jul 16 09:07:10 PDT 2019

Author: abataev
Date: Tue Jul 16 09:07:10 2019
New Revision: 366224

URL: http://llvm.org/viewvc/llvm-project?rev=366224&view=rev
Log:
[OPENMP][NVPTX]Fixed checks for cuda versions.

Summary:
We used CUDART_VERSION macro to check for the installed cuda version
but this macro is defined in cuda_runtime_api.h, which is not used by
project. Better to use CUDA_VERSION macro, which is defined in cuda.h.
Also, added the check if this macro is defined. If macro is undefined,
there is something wrong with the cuda configuration and we should not
continue the compilation.
This also fixes problems with runtime building in cuda 10+.

Reviewers: grokos

Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D64648

Modified:
    openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h

Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h?rev=366224&r1=366223&r2=366224&view=diff
==============================================================================

--- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h (original)
+++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h Tue Jul 16 09:07:10 2019
@@ -48,7 +48,9 @@
 // Macros for Cuda intrinsics
 // In Cuda 9.0, the *_sync() version takes an extra argument 'mask'.
 // Also, __ballot(1) in Cuda 8.0 is replaced with __activemask().
-#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
+#ifndef CUDA_VERSION
+#error CUDA_VERSION macro is undefined, something wrong with cuda.
+#elif CUDA_VERSION >= 9000
 #define __SHFL_SYNC(mask, var, srcLane) __shfl_sync((mask), (var), (srcLane))
 #define __SHFL_DOWN_SYNC(mask, var, delta, width)                              \
   __shfl_down_sync((mask), (var), (delta), (width))
@@ -58,7 +60,7 @@
 #define __SHFL_DOWN_SYNC(mask, var, delta, width)                              \
   __shfl_down((var), (delta), (width))
 #define __ACTIVEMASK() __ballot(1)
-#endif
+#endif // CUDA_VERSION
 
 #define __SYNCTHREADS_N(n) asm volatile("bar.sync %0;" : : "r"(n) : "memory");
 // Use original __syncthreads if compiled by nvcc or clang >= 9.0.