[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

Justin Lebar via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 25 15:45:41 PST 2018


jlebar added a comment.

Gosh I think the chances we get through this without a copy-paste error are way too high.  I'm not sure what to do about it, though.  I don't have the focus to check this as carefully as it needs.

Perhaps we need to test this in the test-suite (eventually)?  We've seen that even when our own implementation is correct, sometimes nvptx breaks it, and so we need to test e2e *anyway*...



================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:125
+    if (Version >= CudaVersion::CUDA_90) {
       // CUDA-9 uses single libdevice file for all GPU variants.
       std::string FilePath = LibDevicePath + "/libdevice.10.bc";
----------------
Perhaps update to CUDA 9+


================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32
+
+#define __DEVICE__ static __device__ __forceinline__
+// There are number of functions that became compiler builtins in CUDA-9 and are
----------------
I don't think we should need `__forceinline__`?  I'm sure they had that in their headers, but clang doesn't rely on its inliner for correctness, unlike some *other* compilers.  :)

I'm also not sure about `static`.  I think we mark all CUDA code as internal, so it shouldn't be necessary?  If they're not static, they should be marked as `inline`, though.


================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:51
+__DEVICE__ void __brkpt() { asm volatile("brkpt;"); }
+__DEVICE__ void __brkpt(int __a) { __brkpt(); }
+__DEVICE__ unsigned int __byte_perm(unsigned int __a, unsigned int __b,
----------------
Sanity check: Ignoring the __a argument here is correct?


================
Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:143
 
-// We need decls for functions in CUDA's libdevice with __device__
-// attribute only. Alas they come either as __host__ __device__ or
-// with no attributes at all. To work around that, define __CUDA_RTC__
-// which produces HD variant and undef __host__ which gives us desided
-// decls with __device__ attribute.
-#pragma push_macro("__host__")
-#define __host__
-#define __CUDACC_RTC__
-#include "device_functions_decls.h"
-#undef __CUDACC_RTC__
+// Declare or define device-side functions that particular CUDA version relies
+// on but does (no longer) declare in its headers. E.g. some of the device-side
----------------
a particular (or "our particular", if you like)


================
Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:144
+// Declare or define device-side functions that particular CUDA version relies
+// on but does (no longer) declare in its headers. E.g. some of the device-side
+// functions that used to be implemented in a header in CUDA-8, became NVCC's
----------------
Sentence doesn't read right if you skip the parens.

But also: Our header provides *everything* from cuda/device_functions_decls.h, not just the things that were once in that header and, in newer CUDA versions, are no longer there, right?  I'm actually even more confused now that I read the big header, because here we say the header "defines or declares" these functions, but that suggests that for every function, it decides whether to define or to declare it, but that's not the case...


================
Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:155
+#undef __THROW
+#define __THROW
 
----------------
Should we be pushing/popping the macro?  That is, do we want this macro exposed to user code?


https://reviews.llvm.org/D42513





More information about the cfe-commits mailing list