[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 25 17:17:19 PST 2018
tra marked 2 inline comments as done.
tra added inline comments.
================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32
+
+#define __DEVICE__ static __device__ __forceinline__
+// There are number of functions that became compiler builtins in CUDA-9 and are
----------------
jlebar wrote:
> I don't think we should need `__forceinline__`? I'm sure they had that in their headers, but clang doesn't rely on its inliner for correctness, unlike some *other* compilers. :)
>
> I'm also not sure about `static`. I think we mark all CUDA code as internal, so it shouldn't be necessary? If they're not static, they should be marked as `inline`, though.
The idea is to make those wrappers always inlined, so they are effectively replaced with a call to __nv_* libdevice function. Whether the callee itself get inlined is controlled by the attributes in the libdevice bitcode.
`__attribute__((always_inline))` which `__forceinline__` expands to is handled differently from regular inline. We have Transforms/IPO/AlwaysInliner.cpp which handles them with -O1. Higher opt levels will presumably inline these functions, too. Without always_inline these wrappers will remain uninlined at -O1. Replacing it with just `inline` will work OK (only affects -O1 in a minor way), but `__forceinline__` is a bit closer to what I want from these wrappers.
As for the `static`, unused non-static functions will be emitted by clang -- as far as clang is concerned they are externally visible. I've checked and we do end up emitting them in PTX, even though they don't have .visible directive (I think that's what you referred to when you mentioned 'mark as internal'). I don't think we want that.
I'll keep this combination for now. We can adjust it later, if necessary.
================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:51
+__DEVICE__ void __brkpt() { asm volatile("brkpt;"); }
+__DEVICE__ void __brkpt(int __a) { __brkpt(); }
+__DEVICE__ unsigned int __byte_perm(unsigned int __a, unsigned int __b,
----------------
jlebar wrote:
> Sanity check: Ignoring the __a argument here is correct?
Yes. It's a leftover from the old versions and has been deprecated in CUDA-8:
```
__DEPRECATED__("Please use __brkpt() instead.") void brkpt(int c = 0)
```
CUDA-9.1 does not have it at all.
================
Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:144
+// Declare or define device-side functions that particular CUDA version relies
+// on but does (no longer) declare in its headers. E.g. some of the device-side
+// functions that used to be implemented in a header in CUDA-8, became NVCC's
----------------
jlebar wrote:
> Sentence doesn't read right if you skip the parens.
>
> But also: Our header provides *everything* from cuda/device_functions_decls.h, not just the things that were once in that header and, in newer CUDA versions, are no longer there, right? I'm actually even more confused now that I read the big header, because here we say the header "defines or declares" these functions, but that suggests that for every function, it decides whether to define or to declare it, but that's not the case...
I should rephrase that then.
`__clang_cuda_device_functions.h` does two jobs. It always includes `__clang_cuda_libdevice_declares.h` which provides **declarations** for the libdevice functions. Those are gone in CUDA-9.1.
It also provides **definitions** for the standard library and __* device-side functions that call libdevice.
I can explicitly include _clang_cuda_libdevice_declares.h here and update the comment to make more sense.
================
Comment at: clang/lib/Headers/__clang_cuda_runtime_wrapper.h:155
+#undef __THROW
+#define __THROW
----------------
jlebar wrote:
> Should we be pushing/popping the macro? That is, do we want this macro exposed to user code?
We do push __THROW at the beginning of this header and pop it closer to the end, so this redefinition is not visible outside.
https://reviews.llvm.org/D42513
More information about the cfe-commits
mailing list