[PATCH] D42513: [CUDA] Added partial support for CUDA-9.1

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 25 17:33:26 PST 2018


tra added inline comments.


================
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:32
+
+#define __DEVICE__ static __device__ __forceinline__
+// There are number of functions that became compiler builtins in CUDA-9 and are
----------------
jlebar wrote:
> tra wrote:
> > jlebar wrote:
> > > I don't think we should need `__forceinline__`?  I'm sure they had that in their headers, but clang doesn't rely on its inliner for correctness, unlike some *other* compilers.  :)
> > > 
> > > I'm also not sure about `static`.  I think we mark all CUDA code as internal, so it shouldn't be necessary?  If they're not static, they should be marked as `inline`, though.
> > The idea is to make those wrappers always inlined, so they are effectively replaced with a call to __nv_* libdevice function. Whether the callee itself get inlined is controlled by the attributes in the libdevice bitcode.
> > 
> > `__attribute__((always_inline))` which `__forceinline__` expands to is handled differently from regular inline. We have Transforms/IPO/AlwaysInliner.cpp which handles them with -O1. Higher opt levels will presumably inline these functions, too. Without always_inline these wrappers will remain uninlined at -O1. Replacing it with just `inline` will work OK (only affects -O1 in a minor way), but `__forceinline__` is a bit closer to what I want from these wrappers.
> > 
> > As for the `static`, unused non-static functions will be emitted by clang -- as far as clang is concerned they are externally visible. I've checked and we do end up emitting them in PTX, even though they don't have .visible directive (I think that's what you referred to when you mentioned 'mark as internal'). I don't think we want that.
> > 
> > I'll keep this combination for now. We can adjust it later, if necessary.
> Can you help me understand what makes you want this particular inlining behavior (i.e., always inline, even at -O0) for these wrappers?
> 
> Does e.g. libc++ do the same thing for the functions it has that are thin wrappers around e.g. libc functions?
> 
> > As for the static, unused non-static functions will be emitted by clang -- as far as clang is concerned they are externally visible. I've checked and we do end up emitting them in PTX, even though they don't have .visible directive (I think that's what you referred to when you mentioned 'mark as internal'). I don't think we want that.
> 
> Agree.  This is kind of unfortunate, though -- it's not How You Write Headers.  Could we add a comment?
libc++ does seem to use _LIBCPP_ALWAYS_INLINE for a lot of wrapper-like functions:

E.g. libcxx/include/__locale:
```
535:    _LIBCPP_ALWAYS_INLINE
536-    const char_type* tolower(char_type* __low, const char_type* __high) const
537-    {
538-        return do_tolower(__low, __high);
539-    }
```




https://reviews.llvm.org/D42513





More information about the cfe-commits mailing list