[cfe-dev] How does Clang CUDA handle host device template instantiation with host or device only constructs?

Fri Oct 26 15:14:46 PDT 2018

CUDA has some complications today when we want to write to write generic __host__ __device__ code (e.g. __host__ __device__ template functions or __host__ __device__ member functions of template classes) that can be instantiated with parameters that cause the instantiation to use __host__ or __device__ only constructs.

Consider the following generic function:

template <typename Range, typename F>
__host__ __device__
void for_each(Range r, F f)
{
  for (auto e : r)
    f(e);
}

template <typename T>
__host__ void foo(T);

for_each(data, foo);
// Instantiates a __host__  __device__ template with a __host__ only
// function, so the instantiation is __host__ only

template <typename T>
__device__ void bar(T);

for_each(data, bar);
// Instantiates a __host__  __device__ template with a __device__ only
// function, so the instantiation is __device__

template <typename T>
__host__ __device__ void void foobar(T);

for_each(data, foobar);
// Instantiates a __host__  __device__ template with a __host__ __device__
// function, so the instantiation is __host__ __device__

With NVCC, when a __host__ __device__ template is instantiated with __host__ only or __device__ only entities, it is treated as described above, and an unnecessary warning is emitted.

There is a pragma to suppress this warning:

#pragma nv_exec_check_disable
template <typename Range, typename F>
__host__ __device__
void for_each(Range r, F f)
{
  for (auto e : r)
    f(e);
}

In Thrust, we decorate many __host__ __device__ template functions and __host__ __device__ member functions of template classes with this pragma.

Basically, NVCC does this:
* Check if the instantiation leads to any __host__ only or __device__ evaluations.
* If both __host__ only and __device__ only evaluations are found, it's a hard compilation error.
* If there are __host__ only evaluations, the instantiated function is __host__ only. Warning emitted.
* If there are __device__ only evaluations, the instantiate function is __device__ only. Warning emitted.

For template classes, the same checks are be performed, but only when a member function of said instantiation is actually used. This follows how template class instantiation works in general; for example, you can instantiate a std::vector with a move-only type, and call emplace_back on it, and everything will compile fine. But if you use a std::vector member that requires copyable types, such as push_back, you'd get a compilation error. 

What does Clang CUDA do? Does it emit a warning for instantiations of __host__ __device__ templates with __host__ or __device__ only constructs? Is this warning useful? (I have some examples indicating it is, but I want to hear what others think)

------------------------------------------------------
Bryce Adelstein Lelbach aka wash
ISO C++ Committee Member
CppCon and C++Now Program Chair
Thrust Maintainer, HPX Developer
CUDA Convert and Reformed AVX Junkie

Sleep is for the weak
------------------------------------------------------
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

[cfe-dev] How does Clang CUDA handle __host__ __device__ template instantiation with __host__ or __device__ only constructs?

[cfe-dev] How does Clang CUDA handle host device template instantiation with host or device only constructs?