[PATCH] D77954: [CUDA][HIP] Fix host/device based overload resolution

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue May 5 14:04:34 PDT 2020


yaxunl added a comment.

In D77954#2021026 <https://reviews.llvm.org/D77954#2021026>, @tra wrote:

> It appears that re-landed b46b1a916d44216f0c70de55ae2123eb9de69027 <https://reviews.llvm.org/rGb46b1a916d44216f0c70de55ae2123eb9de69027> has created another compilation regression. I don't have a simple reproducer yet, so here's the error message for now:
>
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:232:15: error: call to implicitly-deleted copy constructor of 'std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>'
>               : __value_(_VSTD::forward<_Tp>(__t))
>                 ^        ~~~~~~~~~~~~~~~~~~~~~~~~
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:388:13: note: in instantiation of function template specialization 'std::__u::__tuple_leaf<0, std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, false>::__tuple_leaf<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, void>' requested here
>               __tuple_leaf<_Uf, _Tf>(_VSTD::forward<_Up>(__u))...,
>               ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/tuple:793:15: note: in instantiation of function template specialization 'std::__u::__tuple_impl<std::__u::__tuple_indices<0, 1>, std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, std::__u::function<void ()>>::__tuple_impl<0, 1, std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, std::__u::function<void ()>, std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, std::__u::function<void ()>>' requested here
>               : __base_(typename __make_tuple_indices<sizeof...(_Up)>::type(),
>                 ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/thread:297:17: note: in instantiation of function template specialization 'std::__u::tuple<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, std::__u::function<void ()>>::tuple<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>, std::__u::function<void ()>, false, false>' requested here
>               new _Gp(std::move(__tsp),
>                   ^
>   ./third_party/eigen3/unsupported/Eigen/CXX11/src/ThreadPool/ThreadEnvironment.h:24:42: note: in instantiation of function template specialization 'std::__u::thread::thread<std::__u::function<void ()>, void>' requested here
>       EnvThread(std::function<void()> f) : thr_(std::move(f)) {}
>                                            ^
>   llvm_unstable/toolchain/bin/../include/c++/v1/memory:2583:3: note: copy constructor is implicitly deleted because 'unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct>>' has a user-declared move constructor
>     unique_ptr(unique_ptr&& __u) _NOEXCEPT
>     ^
>   1 error generated when compiling for sm_60.
>


Implicit `__host__ __device__` functions are functions promoted to `__host__ __device__` by a pragma, even though they may not actually qualify as `__host__ __device__` functions.

Because they are promoted from host functions, they are good citizens in host compilation but may trigger diagnostics in device compilation, since their callees may be missing on the device side. Since we cannot defer all such diagnostics, once that happens the compilation is doomed.

This explains the previous behavior: in a `__host__ __device__` function, a same-side candidate is always preferred over a wrong-sided candidate. A `__device__ __host__` candidate, however, is not unconditionally preferred over a wrong-sided candidate: the candidates' other properties take precedence, and only if all of those are equal is the `__device__ __host__` candidate preferred over the wrong-sided one.
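
To make the ordering above concrete, here is a minimal sketch of it as a comparison function. It is a toy model, not Clang's code: `Side`, `Candidate`, `otherRank`, and `isBetter` are hypothetical names, and `otherRank` is an assumed stand-in for every property other than the host/device side (lower is better). The model deliberately leaves same-side versus `__device__ __host__` unordered, since that case is not discussed here.

```cpp
// Hypothetical model of the previous behavior described above.
// None of these names come from Clang.
enum class Side { WrongSide, HostDevice, SameSide };

struct Candidate {
  Side side;
  int otherRank;  // assumed stand-in for all other properties; lower is better
};

// Returns true if candidate `a` is strictly better than candidate `b`.
constexpr bool isBetter(const Candidate &a, const Candidate &b) {
  // A same-side candidate always beats a wrong-sided one,
  // whatever the other properties are.
  if (a.side == Side::SameSide && b.side == Side::WrongSide) return true;
  if (a.side == Side::WrongSide && b.side == Side::SameSide) return false;

  // Otherwise the other properties take precedence.
  if (a.otherRank != b.otherRank) return a.otherRank < b.otherRank;

  // Only on a tie is a host-device candidate preferred over a wrong-sided one.
  return a.side == Side::HostDevice && b.side == Side::WrongSide;
}

// A wrong-sided candidate with better other properties wins over host-device.
static_assert(isBetter(Candidate{Side::WrongSide, 0},
                       Candidate{Side::HostDevice, 1}), "");
// With equal other properties, host-device wins over wrong-sided.
static_assert(isBetter(Candidate{Side::HostDevice, 1},
                       Candidate{Side::WrongSide, 1}), "");
```

The two `static_assert` lines show the asymmetry: the side decides first only for same-side versus wrong-sided, while for `__device__ __host__` versus wrong-sided it acts purely as a tie-breaker.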

I will add a workaround: in device compilation, for implicit `__device__ __host__` callers, I will keep the old behavior, that is, an implicit `__device__ __host__` candidate has the same preference as a wrong-sided candidate. With this, in most cases overloads are resolved the same way as if the callers and callees were host functions, that is, the same way as in their intended environment. This ensures that: 1. we will not end up with no viable candidate; 2. we will not have ambiguity, since we know the call is resolvable in host compilation.
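
The workaround can be sketched as a small function that maps a candidate to a preference level. This is an illustrative model only, not the patch itself: `CandidateKind` and `preferenceLevel` are hypothetical names, the numeric levels are arbitrary, and the model assumes that outside the workaround an implicit `__device__ __host__` candidate ranks above a wrong-sided one.

```cpp
// Hypothetical model of the workaround described above.
// None of these names or numeric levels come from Clang.
enum class CandidateKind { WrongSide, ImplicitHostDevice, SameSide };

// Higher return value means more preferred.
constexpr int preferenceLevel(CandidateKind kind, bool deviceCompilation,
                              bool callerIsImplicitHostDevice) {
  switch (kind) {
  case CandidateKind::SameSide:
    return 2;
  case CandidateKind::WrongSide:
    return 0;
  case CandidateKind::ImplicitHostDevice:
    // Workaround: in device compilation, inside an implicit host-device
    // caller, treat the candidate like a wrong-sided one (old behavior).
    // Assumption: everywhere else it ranks above wrong-sided.
    return (deviceCompilation && callerIsImplicitHostDevice) ? 0 : 1;
  }
  return 0;  // unreachable for valid enumerators
}

// Device compilation, implicit host-device caller: equal preference.
static_assert(preferenceLevel(CandidateKind::ImplicitHostDevice, true, true) ==
              preferenceLevel(CandidateKind::WrongSide, true, true), "");
// Device compilation, any other caller: host-device ranks higher.
static_assert(preferenceLevel(CandidateKind::ImplicitHostDevice, true, false) >
              preferenceLevel(CandidateKind::WrongSide, true, false), "");
```

Because the two kinds get the same level in the workaround case, the choice between them falls back to the ordinary overload rules, which is what makes the result match host compilation.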

Explicit `__device__ __host__` functions do not need the workaround, since they are intended to work for both host and device.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954




