[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

Sun May 10 06:55:25 PDT 2020

yaxunl marked 4 inline comments as done.
yaxunl added a comment.

In D79526#2027695 <https://reviews.llvm.org/D79526#2027695>, @tra wrote:

> This one is just an FYI. I've managed to reduce the failure in the first version of this patch, and it looks rather odd because the reduced test case has nothing to do with CUDA. Instead, it appears to introduce a difference in the compilation of regular host-only C++ code with `-x cuda` vs `-x c++`. I'm not sure how or why the first version caused this and why the latest one fixes it. It may be worth double-checking that we're not missing something here.
>
>   template <class a> a b;
>   auto c(...);
>   template <class d> constexpr auto c(d) -> decltype(0);
>   struct e {
>     template <class ad, class... f> static auto g(ad, f...) {
>       h<e, decltype(b<f>)...>;
>     }
>     struct i {
>       template <class, class... f> static constexpr auto j(f... k) { c(k...); }
>     };
>     template <class, class... f> static auto h() { i::j<int, f...>; }
>   };
>   class l {
>     l() {
>       e::g([] {}, this);
>     }
>   };
>

Function `j` is an implicit host device function that calls function `c`. There are two candidates for that call: the first is a host function, and the second is an implicit host device function.

Assuming this code is originally C++ code, the author intends the second to be chosen since it is a better match. The code will fail to compile if the first one is chosen since its return type cannot be deduced.

Now we compile it as CUDA code, and constexpr functions automatically become implicit host device functions. In host compilation we do not need special handling, since host device candidates and same-sided candidates are both viable. There was a bug that applied the special handling of implicit host device functions in host compilation; my last update fixed it.

Basically, we only need special handling for implicit host device functions in device compilation. In host compilation we always use normal overload resolution. For explicit host device functions we also always use normal overload resolution.
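
As a host-only illustration of the "better match" point above, here is a minimal plain C++14 sketch (no CUDA attributes involved). The tag types `FromEllipsis` and `FromTemplate` and the helper `picks_template` are hypothetical names added only so the chosen overload is visible in the return type; they are not part of the reduced test case or the patch.

```cpp
#include <type_traits>

// Hypothetical tag types: each overload of c returns a distinct type,
// so the selected candidate can be read off the call's type.
struct FromEllipsis {};
struct FromTemplate {};

// Stands in for the first candidate (a host function under -x cuda).
FromEllipsis c(...);

// Stands in for the second candidate. Being constexpr, it would be
// treated as an implicit host device function under -x cuda.
template <class D> constexpr FromTemplate c(D) { return {}; }

// Like j in the reduced test: a constexpr template forwarding to c.
template <class... F> constexpr auto j(F... k) { return c(k...); }

// Normal overload resolution prefers the template: an exact match on
// the parameter beats the ellipsis conversion.
static_assert(std::is_same<decltype(j(0)), FromTemplate>::value,
              "expected the template overload of c to be selected");

// Hypothetical helper exposing the same check at run time.
constexpr bool picks_template() {
  return std::is_same<decltype(j(0)), FromTemplate>::value;
}
```

This only shows what normal C++ overload resolution selects; it does not exercise the device-side handling discussed in this review.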

================
Comment at: clang/include/clang/Sema/Sema.h:11663
+                                        bool IgnoreImplicitHDAttr = false,
+                                        bool *IsImplicitHDAttr = nullptr);
   CUDAFunctionTarget IdentifyCUDATarget(const ParsedAttributesView &Attrs);
----------------
tra wrote:
> Plumbing an optional output argument through multiple levels of callers is rather hard to follow, especially considering that it's not set in all code paths. Perhaps we can turn IsImplicitHDAttr into a separate function and call it from isBetterOverloadCandidate().
Will do.

================
Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
----------------
tra wrote:
> These tests only verify that the code compiles; they do not guarantee that we've picked the correct overload.
> You should give the callees different return types and assign the result to a variable of the intended type. See `test_host_device_calls_hd_template()` on line 341 for an example.
They have different return types: the right one returns double and the wrong one returns void. If the wrong one is chosen, compilation fails because the caller returns double.
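
The return-type check can also be made explicit. The following is a host-only C++ sketch, assuming the same `callee` overloads as in the test; the function bodies, the `static_assert`, and the name `caller` are additions for illustration and are not in the patch.

```cpp
#include <type_traits>

// Intended overload: takes and returns double.
inline double callee(double x) { return x; }

// Wrong overload: returns void, so selecting it for a call whose
// result is returned as double would not compile.
inline void callee(int) {}

// Explicit check that the double overload is selected for a double
// argument (hypothetical addition, not part of the original test).
static_assert(std::is_same<decltype(callee(1.0)), double>::value,
              "expected callee(double) to be selected");

// Mirrors implicit_hd_caller from the test, minus the CUDA pragma.
inline double caller() { return callee(1.0); }
```

Because this is plain C++, it demonstrates only the return-type technique, not the host/device preference that the `.cu` test covers.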

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526