[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

Fri May 8 13:26:07 PDT 2020

tra added a subscriber: wash.
tra added a comment.

In D79526#2027470 <https://reviews.llvm.org/D79526#2027470>, @yaxunl wrote:

> For implicit host device functions, since they are not guaranteed to work in device compilation, we can only resolve them as if they are host functions. This causes asymmetry but implicit host device functions are originally host functions so it is biased toward host compilation in the beginning.

I don't think that the assertion that `implicit host device functions are originally host functions` is always true. While in practice most such functions may indeed come from the existing host code (e.g. the standard library), I don't see any inherent reason why they can't come from the code written for GPU. E.g. thrust is likely to have some implicitly HD functions in the code that was not intended for CPUs and your assumption will be wrong. Even if such case may not exist now, it would not be unreasonable for users to have such code on device. 
This overload resolution difference is observable and it will likely create new corner cases in convoluted enough C++ code.

I think we need something more principled than "happens to work for existing code".

> Only the original resolution guarantees no other issues.  For example, in the failed compilation in TF, some ctor of std::atomic becomes implicit host device function because it is constexpr. We should treated as wrong-sided in device compilation, but we should treated as same-sided in host compilation, otherwise it changes the resolution in host compilation and causes other issues.

It may be true for atomic, where we do need to have GPU-specific implementation. However, I can also see classes with constexpr constructors that are prefectly usable on both sides and do not have to be treated as the wrong-side.

TBH, I do not see any reasonable way to deal with this with the current implementation of how HD functions are treated. This patch and its base do improve things somewhat, but it all comes at the cost of further complexity and potentially paints us even deeper into a corner. Current behavior is already rather hard to explain.

Some time back @wash from NVIDIA was asking about improving HD function handling. Maybe it's time for all interested parties to figure out whether it's time to come up with a better solution. Not in this patch, obviously.

================
Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
----------------
These tests only veryfy that the code compiled, but it does not guarantee that we've picked the correct overload.
You should give callees different return types and assign the result to a variable of intended type.  See `test_host_device_calls_hd_template() ` on line 341 for an example.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526