[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

Sun May 10 21:55:21 PDT 2020

yaxunl added a comment.

In D79526#2027552 <https://reviews.llvm.org/D79526#2027552>, @tra wrote:

> In D79526#2027470 <https://reviews.llvm.org/D79526#2027470>, @yaxunl wrote:
>
> > For implicit host device functions, since they are not guaranteed to work in device compilation, we can only resolve them as if they are host functions. This causes asymmetry but implicit host device functions are originally host functions so it is biased toward host compilation in the beginning.
>
>
> I don't think that the assertion that `implicit host device functions are originally host functions` is always true. While in practice most such functions may indeed come from the existing host code (e.g. the standard library), I don't see any inherent reason why they can't come from the code written for GPU. E.g. thrust is likely to have some implicitly HD functions in the code that was not intended for CPUs and your assumption will be wrong. Even if such case may not exist now, it would not be unreasonable for users to have such code on device. 
>  This overload resolution difference is observable and it will likely create new corner cases in convoluted enough C++ code.

I agree currently it is possible to force a device function to be implicitly host device by pragma. However it is arguable whether we should have special handling of overload resolution in this case. We do special handling of overload resolution because we can not modify some system headers which are intended for host originally. If a function was originally device function, it is CUDA/HIP code and it should follow normal overloading resolution rule and should be fixed if issues occur when it is marked as a host device function.

> I think we need something more principled than "happens to work for existing code".
> 
>> Only the original resolution guarantees no other issues.  For example, in the failed compilation in TF, some ctor of std::atomic becomes implicit host device function because it is constexpr. We should treated as wrong-sided in device compilation, but we should treated as same-sided in host compilation, otherwise it changes the resolution in host compilation and causes other issues.
> 
> It may be true for atomic, where we do need to have GPU-specific implementation. However, I can also see classes with constexpr constructors that are prefectly usable on both sides and do not have to be treated as the wrong-side.

Before this patch (together with the reverted commit), the device host candidates are always treated with the same preference as wrong-sided candidates in device compilation, so a wrong-sided candidate may hide a viable host device candidate. This patch fixes that for most cases, including: 1. host compilation 2. explicit host device caller 3. explicit host device callee. Only in device compilation when an implicit host device caller calls an implicit host device callee we apply the special 'incorrect' overloading resolution rule. If the special handling causes undesirable effect on users code, users can either mark the caller or callee to be explicit host device to bypass the special handling.

> TBH, I do not see any reasonable way to deal with this with the current implementation of how HD functions are treated. This patch and its base do improve things somewhat, but it all comes at the cost of further complexity and potentially paints us even deeper into a corner. Current behavior is already rather hard to explain.
> 
> Some time back @wash from NVIDIA was asking about improving HD function handling. Maybe it's time for all interested parties to figure out whether it's time to come up with a better solution. Not in this patch, obviously.

This patch is trying to fix the incorrect overloading resolution rule about host device callee in host device caller. It should be favored over wrong-sided callee but currently it is not.

If we reject this patch, we have to bear with the incorrect overloading rule until a better fix is implemented.

The complexity introduced by this patch is that it needs to have special rule for implicit host device caller and implicit host device callee in device compilation, where implicit host device callee is not favored over wrong-sided callee to preserve the overloading resolution result as if they are both host callees. This is to allow some functions in system headers becoming implicitly host device functions without causing undeferrable diagnostics.

The complexity introduced in the compiler code is not significant: a new function Sema::IsCUDAImplicitHostDeviceFunction is introduced and used in isBetterOverloadCandidate to detect the special situation that needs special handling. The code for special handling is trivial.

The complexity introduced in the overloading resolution rule is somehow concerning.

Before this patch, the rule is: same sided candidates are favored over wrong sided candidates, but host device candidates have same preference as wrong sided candidates.

After this patch, the rule is: same sided candidates and host device candidates have the same preference over wrong-sided candidates, except implicit host device function in device compilation, which preserves original resolution.

The reason to have the exception is that the implicit host device caller may be in system headers which users cannot modify. In device compilation, favoring implicit host device candidates over host candidates may change the resolution results, which incurs diagnostics.

Alternative solution I can think of:

1. defer all diagnostics possibly incurred due to overloading resolution change. Since the host device function is invalid, it cannot really be used by device code. As long as it is not really emitted, it should be OK. However, this requires all the possible diagnostics incurred due to overloading resolution change to be deferred. This requires some significant changes since 1) the PartialDiagnosticBuilder currently has limited input types than the DiagnosticBuilder; 2) the diagnostic to be deferred may have accompanying notes which need to be deferred in coordination;  3) If there are control flow changes depending on whether diagnostics happen, they need to be modified so that compilation will continue.

2. change precedence of host-ness: If selection of a candidate will incur error, then it is not favored over host-ness, i.e. we would rather choose a wrong-sided candidate that does not cause other error, than choosing a implicit host device candidate that causes other error.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526