[PATCH] D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter

Fri Feb 15 21:28:25 PST 2019

yaxunl added a comment.

In D56411#1400251 <https://reviews.llvm.org/D56411#1400251>, @rjmccall wrote:

> It is totally unreasonable, at the time you are resolving a template argument, to investigate how the corresponding template parameter is used within the template and use that to shape how the template argument is resolved.  That is simply not how the C++ template model works.  Given that CUDA doesn't distinguish between host and device functions in the type system, if you are going to have a rule here, it has to be based on, at most, (1) the current semantic context (which may not even be a function), (2) the template being specialized, and (3) the declarations in the template-argument set.
>
> As I've said before on a previous patch, I think the *best* rule would be to recognize a hard difference between host and device function types, probably by making function types default to being host function types and requiring function pointers that can store device function pointers to be explicitly annotated.  However, that would not be source-compatible with ordinary CUDA, which is presumably unacceptable.
>
> The second-best rule would be to preserve compatibility by making an unannotated function type still be "unknown whether host or device", but to also allow the creation of explicitly host-only and device-only function types.  For source compatibility, DREs to functions would formally have the unknown function type.  Converting a pointer to an unknown function into a pointer to a host function would do some basic checking on the operand expression (basically to verify that it's not obviously a device function), and resolving an overload set in the context of a host-only function pointer type would do the obvious filtering.
>
> Otherwise, you're going to be stuck where you are right now, which is that you're messing around with heuristics because somebody added a language extension that isn't actually very well thought out.  But if that's what you have to do, it's what you have to do.  For this specific question, where you are trying to resolve an overloaded template argument, I think there are basically two sensible options.
>
> - You can filter the overloads by the host-ness of the template.  This makes some sense, because it's probably most likely that a function template that takes a function as a template argument is going to call it — but not necessarily, because it very well might decide instead to call over to the device to invoke the function.  Also, not all templates have a "host-ness"; that's pretty much exclusive to function templates.
> - You can filter the overloads by the host-ness of the current context.  Again, this makes some sense because it's likely that a host function is trying to pass down a host function, but again, it's not hard to think of exceptions.  And again, this has the problem that the context isn't always a function and so doesn't necessarily have a host-ness.
>
> Any sort of additional template-specific guidance seems doomed to gradually turn into the second design I mentioned above where you have the ability to be more specific about function types.
>
> For the time being, this is still a Clang extension, and while Artem mentioned that NVIDIA is investigating it, that's presumably still an investigation and we still have an opportunity to shape their thinking.  So I would really recommend taking the second approach, or maybe even trying to convince them to take the first.  (How common is higher-order programming on the device, anyway, that you can't break source compatibility for it?)  For this specific line of inquiry, that would probably mean not trying to automatically use any particular filter on the overload set but instead just relying on the programmer to annotate what kind of function they want.

I have seen important machine learning frameworks that make heavy use of function type template parameters. If we make host-ness part of the type system, templates that expect device functions as template arguments would have to be rewritten, or they would no longer compile. I don't think it would be easy to persuade developers to make that change, since nvcc does not require it.

Since host-ness based overload resolution is already in place and used by existing code, I do not want to break it. Of your suggestions, I consider the host-ness based heuristic for overload resolution the most viable for the current situation: if the function being resolved is a function template argument, use the host-ness of the function template as the first heuristic; otherwise, use the host-ness of the current context as the next heuristic.
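
To make the proposed ordering concrete, here is a rough sketch in plain C++ of how the two heuristics could be combined. It is only a model for discussion, not the code in this patch and not Clang's Sema data structures: the names Target, Candidate, pickFilter and filterOverloads are all made up for illustration, and treating a kernel template as a Device context is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of host-ness. None means "no host-ness available",
// e.g. a class template or a non-function semantic context.
enum class Target { None, Host, Device, HostDevice };

// Hypothetical stand-in for one declaration in the overload set.
struct Candidate {
    std::string name;
    Target target;
};

// First heuristic: host-ness of the function template being specialized
// (pass None if the function is not a function template argument).
// Next heuristic: host-ness of the current context.
Target pickFilter(Target templateTarget, Target contextTarget) {
    if (templateTarget != Target::None)
        return templateTarget;
    return contextTarget;
}

// Simplified matching rule (an assumption of this sketch): host-device
// candidates always match, and a host-device filter does not narrow the set.
bool matches(Target wanted, Target candidate) {
    if (wanted == Target::HostDevice)
        return true;
    return candidate == Target::HostDevice || candidate == wanted;
}

std::vector<Candidate> filterOverloads(const std::vector<Candidate> &set,
                                       Target templateTarget,
                                       Target contextTarget) {
    Target filter = pickFilter(templateTarget, contextTarget);
    if (filter == Target::None)
        return set; // neither heuristic applies; leave the set unfiltered
    std::vector<Candidate> kept;
    for (const Candidate &c : set)
        if (matches(filter, c.target))
            kept.push_back(c);
    return kept;
}
```

With an overload set containing one host and one device function, a device-side function template selects the device overload even when the specialization is requested from host code, while a template argument with no template host-ness falls back to the host context and selects the host overload.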

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56411/new/

https://reviews.llvm.org/D56411