[PATCH] D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter

Thu Feb 14 09:03:13 PST 2019

yaxunl added a comment.

In D56411#1398103 <https://reviews.llvm.org/D56411#1398103>, @rjmccall wrote:

> In D56411#1398097 <https://reviews.llvm.org/D56411#1398097>, @yaxunl wrote:
>
> > In D56411#1365878 <https://reviews.llvm.org/D56411#1365878>, @yaxunl wrote:
> >
> > > In D56411#1365745 <https://reviews.llvm.org/D56411#1365745>, @rjmccall wrote:
> > >
> > > > In D56411#1365727 <https://reviews.llvm.org/D56411#1365727>, @yaxunl wrote:
> > > >
> > > > > In D56411#1360010 <https://reviews.llvm.org/D56411#1360010>, @rjmccall wrote:
> > > > >
> > > > > > I think the diagnostic should come during instantiation when you find an evaluated use of a host function within a device function.
> > > > >
> > > > >
> > > > > It seems the body of a function template is checked only while parsing the definition of the template itself. When a function template is instantiated, the body of the instantiated function is not checked again.
> > > >
> > > >
> > > > No, that's not correct.  However, it's checked somewhat differently, and it's possible that the existing diagnostic is not set up to fire along all common paths.  Try moving the diagnostic to `MarkFunctionReferenced`, and note that `OdrUse` will be `false` in all the unevaluated contexts.
> > >
> > >
> > > You are right. After I disabled the current diagnostic, I saw that `PerformPendingInstantiations` runs at the end of parsing the TU, where the AST of the instantiated function is traversed and `MarkFunctionReferenced` is called. I will try to fix my patch as suggested. Thanks.
> >
> >
> > I have one concern. If we want to resolve overloads of a function-type template argument based on host or device, we need to do that before template instantiation, right?
> >
> > For example, suppose we have two functions with the same name `f` and the same type, but one is `__host__` and the other is `__device__`, and we pass `f` as a template argument to a function template `g`. We want to choose `__device__ f` if `g` itself is `__device__`, and `__host__ f` if `g` itself is `__host__`. To do this, we have to do the check before template instantiation, right?
>
>
> Yes, you would need to check that when resolving the overload to a single declaration.  That would be separate from diagnosing uses.
>
> That said, does CUDA have a general rule resolving `__host__` vs. `__device__` overloads based on context?  And does it allow overloading based solely on `__host__` vs. `__device__`?

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-declaration-specifiers does not talk about that.

Experimenting with nvcc shows that two functions cannot differ only by their host/device attributes; otherwise nvcc treats the second as a redefinition of the same function.

So I withdraw my concern.
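
A minimal sketch of the scenario discussed above, using the names `f` and `g` from the earlier comment. The `int (*)(int)` signature, the caller `call_g`, and the macro fallback are hypothetical additions for illustration only; the fallback is there so the snippet also builds as plain C++ when no CUDA compiler is used.

```cpp
// Assumption: fall back to empty attributes when not compiling as CUDA,
// so this sketch also builds with an ordinary C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// A device function that will be passed as a template argument.
__device__ int f(int x) { return x + 1; }

// According to the nvcc experiment described above, adding a second f that
// differs only in its host/device attribute is treated as a redefinition:
// __host__ int f(int x) { return x + 2; }

// g takes a function as a non-type template argument (signature is assumed).
template <int (*F)(int)>
__device__ int g(int x) { return F(x); }

// Hypothetical caller that instantiates g with f.
__device__ int call_g(int x) { return g<f>(x); }
```

Since only one `f` can exist under that rule, `g<f>` always names a single declaration, and there is nothing to choose between before instantiation.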

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56411/new/

https://reviews.llvm.org/D56411