[PATCH] D56411: [CUDA][HIP][Sema] Fix template kernel with function as template parameter

Thu Feb 14 13:00:26 PST 2019

yaxunl added a comment.

In D56411#1398329 <https://reviews.llvm.org/D56411#1398329>, @rjmccall wrote:

> In D56411#1398328 <https://reviews.llvm.org/D56411#1398328>, @rjmccall wrote:
>
> > In D56411#1398291 <https://reviews.llvm.org/D56411#1398291>, @tra wrote:
> >
> > > >> That said, does CUDA have a general rule resolving `__host__` vs. `__device__` overloads based on context?  And does it allow overloading based solely on `__host__` vs. `__device__`?
> > >
> > > NVCC does not. Clang does. See https://goo.gl/EXnymm for the details.
> > >
> > > AFAICT, NVIDIA is starting to consider adopting Clang's approach:
> > >  http://lists.llvm.org/pipermail/cfe-dev/2018-November/060070.html (original message from Bryce apparently didn't make it to the cfe-dev archive)
> >
> >
> > Okay.  Probably the template-argument rule ought to be the same as the address-of-function rule, which I assume means that there's a final pass that resolves ambiguities in favor of functions that can be used from the current context, to the extent that that's meaningful.  It's hard to tell because that document does not appear to include a formal specification.
>
>
> Regardless, that has no effect on this patch.

The host/device check used to resolve a template argument already exists in clang before this patch. This patch fixes a bug in that check,
e.g.

  __device__ void f();
  __host__ void f();
  template<void (*F)()> __global__ void kernel() { F(); }
  __host__ void g() { kernel<f><<<1,1>>>(); }

The template kernel is trying to resolve f. It is supposed to get `__device__ f` but it gets `__host__ f`, because
Sema::CheckCUDACall thinks the caller of f is g, when the caller of f is actually the template kernel.
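
To illustrate why the choice of caller matters, here is a toy model in plain C++. It is only a sketch: `Target`, `Candidate`, `callable_from` and `resolve` are hypothetical names invented for this example, not clang's API, and the callability rule is deliberately simplified to "host code uses host functions, device-side code uses device functions".

```cpp
#include <string>
#include <vector>

// Toy model only; every name here is hypothetical, not part of clang.
enum class Target { Host, Device, Global };

struct Candidate {
    std::string name;
    Target target;
};

// Simplified rule: host code may use host functions; kernels and device
// functions may use device functions.
bool callable_from(Target caller, Target callee) {
    bool caller_on_device = caller != Target::Host;
    bool callee_on_device = callee == Target::Device;
    return caller_on_device == callee_on_device;
}

// Return the first overload usable from `context`, or nullptr if none is.
const Candidate* resolve(const std::vector<Candidate>& overloads, Target context) {
    for (const Candidate& c : overloads) {
        if (callable_from(context, c.target)) return &c;
    }
    return nullptr;
}
```

With the overload set { `__device__ f`, `__host__ f` }, resolving against `Target::Host` (treating g as the caller) picks the host overload, while resolving against `Target::Global` (treating the template kernel as the caller) picks the device overload. The latter is the result the example above is supposed to get.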

This check cannot be deferred to template instantiation, since that is too late. It has to be done in
the constant evaluation context where the template argument is checked. Since there is no existing way
to tell Sema::CheckCUDACall that clang is checking a template argument, the template is passed through
a newly added member of ExpressionEvaluationContextRecord.
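
The idea of carrying the template through the evaluation context can be sketched as follows. This is a minimal stand-in, not clang's data structure: `Decl`, `ContextRecord`, `template_being_checked` and `effective_caller` are hypothetical names chosen for illustration, and the actual member added by the patch may look different.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not clang's types.
struct Decl {
    std::string name;
};

struct ContextRecord {
    // Set only while a template argument is being checked; null otherwise.
    const Decl* template_being_checked = nullptr;
};

// Pick the declaration whose host/device target should drive the check:
// the template recorded in the innermost context if there is one,
// otherwise the function currently being parsed.
const Decl* effective_caller(const std::vector<ContextRecord>& stack,
                             const Decl* current_function) {
    if (!stack.empty() && stack.back().template_being_checked) {
        return stack.back().template_being_checked;
    }
    return current_function;
}
```

In this sketch, a check performed while the innermost record names the template kernel sees the kernel as the caller, even though the enclosing function is g. Outside template argument checking the record's member is null and the enclosing function is used as before.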

https://reviews.llvm.org/D56411