[PATCH] D94732: [CUDA] Normalize handling of defaulted dtor.

Wed Jan 20 14:32:42 PST 2021

tra added inline comments.

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:15162-15170
+  bool SkipDtorChecks = VD->getType()->isArrayType();
+
+  // CUDA: Skip destructor checks for host-only variables during device-side
+  // compilation
+  SkipDtorChecks |=
+      (LangOpts.CUDAIsDevice && VD->hasGlobalStorage() &&
+       !(VD->hasAttr<CUDADeviceAttr>() || VD->hasAttr<CUDAConstantAttr>() ||
----------------
rsmith wrote:
> tra wrote:
> > rsmith wrote:
> > > Is this safe? What happens if the destructor for the variable is a template, and instantiating that template results in a reference to a device function? Eg:
> > > 
> > > ```
> > > template<typename T> __device__ void f() {}
> > > template<typename T> struct A {
> > >   ~A() { f<<<>>>(); }
> > > };
> > > A a;
> > > ```
> > This is business as usual -- we catch it during host compilation, where `a` is instantiated.
> > 
> > ```
> > h.cu:3:10: error: no matching function for call to 'f'
> >   ~A() { f<T>(); }
> >          ^~~~
> > h.cu:5:8: note: in instantiation of member function 'A<int>::~A' requested here
> > A<int> a;
> >        ^
> > h.cu:1:51: note: candidate function not viable: call to __device__ function from __host__ function
> > template<typename T> __attribute__((device)) void f() {}
> > 
> > 1 error generated when compiling for host.
> > ```
> > 
> > If it were a `__device__ A<int> a;`, then we would catch it during GPU compilation and also complain that we can't have dynamic initializers.
> > 
> Sorry, testcase wasn't quite right; I meant for `f` to be `__global__` not `__device__` so that the kernel call to it works. Fixed example:
> 
> ```
> extern "C" int cudaConfigureCall(int a, int b);
> template<typename T> __attribute__((__global__)) void f(T) {}
> template<typename T> struct A {
>   ~A() { f<<<1, 1>>>(T()); }
> };
> A<int> a;
> ```
> 
> I think that this is valid. In order for it to work, we need to trigger instantiation of `f<int>` on the device side of the compilation. In order to do that, we need to trigger instantiation of `A<int>::~A()`, so we need to mark it referenced on the device side. (This is, I think, in line with the general principle that we want to do the same template instantiations of host functions on both sides of the compilation, so that both sides agree on which kernel functions are referenced.)
You're right. Come to think of it, this particular change is no longer needed at all. The real issue is fixed by the better selection of the usual deallocator, so we do not need to skip dtor checks here.
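
For illustration only, the dependency chain discussed above can be sketched in plain host C++, with no CUDA involved. In this sketch `f` is an ordinary function template standing in for the `__global__` kernel, and `launches` and `launches_after_destroying_one` are hypothetical names introduced just for the example. It only shows that using `A<T>`'s destructor is what pulls in `f<T>`; it does not model how clang decides what to emit on the device side.

```cpp
// Hypothetical counter standing in for the observable effect of a kernel launch.
static int launches = 0;

// Plain C++ stand-in for `template<typename T> __global__ void f(T)`.
template <typename T>
void f(T) { ++launches; }

template <typename T>
struct A {
  // The destructor body is the only place that names f<T>.
  ~A() { f(T()); }
};

// Hypothetical helper: creating and destroying one A<T> implicitly
// instantiates A<T>::~A(), which in turn instantiates and calls f<T>.
template <typename T>
int launches_after_destroying_one() {
  launches = 0;
  { A<T> a; }  // destructor runs at the end of this scope
  return launches;
}
```

Calling `launches_after_destroying_one<int>()` returns 1: nothing else in the program mentions `f<int>`, so it exists only because `A<int>::~A()` was instantiated. By the reasoning above, skipping the destructor on one side of the compilation would presumably leave that side without the corresponding `f<int>`.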

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94732