[PATCH] D78655: [HIP] Let lambda be host device by default

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Apr 25 07:56:15 PDT 2020


yaxunl marked 2 inline comments as done.
yaxunl added inline comments.


================
Comment at: clang/test/CodeGenCUDA/lambda.cu:16-26
+template<class F>
+__global__ void g(F f) { f(); }
+
+template<class F>
+void h(F f) { g<<<1,1>>>(f); }
+
+__device__ int a;
----------------
tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > The test example may not be doing what it's seemingly supposed to be doing:
> > > https://cppinsights.io/s/3a5c42ff
> > > 
> > > `h()` gets a temporary host-side object which keeps the reference to `a` and that reference will actually point to the host-side shadow of the actual device-side `a`. When you get to execute `g` it's `this` may not be very usable on device side and thus `f.operator()` will probably not work.
> > > 
> > > Alas, we currently have no diagnostics for that kind of error.
> > > 
> > > Change it to a non-capturing lambda, perhaps?
> > It works.
> > 
> > We need to think about this in device compilation. In device compilation, global variable is a device variable, the lambda is a device host function, therefore the lambda is accessing the real a, not the shadow.
> > 
> > In the host compilation, the lambda is not really called, therefore it is not emitted.
> > 
> > I will update the lit test with these checks.
> Clang manages to see through to the initializer of `a`, but I'm not sure how much we can rely on this.
> In general, `f.operator()` for a capturing lambda needs to access captured variables via `this` which points to a temporary objects created and passed to `g` by the host. You can see it if you capture a local variable: https://godbolt.org/z/99389o
> 
> Anyways, it's an issue orthogonal to this patch. My concern is that tests are often used as an example of things that are OK to do, and capturing lambdas are a pretty big foot-shooting gun when used with CUDA. It's very easy to do wrong thing without compiler complaining about them.
> While accessing `a` does work, it appears to do so by accident, rather than by design. 
> 
> I'm fairly confident that I can hide the initializer with sufficiently complicated code, force clang to access `a` via `this` and make everything fail at runtime. IMO, what we have here is a 'happens to work' situation. I do not want to call it 'guaranteed to work' without making sure that it always does.
> 
> In order to demonstrate that lambda is host/device, you do not need it to be a capturing lambda. You can make it call an overloaded function with host and device variants and verify that the lambda works on host and device sides.
> 
> 
I added one more test, where a lambda function calls a template function which is overloaded with a host version and a device version. The lambda is called in both host function and in kernel. Test shows correct version of template function are emitted in host and device compilation.

I think it is not a surprise that the lambda function is able to resolve the host/device-ness of the callee correctly. We are doing resolution in a host device function and the two candidates are same-side vs wrong-side.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655





More information about the cfe-commits mailing list