[PATCH] D78655: [HIP] Let lambda be host device by default

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Apr 24 14:05:31 PDT 2020


tra added inline comments.


================
Comment at: clang/test/CodeGenCUDA/lambda.cu:16-26
+template<class F>
+__global__ void g(F f) { f(); }
+
+template<class F>
+void h(F f) { g<<<1,1>>>(f); }
+
+__device__ int a;
----------------
yaxunl wrote:
> tra wrote:
> > The test example may not be doing what it's seemingly supposed to be doing:
> > https://cppinsights.io/s/3a5c42ff
> > 
> > `h()` gets a temporary host-side object which keeps the reference to `a` and that reference will actually point to the host-side shadow of the actual device-side `a`. When you get to execute `g` it's `this` may not be very usable on device side and thus `f.operator()` will probably not work.
> > 
> > Alas, we currently have no diagnostics for that kind of error.
> > 
> > Change it to a non-capturing lambda, perhaps?
> It works.
> 
> We need to think about this in device compilation. In device compilation, global variable is a device variable, the lambda is a device host function, therefore the lambda is accessing the real a, not the shadow.
> 
> In the host compilation, the lambda is not really called, therefore it is not emitted.
> 
> I will update the lit test with these checks.
Clang manages to see through to the initializer of `a`, but I'm not sure how much we can rely on this.
In general, `f.operator()` for a capturing lambda needs to access captured variables via `this` which points to a temporary objects created and passed to `g` by the host. You can see it if you capture a local variable: https://godbolt.org/z/99389o

Anyways, it's an issue orthogonal to this patch. My concern is that tests are often used as an example of things that are OK to do, and capturing lambdas are a pretty big foot-shooting gun when used with CUDA. It's very easy to do wrong thing without compiler complaining about them.
While accessing `a` does work, it appears to do so by accident, rather than by design. 

I'm fairly confident that I can hide the initializer with sufficiently complicated code, force clang to access `a` via `this` and make everything fail at runtime. IMO, what we have here is a 'happens to work' situation. I do not want to call it 'guaranteed to work' without making sure that it always does.

In order to demonstrate that lambda is host/device, you do not need it to be a capturing lambda. You can make it call an overloaded function with host and device variants and verify that the lambda works on host and device sides.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655





More information about the cfe-commits mailing list