[PATCH] D57488: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

Thu Jan 31 10:37:58 PST 2019

tra added inline comments.

================
Comment at: clang/lib/CodeGen/CGCUDANV.cpp:239
+    CGM.Error(CGF.CurFuncDecl->getLocation(),
+              "Can't find declaration for cudaLaunchKernel()"); // FIXME.
+    return;
----------------
jlebar wrote:
> Unfixed FIXME?
Fixed the comment. :-)
There's not much we can do without a declaration for `cudaLaunchKernel`, so reporting the error here is the best option.

================
Comment at: clang/lib/CodeGen/CGCUDANV.cpp:260
+                              /*isVarArg=*/false),
+      "__cudaPopCallConfiguration");
+
----------------
jlebar wrote:
> I see lots of references to `__cudaPushCallConfiguration`, but this is the only reference I see to `__cudaPopCallConfiguration`.  Is this a typo?  Also are we supposed to emit matching push and pop function calls?  Kind of weird to do one without the other...
The `pop` part is indeed used only here.
`Push` is something that takes user-specified parameters, so we get Sema to check them.
`Pop` is much simpler and does not have any direct user exposure, so we can just create and use it here.

As for matching, it is balanced. `Push` is called at the kernel launch site with the parameters of `<<<>>>`. `Pop` is done in the host-side kernel stub, where we retrieve those parameters and pass them to the CUDA runtime.

Essentially, push/pop are poor names for these functions, as the nesting is never more than one level deep. We could've just stashed the arguments in a fixed buffer somewhere.
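
To make the pairing concrete, here is a rough host-only sketch of the flow described above, using a single fixed slot instead of a stack. Everything in it is a hypothetical stand-in: `Dim3`, `mockPushCallConfiguration`, `mockPopCallConfiguration`, `mockLaunchKernel`, and `kernelStub` are made-up names for illustration, not the real CUDA runtime entry points or the code clang actually emits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for dim3; not the CUDA type.
struct Dim3 {
  unsigned x, y, z;
  Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
      : x(x_), y(y_), z(z_) {}
};

struct LaunchConfig {
  Dim3 grid, block;
  std::size_t sharedMem;
  void *stream;
  LaunchConfig() : sharedMem(0), stream(nullptr) {}
};

// One fixed slot is enough because nesting is never deeper than one level.
static LaunchConfig g_slot;
static bool g_slotFull = false;

// Mock "push": called at the launch site with the <<<>>> parameters.
int mockPushCallConfiguration(Dim3 grid, Dim3 block, std::size_t sharedMem,
                              void *stream) {
  assert(!g_slotFull && "unbalanced push");
  g_slot.grid = grid;
  g_slot.block = block;
  g_slot.sharedMem = sharedMem;
  g_slot.stream = stream;
  g_slotFull = true;
  return 0;
}

// Mock "pop": called from the host-side stub to retrieve the parameters.
int mockPopCallConfiguration(Dim3 *grid, Dim3 *block, std::size_t *sharedMem,
                             void **stream) {
  assert(g_slotFull && "pop without a matching push");
  *grid = g_slot.grid;
  *block = g_slot.block;
  *sharedMem = g_slot.sharedMem;
  *stream = g_slot.stream;
  g_slotFull = false;
  return 0;
}

// Mock runtime launch: just records what it was given.
static LaunchConfig g_lastLaunch;
static int g_lastArg = 0;
int mockLaunchKernel(Dim3 grid, Dim3 block, void **args, std::size_t sharedMem,
                     void *stream) {
  g_lastLaunch.grid = grid;
  g_lastLaunch.block = block;
  g_lastLaunch.sharedMem = sharedMem;
  g_lastLaunch.stream = stream;
  g_lastArg = *static_cast<int *>(args[0]);
  return 0;
}

// Host-side kernel stub: packs the kernel arguments, pops the launch
// configuration, and forwards both to the (mock) runtime.
void kernelStub(int n) {
  void *args[] = {&n};
  Dim3 grid, block;
  std::size_t sharedMem = 0;
  void *stream = nullptr;
  mockPopCallConfiguration(&grid, &block, &sharedMem, &stream);
  mockLaunchKernel(grid, block, args, sharedMem, stream);
}
```

Under these assumptions, a launch such as `kernel<<<4, 128>>>(42)` would correspond roughly to `mockPushCallConfiguration(Dim3(4), Dim3(128), 0, nullptr)` followed by `kernelStub(42)`: one push at the call site, one pop inside the stub.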

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57488/new/

https://reviews.llvm.org/D57488