[clang] [Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (PR #115821)

Tue Dec 3 01:10:27 PST 2024

================
@@ -1085,8 +1085,10 @@ llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo) {
       blockAddr.getPointer(), ConvertType(blockInfo.getBlockExpr()->getType()));
 
   if (IsOpenCL) {
-    CGM.getOpenCLRuntime().recordBlockInfo(blockInfo.BlockExpression, InvokeFn,
-                                           result, blockInfo.StructureType);
+    CGM.getOpenCLRuntime().recordBlockInfo(
+        blockInfo.BlockExpression, InvokeFn, result, blockInfo.StructureType,
+        CurGD && CurGD.isDeclOpenCLKernel() &&
+            (CurGD.getKernelReferenceKind() == KernelReferenceKind::Kernel));
----------------
rjmccall wrote:

That's one of the options, yeah. You'd always emit stubs as directly calling their kernels (so `_clang_ocl_kern_imp_caller_kern` would actually just call `caller_kern`, at least coming out of IRGen), and then you'd let LLVM decide when and whether to inline. The main disadvantage is that there might not already be code to emit calls to kernels; I imagine that's normally only done by the OpenCL runtime, and the ABI is probably pretty different from the normal call ABI. But maybe that's not a big deal.

The second option is that you could emit stubs by making sure the kernel is emitted first and then directly cloning the kernel body into the stub. Whether this is materially different from just inlining, I don't know, but it's an option. We do have code for this sort of thing already; we need it for emitting virtual varargs thunks in C++.

The third option is that you could emit kernels as directly calling their stubs. The advantage here is that all the calls are just normal calls that you definitely already have code to handle. The disadvantage is that you'd always be forcing the stub to be emitted, even in the 99% case that there are no other uses of it. It'll probably get reliably inlined away, but still, it's burning some extra compile time.

The last option is that you could emit kernels by emitting the stub and then directly cloning the function body into the kernel.

Up to you; they all have trade-offs. But I do think you need to avoid emitting the kernel body twice.
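
To make the third option concrete, here is a rough analogy in plain C++ rather than actual Clang output. It assumes the body is emitted once into a stub, that each kernel entry point is a thin forwarder to its stub, and that a kernel-to-kernel call in the source resolves to the callee's stub. The names `stub_callee_kern` and `stub_caller_kern` are hypothetical stand-ins for the `_clang_ocl_kern_imp_*` symbols, and the kernel bodies are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stub for callee_kern: the only copy of the body lives here,
// and it uses the ordinary function-call convention.
void stub_callee_kern(int *out, std::size_t n) {
  assert(out != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<int>(i) * 2;
}

// Kernel entry point (what the runtime would launch): it only forwards to
// the stub, so the body is not emitted a second time.
void callee_kern(int *out, std::size_t n) { stub_callee_kern(out, n); }

// Hypothetical stub for caller_kern. The source-level call to callee_kern
// becomes a normal call to callee_kern's stub.
void stub_caller_kern(int *out, std::size_t n) {
  stub_callee_kern(out, n);
  if (n > 0)
    out[0] += 1;
}

// Kernel entry point for caller_kern, again a thin forwarder.
void caller_kern(int *out, std::size_t n) { stub_caller_kern(out, n); }
```

In this shape every call is an ordinary call, at the cost of always having a stub per kernel; an optimizer would be expected to inline the forwarders away in the common case.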

https://github.com/llvm/llvm-project/pull/115821