[PATCH] D38134: [OpenCL] Emit enqueued block as kernel

Mon Sep 25 08:04:22 PDT 2017

Anastasia added a comment.

I think we should add a test case when the same block is both called and enqueued.

================
Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:113
+
+llvm::Value *CGOpenCLRuntime::emitOpenCLEnqueuedBlock(CodeGenFunction &CGF,
+                                                      const Expr *E) {
----------------
yaxunl wrote:
> Anastasia wrote:
> > yaxunl wrote:
> > > Anastasia wrote:
> > > > I am not particularly in favour of duplicating CodeGen functionality as it typically has so many special cases that are hard to catch. Is this logic needed in order to pass to block literal information  that the block is enqueued?
> > > This code is needed to emit separate functions for a block which is directly called and also enqueued as a kernel. Since the kernel needs to have proper calling convention and ABI, it cannot be emitted as the same function as when the block is called directly. Since it is OpenCL specific code, I found it is cleaner to separate this code as member of CGOpenCLRuntime instead of fitting it into CGF.EmitBlockLiteral.
> > This part is replacing standard `EmitScalarExpr` call which is doing several things before calling into block generation. That's why I am a bit worried we are covering all the corner cases here.
> > 
> > So if we transform all blocks into kernels unconditionally we won't need this special handling then?
> > 
> > Do we generate two versions of the blocks now: one for enqueue and one for call?
> If we transform all blocks into kernels, we could simplify the logic. Probably will not need this special handling.
> 
> However, when the block is called directly, the first argument is a private pointer, when it is executed as a kernel, the first argument is a global pointer or a struct (for amdgpu target), therefore the emitted functions cannot be the same.
Would using generic address space for this first argument not work?

================
Comment at: test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl:3
+
+// CHECK: %[[S1:struct.__amdgpu_block_arg_t.*]] = type { [3 x i64], [1 x i8] }
+// CHECK: %[[S2:struct.__amdgpu_block_arg_t.*]] = type { [5 x i64], [1 x i8] }
----------------
yaxunl wrote:
> Anastasia wrote:
> > This struct is not identical to block literal struct?
> The LLVM type of the first argument of block invoke function is created directly with sorting and rearrangement. There is no AST type corresponding to it. However, the function codegen requires AST type of this argument. I feel it is unnecessary to create the corresponding AST type. For simplicity, just create an AST type with the same size and alignment as the LLVM type. In the function code, it will be bitcasted to the correct LLVM struct type and get the captured variables.
So `void ptr` won't be possible here? Since it is cast to a right struct inside the block anyway. Once again a block is a special type object with known semantic to compiler and runtime in contract to kernels that can be written with any arbitrary type of arguments.

I just don't like the idea of duplicating the block invoke function in case it's being both called and enqueued. Also the login in blocks code generation becomes difficult to understand. So I am wondering if we could perhaps create a separate kernel function (as a wrapper) for enqueue_kernel which would call a block instead. What do you think about it? I think the kernel prototype would be fairly generic as it would just have a block call inside and pass the arguments into it... We won't need to modify block generation then at all. 

https://reviews.llvm.org/D38134