[PATCH] D24715: [OpenCL] Block captured variables in dynamic parallelism - OpenCL 2.0

bekket mcclane via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 3 02:47:19 PDT 2016


mshockwave removed a reviewer: bader.
mshockwave updated this revision to Diff 73254.
mshockwave added a comment.

@Anastasia
Sorry for the late response. I have just attached a new version of the patch that fixes the cases where block functions are used as normal lambda functions. Test code is not included in this revision; we are still working on it.

Regarding the questions you asked in the previous comment, I'll explain them from another angle: how would one implement __enqueue_kernel_XXX? The possible implementations fall into two categories:

1. A library implementation, like libclc/libclcxx, written in OpenCL-C.
2. Builtins implemented directly in the compiler.

If we choose the first one, which most people would do given its simplicity and flexibility, and we want to fetch captured variables inside the implementation of __enqueue_kernel_XXX, a possible approach would be:

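  /* arg_child_kernel is the block passed to __enqueue_kernel_XXX */
  void *block_as_voidptr = (void *)arg_child_kernel;
  block_literal_ty *block = (block_literal_ty *)block_as_voidptr;
  int a = block->capA;  /* first captured variable */
  int b = block->capB;  /* second captured variable */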

This seems promising, but what exactly does `block_literal_ty` look like? We all know that `block_literal_ty` would look similar to:

  typedef struct {
    /*
    * Fields of block header. 
    * e.g. isa, block_descriptor...
    */
  
    int capA;
    int capB;
    ...
  } block_literal_ty;

But since we are discussing a statically typed language, the definition of this struct must be known at compile time. However, the EXACT layout of `block_literal_ty` varies among programs, or even among functions. That is the problem cap_copy_helper is meant to solve.
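
To make the layout problem concrete, here is a minimal sketch in plain C (not OpenCL-C). The header fields and all type and function names are hypothetical stand-ins, not the layout clang actually emits. It shows two blocks whose literals share a header but differ in their captures, and a library routine that can only be written against one of them:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block header; the real fields are chosen by the compiler. */
typedef struct {
    void *isa;
    int flags;
    int reserved;
    void *invoke;
    void *descriptor;
} block_header;

/* Literal of a block that captures two ints. */
typedef struct {
    block_header hdr;
    int capA;
    int capB;
} block_literal_two_ints;

/* Literal of a different block that captures a float and a pointer. */
typedef struct {
    block_header hdr;
    float capA;
    int *capB;
} block_literal_float_ptr;

/* A library routine has to commit to ONE concrete layout at compile time.
 * This one is only correct for blocks that capture exactly two ints. */
int sum_captures_two_ints(const void *block_as_voidptr) {
    assert(block_as_voidptr != NULL);
    const block_literal_two_ints *block =
        (const block_literal_two_ints *)block_as_voidptr;
    return block->capA + block->capB;
}
```

Passing a `block_literal_float_ptr` to `sum_captures_two_ints` would compile (the parameter is a void pointer) but would read the captures with the wrong types, which is why a precompiled library cannot use a single `block_literal_ty` definition.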

Of course there is another library approach: keep the child kernel's invoke_function prototype untouched and pass the block_literal variable (as a void pointer) as its first function argument. Since the instructions for extracting captured variables were already generated during codegen of the invoke_function body, we don't need to handle any captured variables inside __enqueue_kernel_XXX.
However, the OpenCL spec says that the global address space is the only address space shared between parent and child kernels, and the block_literal variable itself is allocated as a private (stack) variable in the parent kernel. So we need to copy the block_literal variable (not its pointer) into global memory. But OpenCL doesn't allow dynamically sized memory in the global address space, so we need to define a statically sized region of memory, perhaps an array, in our library implementation. This is where global memory management might be required, since a static size carries the risk of running out of pre-allocated space.
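
The pre-allocation risk can be sketched as follows. This is a single-threaded simulation in plain C, where a file-scope array stands in for global memory; `POOL_SLOTS`, `SLOT_BYTES` and `pool_store_block` are hypothetical names, and a real OpenCL-C version would additionally need atomic slot allocation and a way to release slots:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 4   /* assumed number of pre-allocated slots */
#define SLOT_BYTES 64  /* assumed upper bound on a block literal's size */

/* Stand-in for a statically sized array in the global address space. */
static unsigned char pool[POOL_SLOTS][SLOT_BYTES];
static size_t slots_used = 0;

/* Copy a block literal (the value, not its pointer) into the pool.
 * Returns the copy's address, or NULL if the literal is too large
 * or every slot is already taken. */
void *pool_store_block(const void *block_literal, size_t size) {
    assert(block_literal != NULL);
    if (size > SLOT_BYTES || slots_used == POOL_SLOTS)
        return NULL;
    void *slot = pool[slots_used++];
    memcpy(slot, block_literal, size);
    return slot;
}
```

Once `POOL_SLOTS` block literals have been stored, the next enqueue has nowhere to put its copy. Handling that case, and reclaiming slots when child kernels finish, is the global memory management mentioned above.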

The improvement we propose "flattens" captured variables into the invoke_function argument list, so the block_literal pointer is no longer passed as the first argument to invoke_function. It doesn't require global memory management because we can retrieve the captured variables with the cap_num field and the cap_copy_helper routine INSIDE __enqueue_kernel_XXX and pass those captures as arguments to the child kernel, rather than saving the block_literal variable globally and postponing retrieval until invoke_function, the child kernel body.
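
A rough sketch of the flattening idea, again in plain C. Only the names cap_num and cap_copy_helper come from this patch; their signatures here, the all-int captures, the `MAX_CAPS` limit and the synchronous call to invoke are simplifying assumptions for illustration and not what the patch generates:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CAPS 8  /* assumed limit, for this sketch only */

typedef struct block_literal block_literal;
struct block_literal {
    int cap_num;  /* number of captured variables */
    /* Per-block helper that knows the concrete layout (assumed signature). */
    void (*cap_copy_helper)(const block_literal *self, int *dst);
    /* Child kernel body: receives captures as arguments, not the block. */
    void (*invoke)(const int *args, int nargs, int *result);
};

/* One concrete block that captures two ints. */
typedef struct {
    block_literal hdr;
    int capA;
    int capB;
} add_block;

static void add_block_copy_caps(const block_literal *self, int *dst) {
    const add_block *b = (const add_block *)self;  /* hdr is the first member */
    dst[0] = b->capA;
    dst[1] = b->capB;
}

static void add_block_invoke(const int *args, int nargs, int *result) {
    assert(nargs == 2);
    *result = args[0] + args[1];
}

add_block make_add_block(int a, int b) {
    add_block blk = { { 2, add_block_copy_caps, add_block_invoke }, a, b };
    return blk;
}

/* Stand-in for __enqueue_kernel_XXX: the captures are extracted here, in
 * the parent, so the block literal never has to be stored globally. */
int enqueue_kernel_sketch(const block_literal *block) {
    int args[MAX_CAPS];
    int result = 0;
    assert(block != NULL && block->cap_num <= MAX_CAPS);
    block->cap_copy_helper(block, args);
    block->invoke(args, block->cap_num, &result);  /* a real enqueue is deferred */
    return result;
}
```

The library routine `enqueue_kernel_sketch` never names `add_block`: it relies only on the header fields, and the compiler-generated helper supplies the layout knowledge for each block.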


https://reviews.llvm.org/D24715

Files:
  lib/CodeGen/CGBlocks.cpp
  lib/CodeGen/CGBlocks.h
  lib/CodeGen/CGExpr.cpp
  lib/CodeGen/CodeGenFunction.cpp
  lib/CodeGen/CodeGenFunction.h
  lib/CodeGen/CodeGenModule.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D24715.73254.patch
Type: text/x-patch
Size: 28561 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20161003/7b81aa95/attachment-0001.bin>

