[cfe-dev] [RFC] Block captured variables in OpenCL 2.X dynamic parallelism

Bekket McClane via cfe-dev cfe-dev at lists.llvm.org
Fri Sep 16 21:12:03 PDT 2016


Hi,
OpenCL 2.X dynamic parallelism in clang 3.9 reuses most of Objective-C's
block structure to save variables captured from the parent kernel. However,
this design has a disadvantage: the block literal structure, which later
needs to be passed to the child kernel by address, is allocated in the
parent kernel's context by default. But parent and child kernels share only
the global address space, so we either need to put block literal structures
in the global address space, which brings other problems such as managing
memory for a potentially large number of block literals, or find another
way to hand the captured variables to the child kernel.

We propose passing all the captured variables as function arguments to the
child kernel invoke function, which is the "main body" function of the
child kernel. The OpenCL 2.X spec doesn't allow __local variables to be
captured into a child kernel block, so we can almost assert that there are
only two kinds of captured variables: constants (i.e. by-value variables)
and __global pointers. Also, OpenCL 2.X lets child kernels use chunks of
shared memory, which are passed to the child kernel block as arguments of
(and only of) __local pointer type. Thus, if we insert all the captured
variables at the head of the child block invoke function's argument list,
which originally contains only __local pointers (if there are any shared
memory chunks), we can not only conveniently distinguish captured variables
from shared memory pointers by their types in the argument list, but also
remove the need to manage memory for block literal structures, since we no
longer pass them to the child kernel.
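
To make the proposed argument layout concrete, here is a minimal sketch in
plain host C. Everything in it is an assumption for illustration: the names
child_invoke and run_child_once, the example block in the comment, and the
empty GLOBAL_AS / LOCAL_AS macros that stand in for the __global and
__local qualifiers so the code compiles outside OpenCL C. It is not what
clang currently emits.

```c
#include <assert.h>
#include <stddef.h>

/* Empty stand-ins so the sketch compiles as host C; in OpenCL C these
   would be the __global and __local address space qualifiers. */
#define GLOBAL_AS
#define LOCAL_AS

/* Parent-side block this sketch imagines (OpenCL C, for reference only):
 *
 *   int scale = ...;                  // captured by value
 *   __global int *out = ...;          // captured __global pointer
 *   void (^child)(__local void *) = ^(__local void *scratch) {
 *       ((__local int *)scratch)[0] = scale;
 *       out[0] = ((__local int *)scratch)[0] * scale;
 *   };
 */

/* Hypothetical lowered invoke function: the captured variables come
   first, followed by the original __local pointer arguments. There is
   no block literal parameter at all. */
static void child_invoke(int scale, GLOBAL_AS int *out,
                         LOCAL_AS void *scratch)
{
    LOCAL_AS int *s = (LOCAL_AS int *)scratch;
    s[0] = scale;
    out[0] = s[0] * scale;
}

/* Host-side stand-in for running a single child work-item. */
static int run_child_once(int scale)
{
    int global_buf[1] = {0};  /* plays the role of __global memory */
    int local_buf[1] = {0};   /* plays the role of a __local chunk */

    assert(scale >= 0);       /* keep the demo arithmetic simple */
    child_invoke(scale, global_buf, local_buf);
    return global_buf[0];
}
```

With this layout, any argument that is not a __local pointer is a captured
variable, so the two groups can be told apart from the types alone.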

Nevertheless, captured variables still need to be extracted inside most
of the __enqueue_kernel_XXX implementations. We came up with a design that
adds two new fields to the block literal struct: cap_num, which gives the
number of captured variables, and cap_copy_helper, which is a function with
the prototype:
    size_t cap_copy_helper(void* block_literal, unsigned int arg_index,
                           void* dest_memory)
The second parameter, arg_index, is the index of the desired captured
variable in the block invoke function argument list mentioned above. The
helper copies the value of that variable from block_literal to
dest_memory and returns its size. We had previously built a version of the
helper function that also took block_literal and arg_index but returned a
pointer (of type void*) to the captured variable directly. With that
approach we could neither know the size of the variable nor retrieve its
value through the returned pointer, because the type information is missing
and a void pointer cannot be dereferenced directly. The new cap_copy_helper
approach solves both problems. Although it seems more suitable for copying
all the captured variables adjacently into one chunk of memory, I think it
can still be used to fetch an individual captured variable.
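
As a rough sketch of how the two new fields could be used, the plain C code
below models a block literal with cap_num and cap_copy_helper and a runtime
loop that packs all captures into one buffer. The struct
demo_block_literal, its two captures, and the functions
demo_cap_copy_helper and pack_captures are hypothetical; the real clang
block literal has additional header fields that are left out here, and a
float* stands in for a __global pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical block literal: only the two proposed fields
   plus two captured variables. */
struct demo_block_literal {
    unsigned int cap_num;   /* number of captured variables */
    size_t (*cap_copy_helper)(void *block_literal, unsigned int arg_index,
                              void *dest_memory);
    int scale;              /* capture 0: by-value constant */
    float *out;             /* capture 1: stands in for a __global pointer */
};

/* What a compiler-generated helper for this particular block might look
   like: it knows each capture's type, so it can copy the value and
   report its size. */
static size_t demo_cap_copy_helper(void *block_literal,
                                   unsigned int arg_index,
                                   void *dest_memory)
{
    struct demo_block_literal *b = block_literal;

    assert(arg_index < b->cap_num);
    switch (arg_index) {
    case 0:
        memcpy(dest_memory, &b->scale, sizeof b->scale);
        return sizeof b->scale;
    case 1:
        memcpy(dest_memory, &b->out, sizeof b->out);
        return sizeof b->out;
    default:
        return 0;
    }
}

/* Runtime side: copy every capture back to back into dest and return
   the total number of bytes written. The caller must supply a buffer
   that is large enough. */
static size_t pack_captures(void *block_literal, unsigned char *dest)
{
    struct demo_block_literal *b = block_literal;
    size_t offset = 0;
    unsigned int i;

    for (i = 0; i < b->cap_num; ++i)
        offset += b->cap_copy_helper(block_literal, i, dest + offset);
    return offset;
}
```

Fetching a single capture is the same call with one fixed arg_index and a
destination of the right size; the returned size tells the caller how many
bytes are valid.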

We have implemented a prototype of the whole design above except for
cap_copy_helper, which could be finished soon. Also, discussion of
captured variables of more advanced types like image_t is welcome, since we
haven't covered those types yet.

Yours sincerely,

-- 
Bekket McClane
Department of Computer Science,
National Tsing Hua University