[PATCH] D36410: [OpenCL] Handle taking address of block captures
Yaxun Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Sep 5 14:26:48 PDT 2017
yaxunl added a comment.
In https://reviews.llvm.org/D36410#856907, @bader wrote:
> In https://reviews.llvm.org/D36410#856716, @yaxunl wrote:
>
> > The captured variable is still passed by value. The address taking is on the duplicate of the captured variable, not on the original variable.
>
>
> In this case the address of captured variables should point to the private address space, as function parameters reside in the private address space (see "6.5 Address Space Qualifiers" paragraph 3).
> This makes it unclear to me how capturing works for variables declared in address spaces other than private (e.g. `local int a;`).
Sorry, I was not accurate when I said the captured variables are passed by value.
The captured variable is emitted as a member of the block context structure. There are two situations:
1. The block is called as a normal block.
In this case, the block context is on the stack and is passed indirectly to the block invoke function, so taking the address of a captured variable results in a private pointer.
2. The block is called through enqueue_kernel as a kernel.
In this case, the address space of the block context depends on how the target passes the block context to the kernel.
Some targets may pass the block context struct directly; in this case, taking the address of a captured variable results in a private pointer.
Some targets may pass the block context struct by a pointer to global memory; in this case, taking the address of a captured variable results in a global pointer.
Unless the target is able to handle an indirect byval kernel argument, the block invoke function will end up with different ISA when called as a normal block and when called as a device-enqueued kernel.
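To illustrate the first situation, here is a rough model in plain C, not actual Clang output. The names `block_context`, `captured_a`, `block_invoke` and `seen_addr` are hypothetical, and plain C cannot express OpenCL address spaces. The sketch only shows that the address taken inside the block body points into the block context, not at the original variable.

```c
/* Hypothetical stand-in for the compiler-generated block context. */
struct block_context {
    int (*invoke)(struct block_context *ctx);
    int captured_a; /* copy of the captured variable, stored as a member */
};

/* Records the address observed inside the invoke function. */
static const int *seen_addr;

/* Models the block invoke function: the context is passed indirectly. */
static int block_invoke(struct block_context *ctx) {
    const int *p = &ctx->captured_a; /* models "&a" written in the block body */
    seen_addr = p;
    return *p;
}

/* Returns 1 if the address taken in the block points into the context
   and not at the original variable. */
int capture_address_is_in_context(void) {
    int a = 42;
    struct block_context ctx = { block_invoke, a }; /* capture copies a */
    int result = ctx.invoke(&ctx);
    return result == 42 && seen_addr == &ctx.captured_a && seen_addr != &a;
}
```

In this model the context lives on the caller's stack, which corresponds to the private-pointer case. When the context is instead reached through a pointer to global memory, the same member access would yield a global pointer.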
One possible solution may be:
1. When emitting code for the definition of the block, assume it is a normal block and let the captured variables have the private address space.
2. When emitting code for enqueue_kernel, emit a target-specific version of the block invoke function again, which uses the proper kernel ABI and carries kernel metadata. Let the captured variables have a target-specific address space.
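The two-step idea can be sketched in plain C as the same body instantiated twice with a different address-space qualifier. This is only an illustration under assumptions: `PRIVATE_AS`, `TARGET_AS`, `DEFINE_INVOKE` and the function names are hypothetical, and the qualifiers expand to nothing because plain C has no OpenCL address spaces.

```c
/* Placeholders for address-space qualifiers; they expand to nothing here. */
#define PRIVATE_AS /* would be __private in OpenCL C */
#define TARGET_AS  /* would be target-specific, e.g. __global */

/* Hypothetical block context holding one captured variable. */
struct block_context {
    int captured_a;
};

/* Emits one version of the invoke function for a given address space. */
#define DEFINE_INVOKE(name, AS)                           \
    static int name(AS const struct block_context *ctx) { \
        AS const int *p = &ctx->captured_a;               \
        return *p + 1;                                    \
    }

/* Step 1: the version used when the block is called as a normal block. */
DEFINE_INVOKE(block_invoke, PRIVATE_AS)
/* Step 2: the version emitted again for enqueue_kernel. */
DEFINE_INVOKE(block_invoke_kernel, TARGET_AS)

/* Returns 1 when both versions compute the same result for the same capture. */
int invoke_versions_agree(int a) {
    struct block_context ctx = { a };
    return block_invoke(&ctx) == block_invoke_kernel(&ctx);
}
```

The point of the sketch is that only the address space of the context, and therefore of pointers to captured variables, differs between the two versions; the block body itself is unchanged.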
However, since this may take some time and effort to implement, I think the current patch is acceptable as a temporary measure.
https://reviews.llvm.org/D36410