[PATCH] D37822: [OpenCL] Clean up and add missing fields for block struct

Thu Sep 21 15:53:18 PDT 2017

yaxunl added a comment.

In https://reviews.llvm.org/D37822#877903, @Anastasia wrote:

> In https://reviews.llvm.org/D37822#877572, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D37822#873876, @Anastasia wrote:
> >
> > > In https://reviews.llvm.org/D37822#872446, @yaxunl wrote:
> > >
> > > > In https://reviews.llvm.org/D37822#872291, @Anastasia wrote:
> > > >
> > > > > Could you please explain a bit more why the alignment have to be put explicitly in the struct? I am just not very convinced this is general enough.
> > > >
> > > >
> > > > The captured variables are fields of the block literal struct. Due to alignment requirement of these fields, there is alignment requirement of
> > > >  the block literal struct. The ISA of the block invoke function is generated with the assumption of these alignments. If the block literal is
> > > >  allocated at a memory address not satisfying the alignment requirement, the kernel behavior is undefined.
> > > >
> > > > Generally, __enqueue_kernel library function needs to prepare the kernel argument before launching the kernel. It usually does this by copying
> > > >  the block literal to some buffer then pass the address of the buffer to the kernel. Then the address of the buffer has to satisfy the alignment
> > > >  requirement.
> > > >
> > > > If this block literal struct is not general enough, how about add another field as target reserved size, and leave the remaining space of header for
> > > >  target specific use. And add a target hook to allow target fill the reserved space, e.g.
> > > >
> > > >   struct __opencl_block_literal {
> > > >     int total_size;
> > > >     int align;
> > > >     __generic void *invoke;
> > > >     int target_reserved_size; /* round up to 4 bytes */
> > > >     int target_reserved[];
> > > >     /* captures */
> > > >   };
> > > >
> > >
> > >
> > > I like the idea of the target reserved part actually. But not sure how it could be used without adding any target specific methods?
> >
> >
> > If we decide to add target reserved fields, I can add target hooks to fill these fields. However I would suggest to leave this for future since I don't see there is need for other fields for now.
>
>
> I could imagine it can be usefull for some vendor implementations.

> 
> 
>>> However, I am still not clear why the alignment of this struct has to be different from any other struct Clang produces. Normally the alignment of objects have to be known during IR generation to put them correctly in the attributes of generated alloca, store and loads. But as a field inside struct I don't know how it can be useful. I would imagine `enqueue_kernel` would just operate on the block as if it would be an arbitrary buffer of data. Also would size of the struct not account for any padding to make sure the alignment can be deduced based on it correctly?
>> 
>> enqueue_kernel needs to pass the block struct to the kernel. Let's assume it does this by copying the block struct to a buffer. If enqueue_kernel does not know the alignment of the struct, it can only put it at an arbitrary address in the buffer. Then the kernel has to copy the struct to an aligned private memory and load the fields. However, if the enqueued_kernel knows the alignment of the struct, it can put it at an address satisfying the alignment. Then the kernel can load the fields directly from the buffer, skips the step of copying to an aligned private memory. Therefore, alignment of the block struct is usually a useful information for enqueue_kernel. I think that's why in the SPIRV spec OpEnqueueKernel requires an alignment operand for the block context.
> 
> Ok, I just think in C if you use `malloc` to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data including the struct into the location its pointed to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for SPIRV call? Also how is the block represented in SPIRV?

If you just use malloc and put your struct in it, there is no guarantee that your struct is aligned at the required alignment, then your kernel cannot load a field directly from that memory. For example, if your first field is an int and the instruction can only load an int from an addr aligned at 4, and your malloc'ed addr is aligned at 1, then you cannot load that int directly. Instead, you need to copy the 4 bytes to an addr aligned at 4, then use that instruction to load it. If you use posix_memalign to get an aligned buffer, then your kernel can generate more efficient code.

OpEnqueueKernel instruction in SPIR-V is for representing OpenCL enqueue_kernel. In SPIR-V block is represented by block invoke function. When enqueue_kernel is translated to OpEnqueueKernel, it is required to provide block invoke function, block context, the size and alignment of the block context.

https://reviews.llvm.org/D37822