[cfe-dev] [LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support)

Tue Jan 4 11:59:19 PST 2011


> -----Original Message-----
> From: Peter Collingbourne [mailto:peter at pcc.me.uk]
> Sent: Tuesday, January 04, 2011 11:51 AM
> To: Villmow, Micah
> Cc: Anton Lokhmotov; cfe-dev at cs.uiuc.edu; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Function-level metadata for OpenCL (was Re:
> OpenCL support)
> 
> On Mon, Jan 03, 2011 at 12:52:02PM -0600, Villmow, Micah wrote:
> > Sorry for the late reply, as I have been on vacation for a while.
> >
> > One method which I haven't seen mentioned is to separate out the
> > kernel semantics from the function definition.
> >
> > All the kernel attribute does is specify that this function is an
> > entry point to the device from the host. So why not just create a
> > separate entry point that is only callable by the host, and have
> > everything from the device go to the original entry point?
> >
> > For example, you have two functions and one calls the other:
> >
> > kernel void foo() {
> > }
> > kernel void bar() {
> >   foo();
> > }
> >
> > If you separate the kernel function from the function body, then
> > handling this becomes easy.
> >
> > You end up with four functions:
> >
> > kernel void foo_kernel() {
> >   foo();
> > }
> >
> > void foo() {
> > }
> >
> > kernel void bar_kernel() {
> >   bar();
> > }
> >
> > void bar() {
> >   foo();
> > }
> >
> > Then the issue is no longer a compilation problem, but just an entry
> > point runtime issue. Instead of calling foo(), the runtime just calls
> > foo_kernel() which handles all of the kernel setup issues and then
> > calls the function body itself.
> >
> > This removes the need to have any metadata nodes in the IR and allows
> > the kernel function to handle any setup issues for the specific
> > device, such as __local's, id/group calculations, memory offsets,
> > etc., without impacting the performance of a kernel calling another
> > kernel.
> 
> I like this idea.  I think that the entry point should keep its
> original name though, while we rename the body, because the fact that
> we factor out the function body seems like an implementation detail.
> 
[Villmow, Micah] Well, if the entry point keeps its name and the body is
renamed, then all of the call sites must also be modified to point to
the body and not the entry point. Either way is fine, as long as it is
something everyone can agree on.

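To make the call-site point concrete, here is a rough sketch of the split
with the entry point keeping its original name. It is a toy model in
Python, not LLVM code: the Function class, split_kernels() and the
"_body" suffix are all made up for illustration, and the per-device
setup inside the entry point is elided.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Toy stand-in for an IR function: a name, a kernel flag, and callees."""
    name: str
    is_kernel: bool = False
    calls: list = field(default_factory=list)


def split_kernels(module, body_suffix="_body"):
    """Return a new module where each kernel is an entry point plus a body.

    The entry point keeps the original name; the body gets `body_suffix`
    (a hypothetical naming scheme) and every call site is retargeted to it.
    """
    renamed = {f.name: f.name + body_suffix
               for f in module.values() if f.is_kernel}
    result = {}
    for f in module.values():
        # Device-side calls must bypass the entry point, so retarget them.
        calls = [renamed.get(callee, callee) for callee in f.calls]
        if f.is_kernel:
            body = renamed[f.name]
            result[body] = Function(body, False, calls)
            # The entry point only does setup (elided) and calls the body.
            result[f.name] = Function(f.name, True, [body])
        else:
            result[f.name] = Function(f.name, False, calls)
    return result


# The foo/bar example from earlier in the thread: bar calls foo.
module = {
    "foo": Function("foo", is_kernel=True),
    "bar": Function("bar", is_kernel=True, calls=["foo"]),
}
split = split_kernels(module)
```

After the split, the call inside bar's body targets foo_body rather than
foo, so a kernel calling another kernel never goes through the host-side
setup. Renaming the entry point instead would leave the call sites alone,
which is the trade-off mentioned above.
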
> To a certain extent it also removes the need to attach metadata for
> reqd_work_group_size etc at the function level (if required by the
> target), since this information can be attached to intrinsic calls
> within the entry point.  Example:
> 
> define void @foo() nounwind {
> entry:
>   call void @llvm.opencl.reqd.work.group.size(i32 4, i32 1, i32 1)
>   ; .. other setup ..
>   call void @foo_kernel()
>   ret void
> }
> 
> define internal void @foo_kernel() nounwind {
>   ; ... body ...
> }
> 
> These intrinsics wouldn't necessarily expand to target code directly,
> but would be used to generate something appropriate for the target in
> a similar fashion to the debug metadata intrinsics.  Also, by keeping
> the metadata in the entry point we guarantee that no more than one
> intrinsic call may appear within a function even if the inliner
> is used, allowing code generators to simply search for uses of the
> @llvm.opencl.reqd.work.group.size (or whatever) intrinsic to create
> a mapping from functions to attributes.
> 
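For illustration, the lookup described above (search for uses of the
intrinsic, then map each enclosing function to its attribute) could look
roughly like this. Again a toy model in Python and not LLVM code: the
module layout and work_group_sizes() are assumptions, and the intrinsic
name is only the one proposed in this thread, not an existing LLVM
intrinsic.

```python
# Intrinsic name as proposed in this thread; it is hypothetical.
INTRINSIC = "llvm.opencl.reqd.work.group.size"


def work_group_sizes(module):
    """Map each function to the size given by its single intrinsic call.

    `module` is a toy model: {function name: [(callee, args), ...]}.
    """
    sizes = {}
    for func, calls in module.items():
        for callee, args in calls:
            if callee != INTRINSIC:
                continue
            # The entry point is never inlined into device code, so a
            # second call in the same function would indicate a bug.
            if func in sizes:
                raise ValueError(f"{func}: more than one {INTRINSIC} call")
            sizes[func] = tuple(args)
    return sizes


# Mirrors the IR example above: @foo is the entry point, @foo_kernel the body.
module = {
    "foo": [(INTRINSIC, (4, 1, 1)), ("foo_kernel", ())],
    "foo_kernel": [],
}
sizes = work_group_sizes(module)
```

Only the entry point ends up in the mapping; the body carries no
attribute of its own, which is what lets a code generator treat the
intrinsic call as a per-function property.
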
[Villmow, Micah] Have you had any thoughts about raising this with
Khronos, with a view to standardizing some of these ideas/conventions
across the multiple vendors that are using LLVM for their OpenCL
implementations?

> Thanks,
> --
> Peter