[cfe-dev] Function-level metadata for OpenCL (was Re: OpenCL support)
David Neto
dneto.llvm at gmail.com
Mon Dec 20 11:47:37 PST 2010
On Fri, Dec 17, 2010 at 5:16 PM, Nick Lewycky <nlewycky at google.com> wrote:
> Being discardable is a design point of metadata. You might add something
> else to support this, but it won't be metadata.
> Why are you trying to preserve "kernel"-ness into the LLVM IR? What
> semantics does it have? What does __kernel actually mean to the optimizers
> and code generator?
> Could you just make __kernel mean "externally visible" and undecorated
> functions be "linkonce_odr"? If that's not enough, could you swing it around
> and maintain single named metadata node with a list of functions that are
> marked __kernel?
> Nick
>>
>> I do have a concern though with the semantics of the inliner when it
>> needs to inline a function with metadata. One possibility would be to
>> discard the callee's metadata, or somehow merge it with the caller's.
>> Discarding seems like the right solution for OpenCL and a good starting
>> point (in future we may wish to add attributes to metadata nodes like
>> the 'appending' linkage for globals) but sounds like something that
>> should be discussed first.
>
Regarding linkage:
A __kernel function is externally visible. It is callable from the
user program which is logically a separate compilation unit; and from
other functions in its own compilation unit.
The non-kernel functions have private linkage, I believe: they are
only callable by other functions in the same compilation unit.
However, a __kernel behaves differently when called from the user
program vs. another function in the compilation unit. In OpenCL the
user program can invoke a kernel as an NDRange, i.e. with an implied
loop around it to iterate over an index space of 1 to 3 dimensions.
(This is the "big idea" of OpenCL). (The index values are available
in the function body from intrinsic functions get_work_dim() and
get_global_id(uint workdim).)
But that implied loop is only applied when directly called from the
user program. When a kernel is called from another kernel, it behaves
as a regular function call and just adopts the caller's index point.
The spec does not specify whether or how that implied loop is
represented in the IR. I expect most implementations don't represent
the loop explicitly.
I would be happy to see an OpenCL-specific patch that always marked
non-kernel functions with internal linkage. Then you could
distinguish the kernel/non-kernel case just by the linkage attribute.
It might be a little unclean / unorthogonal, but I think it would be
ok.
(There are also other minor differences, e.g. the behaviour of a
function-scope-local-addr-space variable in a nested kernel is
implementation-defined. See the Notes in the functionQualifiers
reference.)
david
References
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/functionQualifiers.html
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/local.html
More information about the cfe-dev
mailing list