Hello,<br><br>I see, that gives me a better idea.  I was working on the builtin<br>function support, so these issues sound familiar :).<br><br><div class="gmail_quote">On Fri, Oct 21, 2011 at 10:04 AM, Peter Collingbourne <span dir="ltr"><<a href="mailto:peter@pcc.me.uk">peter@pcc.me.uk</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br>

Firstly, the declarations of builtin functions.  Currently these live<br>

in header files in libclc's include directory, with target specific<br>

overrides possible by arranging the order of -I flags, and I intend to<br>

keep it this way.  Optionally, libclc may, as part of its compilation<br>

process, produce a precompiled header (.pch) file for each target for<br>

efficiency (reading one large serialised file is more efficient than<br>

reading and parsing several small files).<br>

<br></blockquote><div><br>When you say "libclc...may produce a precompiled header...", do you<br>mean "one of the artifacts built with libclc is a .pch file"? (Just clarifying.)<br>This seems like a good idea, though I haven't yet looked at how clang supports<br>

.pch files.  Preliminarily, I was essentially creating a monolithic<br>"builtin.h" header with all of the prototypes which got inserted before<br>compiling the .cl files.  All of the tinkering I had done was with clang<br>

embedded as a library, rather than executed as a separate process.<br>At a glance, pocl executes clang as a separate process, yes?<br><br> <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


Secondly, the implementation of builtin functions.  This is a tricky<br>

issue, mainly because we must support a wide variety of targets,<br>

some of which have space restrictions and cannot support a large<br>

runtime library contained in each executable, and we must support<br>

inlining for efficiency and because many targets (especially GPUs)<br>

require it.  Initially I thought that the solution to this would be<br>

to provide "static inline" function definitions in the header files.<br>

Unfortunately I have since realised that the situation is more<br>

complicated than that.  Some builtin implementations must be written<br>

in pure LLVM IR, because Clang currently lacks support for emitting<br>

the necessary instructions.  Some builtins use data, such as cosine<br>

tables, which we should not duplicate in every translation unit.<br>

As a consequence of this, the implementations of the builtins cannot<br>

live in the header file.<br>

 </blockquote><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Instead, the solution shall be to provide a .bc file providing all<br>

of the builtin function implementations (similar to how you suggest<br>

above).  Clang's frontend will be modified to include support for<br>

lazily linking bitcode modules (so that only used functions will be<br>

loaded from the .bc and linked) before performing optimisations.<br>

Each global in the .bc providing the builtins (this includes the<br>

builtins themselves, plus any data they use) will use linkonce_odr<br>

linkage.  This linkage provides the same semantics as C++ "inline" --<br>

it permits inlining, and at most one copy of the global will appear<br>

in the final executable.<br>

<br></blockquote><div><br>When you say Clang's frontend will be modified, does llvm-ld already support<br>this kind of lazy linking?  I'm less familiar with the link-time optimization<br>work that has been done.  It seems that the bitcode modules could be linked<br>

normally, and then a pass could be run to remove uncalled functions.<br> <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

You mentioned overloaded functions.  This is already handled by Clang's<br>

IR generator.  Any function marked with __attribute__((overloadable))<br>

will have its name mangled according to the Itanium C++ ABI name<br>

mangling rules.<br>

<br></blockquote><div><br>Right, the overloaded functions are mangled.  What I meant is that in Clover,<br>some builtins are not linked in, so when the LLVM JIT references an unknown function,<br>it calls an optional function resolver, which Clover also provides.  I believe<br>

that this resolver needs to understand the mangled name, rather than the<br>bare name.  If you look at the resolver, it currently doesn't deal with the<br>overloaded builtins.<br><a href="http://cgit.freedesktop.org/~steckdenis/clover/tree/src/core/cpu/builtins.cpp:416">http://cgit.freedesktop.org/~steckdenis/clover/tree/src/core/cpu/builtins.cpp:416</a><br>
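<br>To illustrate what I mean, here is a minimal sketch (not Clover's actual code) of a resolver keyed by mangled names.<br>The builtin chosen (fabs), the helper names, and the table layout are all assumptions for illustration;<br>the keys follow the Itanium scheme that Clang applies to overloadable functions, e.g. _Z4fabsf for fabs(float).<br>

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Generic function-pointer type; callers cast back to the real signature.
using GenericFn = void (*)();

// Hypothetical host-side implementations of two overloads of one builtin.
static float  fabs_float(float x)   { return std::fabs(x); }
static double fabs_double(double x) { return std::fabs(x); }

// Look up a builtin by the symbol name the JIT asks for.
// Keys are Itanium-mangled: _Z + <name length> + <name> + <parameter codes>,
// where f = float and d = double.
GenericFn resolve_builtin(const std::string &symbol) {
    static const std::map<std::string, GenericFn> table = {
        {"_Z4fabsf", reinterpret_cast<GenericFn>(&fabs_float)},
        {"_Z4fabsd", reinterpret_cast<GenericFn>(&fabs_double)},
    };
    auto it = table.find(symbol);
    return it == table.end() ? nullptr : it->second;
}

// Example call site: fetch the float overload and invoke it.
float call_fabs_float(float x) {
    GenericFn fn = resolve_builtin("_Z4fabsf");
    assert(fn && "float overload should be registered");
    return reinterpret_cast<float (*)(float)>(fn)(x);
}
```

A lookup by the bare name "fabs" returns nullptr in this sketch, which is the failure I would expect from a resolver that only knows unmangled names.<br>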

<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Some targets, as part of their ABI, require a specific set of external<br>

symbols to be present in every object file, and those symbols must<br>

appear exactly once (an example being the _global_block_offset and<br>

other symbols used by NVIDIA's OpenCL implementation).  The solution<br>

to this would be to provide those symbols in a separate .bc file.<br>

That file would serve a similar role to glibc's crt0.o, and would be<br>

linked into every final executable during the final link step.<br>

<br></blockquote><div><br>Could you explain this a bit further?  I understand that some targets<br>may need other symbols.  That's ok.<br>I'm unclear as to what you mean by final executable.  If I have a<br><a href="http://file.cl">file.cl</a> with a number of kernels and support functions, the OpenCL<br>

runtime needs to be able to execute the kernels.  What executable<br>is coming into the picture?  Or do you mean "program"?<br>  <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


How would clients use these artifacts?  Another feature of libclc<br>

will be that clients will not need to worry about any of this.<br>

The Clang driver will be taught to pass the necessary flags to the<br>

Clang frontend, and the intention is that a command line such as this:<br>

<br>

$ clang -target ptx32--nvidiacl -o file.ptx <a href="http://file.cl" target="_blank">file.cl</a><br>

<br></blockquote><div><br>So, I'm a little unclear as to what exactly this is going to produce.<br>file.ptx will have all of the .ptx assembly for all of the referenced builtins,<br>so it can be assembled into the executable referenced above?<br>

<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

would just work -- the semantics of such a command line driver<br>

invocation would be equivalent to the invocation of a program which<br>

uses the OpenCL platform layer and runtime APIs to build an OpenCL C<br>

program with the given flags (excluding -target, -o and input files)<br>

using clCreateProgramWithSource and clBuildProgram, and then uses<br>

clGetProgramInfo to dump the binaries.  As a side effect of this, the<br>

implementation of clBuildProgram would be very simple -- it would only<br>

need to invoke the driver with a few command line options in addition<br>

to the flags provided by the user as a parameter to clBuildProgram.<br>

<br></blockquote><div><br>Ok, this gives me more of an idea where you're headed.  Thanks for the<br>explanation.  Sounds great!<br>  <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


Clang provides an API for invoking its driver (see the<br>

clang::createInvocationFromCommandLine function).  There may also be<br>

a small wrapper library for clBuildProgram implementations to use,<br>

to simplify the entire process.  This could be part of libclc or<br>

perhaps a separate project.<br>

<br>

Thanks,<br>

<font color="#888888">--<br>

Peter<br>

</font></blockquote></div><br>Thank you for the detailed explanation.<br><br>Pete<br><br>