[Libclc-dev] Integration Question

Fri Oct 21 10:04:15 PDT 2011

On Fri, Oct 21, 2011 at 06:33:22AM -0700, Pete Couperus wrote:
> Hello,
> 
> Thanks for taking on the libclc project, it is nice to start seeing more
> open source OpenCL pieces appear.  I would be interested in contributing.
> Forgive me, I haven't followed all of the discussions to this point.
> 
> I have a couple of questions (based on what I see available now/project
> description) on how you would see libclc used by other OpenCL components.
> 
> In the OpenCL runtime (not part of your project, but a project which would
> be the consumer of your project), when someone calls clCreateKernel, what
> artifacts from libclc would they be using?  Meaning, inlined headers?
> Already compiled .bc files for that platform to link against?  A
> shared/static lib for that platform to load/link into the runtime? Something
> else?
> 
> I've thought a bit/dabbled in writing something similar to libclc, and my
> approach was compiling a (platform specific) .bc file which was loaded by
> the runtime to link kernels against, and then the linked kernels were JIT-ed
> by the runtime.  In some sense, this is close to what Clover does, except
> the builtins are loaded/linked to the kernels at run time, rather than
> linked into the runtime and made available through symbol look-up.  (One
> thing that I'm not sure Denis has done is symbol lookup for the overloaded
> functions...which involves looking up the name-mangled symbol, IIUC).
> 
> I'm interested in the approach you plan on taking, and pitching in!

Hi Pete,

Thanks for your interest in libclc.  I've been thinking a lot in the
past few days about exactly which artifacts libclc should provide
and the entire compilation process.

Firstly, the declarations of builtin functions.  Currently these live
in header files in libclc's include directory, with target specific
overrides possible by arranging the order of -I flags, and I intend to
keep it this way.  Optionally, libclc may, as part of its compilation
process, produce a precompiled header (.pch) file for each target for
efficiency (reading one large serialised file is more efficient than
reading and parsing several small files).

Secondly, the implementation of builtin functions.  This is a tricky
issue, mainly because we must support a wide variety of targets,
some of which have space restrictions and cannot support a large
runtime library contained in each executable, and we must support
inlining for efficiency and because many targets (especially GPUs)
require it.  Initially I thought that the solution to this would be
to provide "static inline" function definitions in the header files.
Unfortunately I have since realised that the situation is more
complicated than that.  Some builtin implementations must be written
in pure LLVM IR, because Clang currently lacks support for emitting
the necessary instructions.  Some builtins use data, such as cosine
tables, which we should not duplicate in every translation unit.
As a consequence of this, the implementations of the builtins cannot
live in the header file.

Instead, the solution shall be to provide a .bc file providing all
of the builtin function implementations (similar to how you suggest
above).  Clang's frontend will be modified to include support for
lazily linking bitcode modules (so that only used functions will be
loaded from the .bc and linked) before performing optimisations.
Each global in the .bc providing the builtins (this includes the
builtins themselves, plus any data they use) will use linkonce_odr
linkage.  This linkage provides the same semantics as C++ "inline" --
it permits inlining, and at most one copy of the global will appear
in the final executable.
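
To make the "only used functions will be loaded" idea concrete, here is
a toy model in C of the lazy-linking step.  It is only a sketch of the
idea and not the planned Clang implementation: the lib_global struct and
the lazy_link function are hypothetical names, and a real linker would
walk LLVM modules rather than a table of strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one global (function or data) in the builtins .bc. */
struct lib_global {
    const char *name;     /* symbol name, e.g. "_Z3cosf" */
    const char *deps[4];  /* symbols it references; NULL-terminated if fewer than 4 */
    bool linked;          /* set once the global has been pulled into the kernel module */
};

/* Look up a global by name; returns NULL if the library does not define it. */
struct lib_global *find_global(struct lib_global *lib, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(lib[i].name, name) == 0)
            return &lib[i];
    return NULL;
}

/* Pull in `name` and, recursively, everything it references.
 * Globals that are already linked are skipped, so each one is linked at
 * most once.  Returns the number of globals newly linked by this call. */
size_t lazy_link(struct lib_global *lib, size_t n, const char *name)
{
    struct lib_global *g = find_global(lib, n, name);
    if (g == NULL || g->linked)
        return 0;
    g->linked = true;
    size_t count = 1;
    for (size_t i = 0; i < 4 && g->deps[i] != NULL; i++)
        count += lazy_link(lib, n, g->deps[i]);
    return count;
}
```

A client would call lazy_link once for each undefined symbol in the
kernel module.  A kernel that only calls cos(float) would then pull in
that builtin and its table, and nothing else from the library.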

You mentioned overloaded functions.  This is already handled by Clang's
IR generator.  Any function marked with __attribute__((overloadable))
will have its name mangled according to the Itanium C++ ABI name
mangling rules.
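
As an illustration of what a runtime doing symbol lookup would be
searching for, here is a minimal C sketch that builds the mangled name
for a simple builtin.  The mangle_builtin helper is hypothetical and
covers only the simplest case: an unqualified function name followed by
parameter type codes supplied by the caller ("f" for float, "d" for
double, "i" for int, "Dv4_f" for a 4-element float vector).  It does not
implement the ABI's substitution rules or anything else.

```c
#include <stdio.h>
#include <string.h>

/* Build "_Z" <length of name> <name> <parameter type codes>.
 * Returns 0 on success, or -1 if the result does not fit in `out`. */
int mangle_builtin(char *out, size_t out_size,
                   const char *name, const char *param_codes)
{
    int n = snprintf(out, out_size, "_Z%zu%s%s",
                     strlen(name), name, param_codes);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With this scheme sin(float) becomes _Z3sinf and sin(float4) becomes
_Z3sinDv4_f, so the two overloads are distinct symbols in the .bc file.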

Some targets, as part of their ABI, require a specific set of external
symbols to be present in every object file, and those symbols must
appear exactly once (an example being the _global_block_offset and
other symbols used by NVIDIA's OpenCL implementation).  The solution
to this would be to provide those symbols in a separate .bc file.
That file would serve a similar role to glibc's crt0.o, and would be
linked into every final executable during the final link step.

How would clients use these artifacts?  Another feature of libclc
will be that clients will not need to worry about any of this.
The Clang driver will be taught to pass the necessary flags to the
Clang frontend, and the intention is that a command line such as this:

$ clang -target ptx32--nvidiacl -o file.ptx file.cl

would just work -- the semantics of such a command line driver
invocation would be equivalent to the invocation of a program which
uses the OpenCL platform layer and runtime APIs to build an OpenCL C
program with the given flags (excluding -target, -o and input files)
using clCreateProgramWithSource and clBuildProgram, and then uses
clGetProgramInfo to dump the binaries.  As a side effect of this, the
implementation of clBuildProgram would be very simple -- it would only
need to invoke the driver with a few command line options in addition
to the flags provided by the user as a parameter to clBuildProgram.
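
As a rough sketch of that last point, the argument list a clBuildProgram
implementation hands to the driver could be assembled along these lines.
The build_driver_argv helper is hypothetical, and it assumes the user's
options can be split on spaces, which ignores quoting; a real
implementation would need a proper option parser.

```c
#include <stddef.h>
#include <string.h>

/* Fill `argv` with:
 *   clang -target <target> <user options...> -o <output> <input>
 * `options` is the user's option string and is split in place on spaces.
 * Returns the number of arguments, or -1 if `argv` is too small. */
int build_driver_argv(const char **argv, size_t max_args,
                      const char *target, char *options,
                      const char *input, const char *output)
{
    size_t argc = 0;
    const char *prefix[] = { "clang", "-target", target };
    const char *suffix[] = { "-o", output, input };

    for (size_t i = 0; i < 3; i++) {
        if (argc == max_args)
            return -1;
        argv[argc++] = prefix[i];
    }
    /* The user's flags are passed through untouched. */
    for (char *tok = strtok(options, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (argc == max_args)
            return -1;
        argv[argc++] = tok;
    }
    for (size_t i = 0; i < 3; i++) {
        if (argc == max_args)
            return -1;
        argv[argc++] = suffix[i];
    }
    return (int)argc;
}
```

The resulting array has the same shape as the command line shown above,
with the user's options inserted between the target and the output file.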

Clang provides an API for invoking its driver (see the
clang::createInvocationFromCommandLine function).  There may also be
a small wrapper library for clBuildProgram implementations to use,
to simplify the entire process.  This could be part of libclc or
perhaps a separate project.

Thanks,
-- 
Peter