[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)
steckdenis at yahoo.fr
Thu Oct 20 10:42:23 PDT 2011
I am the developer of Clover, and seeing so much activity around OpenCL these days is really exciting. Here is my point of view, mainly on Clover and how the projects could use each other.
Clover is designed to allow a certain level of modularity. POCL would be very difficult to merge into Clover (or Clover into POCL), as the two projects do almost exactly the same thing (an OpenCL platform layer), but Libclc and the German driver are very interesting.
I'll begin with the German driver, which may be the easiest project to integrate into or bind to Clover. Clover uses a hardware abstraction layer, a set of virtual classes a driver has to implement in order to be usable by Clover. They are DeviceInterface, DeviceBuffer, DeviceKernel and DeviceProgram. You can see their prototypes here: http://cgit.freedesktop.org/~steckdenis/clover/tree/src/core/deviceinterface.h . Currently, I have developed a small driver, which should not be too slow, that simply runs the LLVM IR produced by Clang using the LLVM JIT. The kernels are split into work-groups, which are in turn split into work-items, in such a way that multithreading is used efficiently.
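To illustrate the work-group / work-item splitting, here is a minimal sketch. It is not Clover's actual scheduler: the names run_nd_range and WorkItemFn are hypothetical, the range is one-dimensional, and barriers between work-items are ignored. It only shows one simple way to hand whole work-groups to worker threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical kernel signature: called once per work-item with its global id.
using WorkItemFn = std::function<void(std::size_t global_id)>;

// Run a 1-D range of global_size work-items, grouped into work-groups of
// local_size items. Each worker thread takes whole work-groups in a strided
// fashion and runs the work-items of a group sequentially.
void run_nd_range(std::size_t global_size, std::size_t local_size,
                  unsigned num_threads, const WorkItemFn &kernel)
{
    // As in OpenCL 1.x, the global size must be a multiple of the local size.
    assert(local_size > 0 && global_size % local_size == 0);
    const std::size_t num_groups = global_size / local_size;
    num_threads = std::max(1u, num_threads);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([=, &kernel] {
            // Thread t handles groups t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t group = t; group < num_groups; group += num_threads) {
                for (std::size_t local = 0; local < local_size; ++local)
                    kernel(group * local_size + local);
            }
        });
    }
    for (std::thread &w : workers)
        w.join();
}
```

Keeping a whole work-group on one thread means the work-items of a group never run concurrently with each other, which keeps the sketch simple. A real driver would also need to handle barriers and local memory.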
What would be interesting is to try to integrate the German driver into Clover using this interface, or to have this driver built as a library that Clover links against (in case there is a licensing problem, although Clover is BSD-licensed and it seems that the driver will be too). I would personally be very excited to see how another driver would perform in Clover, feature-wise and performance-wise.
Libclc could also be useful to Clover, though that is less likely. The goal of this project is to implement all the OpenCL built-in functions. That is good, but Clover already does the same, using a different technique. Libclc is very elegant (I think): it seems to use custom LLVM intrinsics, and is built around pure C macros. Clover uses a slightly more complex system, involving a Python script "compiling" a set of built-ins into four files. For example, this declaration (REPL is a macro that does a simple for()):
def vecf : float2 float3 float4 float8 float16
native $type acospi $vecf : x:$type
result[i] = std::acos(x[i]) / M_PI;
is compiled to these fragments, one for each vector type (float2, float3, etc.):
// In stdlib_def.h : what the OpenCL C kernel sees
float2 OVERLOAD acospi(float2 x);

// In stdlib_impl.h : what gets compiled to LLVM IR at Clover compile time, and then linked to each kernel
void __cpu_float2_acospi_float2(float *result, float *x);
float2 OVERLOAD acospi(float2 x)
{
    float2 result;
    __cpu_float2_acospi_float2((float *)&result, (float *)&x);
    return result;
}
// __cpu_float2_acospi_float2 is a function implemented in the Clover .so library, resolved by the LLVM JIT through a lazy function creator (ExecutionEngine::InstallLazyFunctionCreator)

// In builtins_impl.h : the actual C++ implementation, included in src/core/cpu/builtins.cpp
static void float2_acospi_float2(float *result, float *x)
{
    // The REPL macro expands to a simple loop like this one (2 elements for float2)
    for (int i = 0; i < 2; ++i)
        result[i] = std::acos(x[i]) / M_PI;
}

// And then a small else if in the lazy function creator, in order to bind everything together
else if (name == "__cpu_float2_acospi_float2")
    return (void *)&float2_acospi_float2;
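To make the binding step concrete, here is a minimal, self-contained sketch of a name-to-pointer resolver. It is not Clover's code: resolve_builtin is a hypothetical name, and a lookup table replaces the else-if chain. The shape (symbol name in, function address out, null when unknown) is what a lazy function creator callback is expected to provide, as far as I know.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Native implementation for float2. The loop stands in for the REPL macro.
static void float2_acospi_float2(float *result, float *x)
{
    const float pi = 3.14159265358979323846f;
    for (int i = 0; i < 2; ++i)
        result[i] = std::acos(x[i]) / pi;
}

// Hypothetical resolver: given the symbol name the JIT could not find,
// return the address of the matching native function, or nullptr.
void *resolve_builtin(const std::string &name)
{
    static const std::unordered_map<std::string, void *> table = {
        {"__cpu_float2_acospi_float2",
         reinterpret_cast<void *>(&float2_acospi_float2)},
    };
    auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}
```

A table keeps the lookup cost roughly constant as the number of built-ins grows, whereas a generated else-if chain compares names one after another. Since the chain is produced by a script, either form is equally easy to generate.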
The system works fairly well, and I was able to implement a dozen built-in functions in only two hours. It's very fast to simply declare "native" functions using STL or Boost math functions, and hardware drivers can simply replace the LLVM "call" instructions with whatever they need to accelerate the functions on the GPU.
So, libclc would only be useful to Clover if it is developed by so many people that its development becomes much faster than Clover's, and if it provides an easy and efficient way to implement functions natively, without needing an LLVM pass that turns LLVM intrinsics into native function calls.
This is my personal point of view, and I hope a solution will be found so that we do not end up with three or four different projects working on the same things.