[LLVMdev] Re : ANN: libclc (OpenCL C library implementation)

Thu Oct 20 12:24:15 PDT 2011

On Thu, Oct 20, 2011 at 1:42 PM, Denis Steckelmacher <steckdenis at yahoo.fr> wrote:

> Hello,
>
> I am the developer of Clover, and so much activity about OpenCL these days
> is really exciting. Here is my point of view, mainly on Clover and how the
> projects could use each other.
>
> Clover is built in a way that allows a certain level of modularity. POCL
> would be very difficult to merge into Clover (or Clover into POCL), as
> these two projects do nearly the same thing (an OpenCL platform layer),
> but Libclc and the German driver are very interesting.
>
> I'll begin with the German driver, probably the easiest project to
> integrate with or bind to Clover. Clover uses a hardware abstraction layer,
> a set of virtual classes that a driver has to implement in order to be
> usable by Clover. They are DeviceInterface, DeviceBuffer, DeviceKernel and
> DeviceProgram. You can see their prototypes here:
> http://cgit.freedesktop.org/~steckdenis/clover/tree/src/core/deviceinterface.h
> Currently, I have developed a small driver, normally not too slow, that
> simply runs the LLVM IR produced by Clang using the LLVM JIT. The kernels
> are split into work-groups, which are split into work-items in such a way
> that multithreading is used efficiently.
>
> What would be interesting is to try to integrate the German driver into
> Clover using this interface, or to have this driver built as a library
> that Clover links against (in case there is a licensing problem, although
> Clover is BSD-licensed and it seems that the driver will be too). I would
> personally be very excited to see how another driver would perform in
> Clover, feature-wise and performance-wise.
>
> Libclc could also be useful to Clover, though that is less likely. The goal
> of this project is to implement all the OpenCL built-in functions. It's
> good, but Clover already does the same, using a different technique. Libclc
> is very elegant (I think), it seems to use custom LLVM intrinsics, and is
> built around pure C macros.

libclc only uses LLVM intrinsics (currently) for back-end specific
functionality.  For example, the get_local_id() function is implemented
separately for each target, and uses LLVM PTX intrinsics if compiling for
the PTX back-end.  This is not something you could implement in a generic
way without back-end hooks (at least not without dirty hacks in the
back-end).
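
To illustrate what "implemented separately for each target" means, here is a
minimal C++ sketch of what a CPU-target get_local_id() might look like. It is
not libclc code: WorkItemState, g_item and set_local_id are hypothetical names,
and the assumption is that the CPU runtime stores the current work-item's
coordinates in thread-local state before running the kernel body. A PTX-target
version could not be written this way; it would read the hardware thread index
through a back-end intrinsic instead.

```cpp
#include <cstddef>

// Hypothetical per-work-item state that a CPU back-end might maintain.
struct WorkItemState {
    std::size_t local_id[3];
};

// One copy per thread, so each worker thread sees its own work-item.
static thread_local WorkItemState g_item = {{0, 0, 0}};

// Hypothetical runtime helper: called before executing a work-item.
void set_local_id(std::size_t x, std::size_t y, std::size_t z)
{
    g_item.local_id[0] = x;
    g_item.local_id[1] = y;
    g_item.local_id[2] = z;
}

// CPU-target version of get_local_id(). OpenCL specifies a result of 0
// for an out-of-range dimension index.
std::size_t get_local_id(unsigned dim)
{
    return dim < 3 ? g_item.local_id[dim] : 0;
}
```

The point of the sketch is that the body of get_local_id() depends entirely on
where the target keeps the work-item index, which is why a single generic
implementation is not possible.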

> Clover uses a slightly more complex system, involving a Python script
> "compiling" a set of built-ins into four files. For example, this
> declaration (REPL is a macro that does a simple for()) :
>
> ----
> def vecf : float2 float3 float4 float8 float16
>
> native $type acospi $vecf : x:$type
>     REPL($vecdim)
>         result[i] = std::acos(x[i]) / M_PI;
> end
> ----
>
> Is compiled to these fragments, one for each vector type (float2, float3,
> etc) :
>
> ----
> // In stdlib_def.h : what the OpenCL C kernel sees
> float2 OVERLOAD acospi(float2 x);
>
> // In stdlib_impl.h : what gets compiled to LLVM IR at Clover compile time,
> // and then linked to each kernel
> void __cpu_float2_acospi_float2(float *result, float *x);
> float2 OVERLOAD acospi(float2 x)
> {
>     float2 result;
>
>     __cpu_float2_acospi_float2((float *)&result, (float *)&x);
>
>     return result;
> }
>
> // __cpu_float2_acospi_float2 is a function implemented in the Clover .so
> // library, using llvm::ExecutionEngine::InstallLazyFunctionCreator
>
> // In builtins_impl.h : the actual C++ implementation, included in
> // src/core/cpu/builtins.cpp
> static void float2_acospi_float2(float *result, float *x)
> {
>     REPL(2)
>         result[i] = std::acos(x[i]) / M_PI;
>
> }
>
> // And then a small else if in the lazy function creator, in order to bind
> // everything together
>     else if (name == "__cpu_float2_acospi_float2")
>         return (void *)&float2_acospi_float2;
> ----
>

If the LLVM JIT picks up these functions at run-time, then there is no
chance of inlining these math functions.  This is not good for performance.
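
To make the inlining concern concrete, here is a minimal C++ sketch, assuming
a lookup shaped like the lazy function creator in the quoted fragments. It is
an illustration, not Clover's code: kPi, lazy_function_creator, builtin_fn and
call_resolved are hypothetical names, and REPL is my guess at the macro
described above (a simple for() loop).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Assumed shape of the REPL macro: a plain loop over vector components.
#define REPL(n) for (std::size_t i = 0; i < (n); ++i)

static const float kPi = 3.14159265358979323846f;

// Native implementation, mirroring the quoted builtins_impl.h fragment.
static void float2_acospi_float2(float *result, float *x)
{
    REPL(2)
        result[i] = std::acos(x[i]) / kPi;
}

// Name-based lookup: the caller gets back an opaque address only.
static void *lazy_function_creator(const std::string &name)
{
    if (name == "__cpu_float2_acospi_float2")
        return reinterpret_cast<void *>(&float2_acospi_float2);
    return nullptr;
}

typedef void (*builtin_fn)(float *, float *);

// What the JIT-compiled kernel effectively does: an indirect call through
// an address that is only known at run time, so the optimizer never sees
// the callee's body and cannot inline or vectorize across the call.
static void call_resolved(const std::string &name, float *result, float *x)
{
    builtin_fn fn = reinterpret_cast<builtin_fn>(lazy_function_creator(name));
    assert(fn != nullptr);
    fn(result, x);
}
```

If the built-in were instead available as LLVM IR and linked into the kernel
module before optimization, the optimizer would see the loop body directly and
would be free to inline it.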

>
> The system works fairly well, and I was able to implement a dozen
> built-in functions in only two hours. It's very fast to simply declare
> "native" functions using STL or Boost math functions, and hardware drivers
> can simply replace the LLVM "call" statements with what they need to
> accelerate the functions on the GPU.
>

But then the hardware driver layer has to have GPU implementations for all
of these functions.

>
> So, libclc would only be useful to Clover if it is developed by so many
> people that its development becomes much faster than Clover's, and if it
> provides an easy and efficient way to natively implement functions, without
> needing to have an LLVM pass turning LLVM intrinsics into native function
> calls.
>
> Here is my personal point of view, and I hope a solution will be found not
> to have three or four different projects working on the same things.
>
> Best regards,
> Denis Steckelmacher.
>

-- 

Thanks,

Justin Holewinski