[cfe-dev] [LLVMdev] ANN: libclc (OpenCL C library implementation)
pekka.jaaskelainen at tut.fi
Fri Oct 21 07:53:57 PDT 2011
On 10/21/2011 05:07 PM, Michael Boyer wrote:
> If you have not already seen it, you (and anyone else working on OpenCL
> runtimes) might be interested in this paper from AMD:
> http://dl.acm.org/citation.cfm?id=1854302 In particular, Section 4
> describes the implementation approach for their x86 OpenCL runtime and
> mentions a number of optimizations they applied to things like the
> work-item stack.
"...Recently, compiler techniques have been proposed to effectively execute
CUDA kernels on CPUs (mCUDA). The kernels are modified to execute on
work-groups rather than work-items and the work-group state is stored in
local arrays. The barriers are eliminated by treating them as fission points
for the work-item loops. This potentially leads to low barrier overhead. Our
approach differs by not relying on advanced compiler techniques and leaving
the kernel source code unchanged by moving the work-item scheduling to inside
the runtime system..."
For the record, the pocl approach to WI execution is similar to mCUDA's. We
try to avoid the per-WI execution overheads that are apparent in the
"threading approaches", such as the cost of thread context switches. The
setjmp/longjmp method belongs to that category as well. Our original reason
for taking this road was that we wanted to flexibly parallelize multiple WIs
(with barriers) on static ILP cores.
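
To make the work-item loop idea concrete, here is a minimal sketch in plain C. It is hand-written for illustration and is not actual pocl or mCUDA output. The kernel in the comment, the function name run_work_group, the fixed WG_SIZE and the one-dimensional work-group are all assumptions made up for this example. It shows the two points from the quoted paragraph: the barrier becomes the fission point between two work-item loops, and a private value that is live across the barrier is expanded into a per-WI array.

```c
#include <stddef.h>

#define WG_SIZE 8 /* hypothetical, fixed work-group size for this sketch */

/*
 * Conceptual per-work-item kernel (OpenCL C, shown only as a comment):
 *
 *   int x = in[lid] * 2;                  // private, live across the barrier
 *   tmp[lid] = x;                         // tmp is __local
 *   barrier(CLK_LOCAL_MEM_FENCE);
 *   out[lid] = x + tmp[(lid + 1) % WG_SIZE];
 *
 * Below, the same work is expressed as one function that executes the
 * whole work-group, with no threads and no context switches.
 */
void run_work_group(const int *in, int *out)
{
    int tmp[WG_SIZE]; /* stands in for the __local buffer */
    int x[WG_SIZE];   /* private variable expanded to one slot per WI */

    /* Region before the barrier, run for every work-item. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        x[lid] = in[lid] * 2;
        tmp[lid] = x[lid];
    }

    /* The barrier was here. Splitting the loop at this point guarantees
     * that every work-item has written tmp before any work-item reads it. */

    /* Region after the barrier, run for every work-item. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        out[lid] = x[lid] + tmp[(lid + 1) % WG_SIZE];
    }
}
```

Because each loop body is free of barriers, the iterations within one loop are independent in this example, which is what lets a compiler for a static ILP core schedule several WIs side by side.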