[cfe-dev] OpenCL support

Sat Dec 4 18:01:29 PST 2010

Hi David,

On Fri, Dec 03, 2010 at 11:14:06PM -0500, David Neto wrote:
> Peter, I believe it is incorrect to make __local variables static and
> therefore codegen'd into global variables.  The reason is that the
> storage for a __local variable is shared between different work items
> in the same group, but should be different for work items in different
> groups.
...
> I don't know how existing OpenCL implementations handle this case.

Most GPU architectures have a separate address space for memory shared
within a work group, where a given logical memory address corresponds
to a different physical address dependent on the work group.  Existing
OpenCL implementations for GPUs handle this case simply by allocating
__local variables as global variables within this address space.
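
As a rough illustration of that mapping, here is a minimal C model of
it.  This is only a sketch: the names local_memory, local_to_physical
and X_OFFSET and the sizes are made up for the example, and on a real
GPU the translation is done by the hardware, not by a function call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only for the example. */
enum { NUM_GROUPS = 4, LOCAL_WORDS = 16 };

/* One region of "local" memory per work group. */
static int local_memory[NUM_GROUPS][LOCAL_WORDS];

/* The compiler gives a __local variable x one fixed logical offset,
   just as it would lay out an ordinary global variable. */
enum { X_OFFSET = 0 };

/* Models the address translation: the same logical offset selects
   different physical storage depending on the work group. */
static int *local_to_physical(size_t group_id, size_t logical_offset)
{
    assert(group_id < NUM_GROUPS);
    assert(logical_offset < LOCAL_WORDS);
    return &local_memory[group_id][logical_offset];
}
```

Work items in the same group get the same pointer for X_OFFSET, so they
share x; work items in different groups get distinct storage even
though the generated code uses a single address.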

> It
> seems it would be a good idea to transform the code so that uses of x
> become loads and stores from memory, and the address for that memory
> is returned by a builtin function that itself is dependent on work
> group ids.
> 
> I'm just learning Clang now, so I'm not prepared to say how that would
> be done.  Is it okay to transform the AST before semantic analysis?
> Where should I start looking?  (I would guess lib/Sema...)

This transformation may be useful for a CPU-based OpenCL
implementation, but it would not be appropriate in Sema, for a few
reasons.  The first is that the AST should at all times be an accurate
representation of the input source code.

The second is that such a transformation would be specific to the
OpenCL implementation -- not only would it be inappropriate for
GPUs, but there are a number of feasible CPU-based implementation
techniques which we shouldn't have to teach Sema, or in fact any part
of Clang, about.

The best place to do this transformation would be at the LLVM level
with an implementation-specific transformation pass.
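
To make the idea concrete, here is a minimal sketch of what the output
of such a pass could look like for one possible CPU strategy, written
as C for readability rather than as LLVM IR.  Everything named here
(kernel_locals, get_local_base, run_work_group, the sizes) is
hypothetical, and splitting the loop at the barrier is just one of the
feasible techniques, not a description of any existing implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Original OpenCL C kernel, for reference:
 *
 *   __kernel void k(__global int *out) {
 *     __local int x;
 *     if (get_local_id(0) == 0) x = 100 * get_group_id(0);
 *     barrier(CLK_LOCAL_MEM_FENCE);
 *     out[get_global_id(0)] = x + get_local_id(0);
 *   }
 */

/* Hypothetical sizes, chosen only for the example. */
enum { NUM_GROUPS = 2, GROUP_SIZE = 4 };

/* The kernel's __local variables, gathered into one struct. */
struct kernel_locals { int x; };

/* One instance per work group. */
static struct kernel_locals locals_by_group[NUM_GROUPS];

/* Hypothetical builtin: returns the __local storage for a group. */
static struct kernel_locals *get_local_base(size_t group_id)
{
    assert(group_id < NUM_GROUPS);
    return &locals_by_group[group_id];
}

/* Lowered kernel: runs every work item of one group in sequence.
   Uses of x are now loads and stores through a pointer, and the
   barrier is modeled by splitting the work-item loop in two. */
static void run_work_group(size_t group_id, int *out)
{
    int *x = &get_local_base(group_id)->x;

    /* Code before the barrier. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        if (lid == 0)
            *x = 100 * (int)group_id;

    /* Code after the barrier. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        out[group_id * GROUP_SIZE + lid] = *x + (int)lid;
}
```

Nothing in the front end needs to know about get_local_base or the
loop splitting; a different implementation could lower the same kernel
in a different way.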

Thanks,
-- 
Peter