[cfe-dev] OpenCL support

Mon Dec 6 08:14:42 PST 2010

Peter,

Thanks for your informative reply.  I appreciate the advice about the
high level intent of Sema.

On Sat, Dec 4, 2010 at 9:01 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> Hi David,
>
> On Fri, Dec 03, 2010 at 11:14:06PM -0500, David Neto wrote:
>> Peter, I believe it is incorrect to make __local variables static and
>> therefore codegen'd into global variables.  The reason is that the
>> storage for a __local variable is shared between different work items
>> in the same group, but should be different for work items in different
>> groups.
> ...
>> I don't know how existing OpenCL implementations handle this case.
>
> Most GPU architectures have a separate address space for memory shared
> within a work group, where a given logical memory address corresponds
> to a different physical address dependent on the work group.  Existing
> OpenCL implementations for GPUs handle this case simply by allocating
> __local variables as global variables within this address space.

Ah.  Thanks for this tidbit about GPUs.

>
>> It
>> seems it would be a good idea to transform the code so that uses of x
>> become loads and stores from memory, and the address for that memory
>> is returned by a builtin function that itself is dependent on work
>> group ids.
>>
>> I'm just learning Clang now, so I'm not prepared to say how that would
>> be done.  Is it okay to transform the AST before semantic analysis?
>> Where should I start looking?  (I would guess lib/Sema...)
>
> This transformation may be useful for a CPU based OpenCL
> implementation, but would not be appropriate in Sema for a few
> reasons.  The first is that the AST should at all times be an accurate
> representation of the input source code.
>
> The second is that such a transformation would be specific to the
> OpenCL implementation -- not only would it be inappropriate for
> GPUs but there are a number of feasible CPU based implementation
> techniques which we shouldn't have to teach Sema or in fact any part
> of Clang about.
>
> The best place to do this transformation would be at the LLVM level
> with an implementation specific transformation pass.

Ok.  Now I'm even more convinced that your patch [1] is incorrect because:
(a) it's specific to GPU-style implementations of OpenCL, not the
generic semantics of OpenCL.
(b) it pushes target-specific assumptions into Sema.  But you've just
argued that the AST should reflect the original source code as much as
possible.

On (a):   I understand that ARM is preparing to contribute a more
complete OpenCL front-end to Clang.   It would be great to nail down a
common front end with generic OpenCL semantics, and let later stages
(Clang's CodeGen? LLVM IR pass?) handle more target-specific
assumptions.  For example, it would be nice to standardize on how
Clang handles OpenCL's local, global, and other address spaces.  Even
just agreeing on address space numbering would be a step forward
(e.g. global is 1, local is 2...).

What do I think your patch should look like?  It's true that the
diag::err_as_qualified_auto_decl diagnostic is inappropriate for
OpenCL when the variable is in the __local address space.

But we need to implement the semantics somehow.  Conceptually I think
of it as a CL source-to-source transformation that lowers
function-scope-local-address-space variables into a more primitive
form.
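
To make that concrete, here is a minimal sketch of the lowered form,
written in plain C rather than OpenCL C.  Everything named here
(local_arena, local_base, X_OFFSET, kernel_body, run_group) is
hypothetical and only for illustration; it is not an existing Clang or
OpenCL API.  Work items run sequentially, so barriers are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for the illustration. */
enum { NUM_GROUPS = 2, GROUP_SIZE = 4, LOCAL_WORDS = 16, X_OFFSET = 0 };

/* Hypothetical runtime storage: one __local region per work group. */
static int local_arena[NUM_GROUPS][LOCAL_WORDS];

/* Hypothetical builtin: base address of the __local region for a group.
 * The result depends only on the work-group id. */
static int *local_base(size_t group_id) {
    assert(group_id < (size_t)NUM_GROUPS);
    return local_arena[group_id];
}

/* Source form (OpenCL C), for reference:
 *
 *   __kernel void k(void) {
 *       __local int x;
 *       if (get_local_id(0) == 0) x = 0;
 *       x += get_group_id(0) + 1;
 *   }
 *
 * Lowered form: every use of x becomes a load or store through a
 * pointer obtained from the group-dependent builtin. */
static void kernel_body(size_t group_id, size_t local_id) {
    int *x = local_base(group_id) + X_OFFSET;
    if (local_id == 0)
        *x = 0;                    /* first work item initializes x */
    *x += (int)(group_id + 1);     /* load, add, store through the pointer */
}

/* Run all work items of one group in order and return the final x. */
static int run_group(size_t group_id) {
    for (size_t lid = 0; lid < (size_t)GROUP_SIZE; ++lid)
        kernel_body(group_id, lid);
    return local_base(group_id)[X_OFFSET];
}
```

Work items in the same group see the same x, while different groups
get different storage, which is the property I was describing.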

I think I disagree that Clang is an inappropriate place for
implementing this type of transform: Clang "knows" the source language
semantics, and has a lot of the machinery required for the transform.
Clang also knows a lot about the target machine (e.g. type sizes,
builtins, more?).

So I believe the "auto var in different address space" case should be
allowed in the AST in the OpenCL case, and the local-lowering
transform should be applied in CodeGen.  The lowering could be
target-specific (e.g. GPU-style) or the more generic style I proposed.

Thoughts?

>
> Thanks,
> --
> Peter
>

[1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20101018/035558.html