[cfe-dev] OpenCL support

Mon Dec 6 14:55:34 PST 2010

Hi David,

On Mon, Dec 06, 2010 at 11:14:42AM -0500, David Neto wrote:
> >> It
> >> seems it would be a good idea to transform the code so that uses of x
> >> become loads and stores from memory, and the address for that memory
> >> is returned by a builtin function that itself is dependent on work
> >> group ids.
> >>
> >> I'm just learning Clang now, so I'm not prepared to say how that would
> >> be done.  Is it okay to transform the AST before semantic analysis?
> >> Where should I start looking?  (I would guess lib/Sema...)
> >
> > This transformation may be useful for a CPU based OpenCL
> > implementation, but would not be appropriate in Sema for a few
> > reasons.  The first is that the AST should at all times be an accurate
> > representation of the input source code.
> >
> > The second is that such a transformation would be specific to the
> > OpenCL implementation -- not only would it be inappropriate for
> > GPUs but there are a number of feasible CPU based implementation
> > techniques which we shouldn't have to teach Sema or in fact any part
> > of Clang about.
> >
> > The best place to do this transformation would be at the LLVM level
> > with an implementation specific transformation pass.
> 
> Ok.  Now I'm even more convinced that your patch [1] is incorrect because:
> (a) it's specific to GPU-style implementations of OpenCL, not the
> generic semantics of OpenCL.
> (b) it pushes target-specific assumptions into Sema.  But you've just
> argued that the AST should reflect the original source code as much as
> possible.

Yes, that's why I don't like the patch so much :)  It was really
designed to work with the current infrastructure, which isn't
very well suited to more exotic languages like OpenCL.

> On (a):   I understand that ARM is preparing to contribute a more
> complete OpenCL front-end to Clang.   It would be great to nail down a
> common front end with generic OpenCL semantics, and let later stages
> (Clang's CodeGen? LLVM IR pass?) handle more target-specific
> assumptions.  E.g. it would be nice to standardize on how Clang
> handles OpenCL's local, global, etc. etc. etc.  E.g. just agreeing on
> address space numbering would be a step forward.  (e.g. global is 1,
> local is 2...)

+llvmdev, as this is also an LLVM-relevant issue.

I agree.  We should set a standard for address spaces in LLVM - a low
range for 'standard' address spaces (with defined semantics for each
value in that range) and a high range for target-specific spaces.
It looks like address spaces are already being used this way to a
certain extent in the targets (X86 uses 256 -> GS, 257 -> FS).  And
I think 256 'standard' address spaces should be enough, but I'm happy
to be proven wrong :)
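
To make the split concrete, here is a rough sketch in C (not actual
LLVM code).  Only "global is 1, local is 2" and the X86 values 256/257
come from this thread; the enumerator names, the helper function and
the exact cut-off at 256 are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical numbering.  Values below 256 would carry the same
   meaning on every target; values from 256 up belong to the target. */
enum {
    AS_DEFAULT               = 0,
    AS_GLOBAL                = 1,   /* from David's example */
    AS_LOCAL                 = 2,   /* from David's example */
    AS_FIRST_TARGET_SPECIFIC = 256, /* assumed start of the high range */
    AS_X86_GS                = 256, /* what X86 already uses */
    AS_X86_FS                = 257
};

/* True if the number falls in the proposed 'standard' low range. */
static bool as_is_standard(unsigned addr_space)
{
    return addr_space < AS_FIRST_TARGET_SPECIFIC;
}
```

With this layout a generic pass could test as_is_standard() and leave
anything in the high range to the target.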

> What do I think your patch should look like?  It's true that the
> diag::err_as_qualified_auto_decl is inappropriate for OpenCL when it's
> the __local address space.
> 
> But we need to implement the semantics somehow.  Conceptually I think
> of it as a CL source-to-source transformation that lowers
> function-scope-local-address-space variables into a more primitive
> form.
> 
> I think I disagree that Clang is an inappropriate spot for
> implementing this type of transform: Clang "knows" the source language
> semantics, and has a lot of machinery required for the transform.
> Also, Clang knows a lot about the target machine (e.g. type
> sizes, builtins, more?).
> 
> So I believe the "auto var in different address space" case should be
> allowed in the AST in the OpenCL case, and the local-lowering
> transform should be applied in CodeGen.  Perhaps the lowering is
> target-specific, e.g. GPU-style, or more generic style as I proposed.
> 
> Thoughts?

I've been rethinking this and am perhaps coming around to your way
of thinking.  Allocating variables in the __local address space
is really something that can't be represented at the LLVM level,
at least in a standard form.

But to a certain extent both auto and static storage-classes are wrong
here.  Auto implies that each invocation of the function gets its own
variable, while static implies that all invocations share a variable.

Perhaps the right thing to do here is to introduce a new storage-class
for __local variables (let's call it the 'wg-local' storage-class).
A variable cannot be made wg-local with a storage-class specifier but
function-scope-local-address-space variables would be made so in a
similar way to my original patch.  The target would then be required
to define at CodeGen the semantics of declaring wg-local variables and
loading from and storing to the local address space in the way you propose.
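
As a rough illustration of what a CPU-style target might do with such
a variable, here is a sketch in plain C.  It simulates one slot per
work-group for a kernel-scope `__local int x;` and reaches it through
a function that depends on the work-group id, along the lines you
suggested.  All names (wg_local_x, wg_local_addr_x, run_kernel) and
the fixed NUM_GROUPS are hypothetical, and work-items run sequentially
here purely to keep the example short.

```c
#include <stddef.h>

enum { NUM_GROUPS = 4 };  /* assumed number of work-groups */

/* Backing store: one copy of x per work-group.  This is neither
   'auto' (one per invocation) nor 'static' (one for everybody). */
static int wg_local_x[NUM_GROUPS];

/* Stand-in for a builtin that returns the address of x for the
   work-group the caller belongs to. */
static int *wg_local_addr_x(size_t group_id)
{
    return &wg_local_x[group_id];
}

/* Lowered kernel body: every use of x becomes a load or store
   through the address returned above. */
static void work_item(size_t group_id)
{
    *wg_local_addr_x(group_id) += 1;
}

/* Runs items_per_group work-items in each work-group, sequentially. */
static void run_kernel(size_t items_per_group)
{
    for (size_t g = 0; g < NUM_GROUPS; ++g) {
        *wg_local_addr_x(g) = 0;
        for (size_t i = 0; i < items_per_group; ++i)
            work_item(g);
    }
}
```

Work-items in the same group see each other's updates to x, while
different groups get distinct storage.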

A side effect of this is that we will require a mapping of
target-unsupported address spaces to supported address spaces in
CodeGen.  For example, a CPU based implementation should map the
local address space to 0.
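
A minimal sketch of such a mapping, again in plain C rather than real
CodeGen code.  The target_kind enum, the CL_AS_* numbers and
map_address_space() are hypothetical; the only behaviour taken from
the discussion above is that a CPU-style target maps local to 0.

```c
/* Hypothetical target categories. */
enum target_kind { TARGET_CPU, TARGET_GPU };

/* Hypothetical source-level address space numbers. */
enum { CL_AS_DEFAULT = 0, CL_AS_GLOBAL = 1, CL_AS_LOCAL = 2 };

/* Maps a source-level address space to one the target supports.
   In this sketch only the CPU/local case is remapped; everything
   else is passed through unchanged. */
static unsigned map_address_space(enum target_kind target,
                                  unsigned addr_space)
{
    if (target == TARGET_CPU && addr_space == CL_AS_LOCAL)
        return CL_AS_DEFAULT;  /* no separate local memory on a CPU */
    return addr_space;
}
```

A real target would presumably supply a full table instead of a
single special case.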

Thanks,
-- 
Peter