[cfe-dev] RFC: Representation of OpenCL Memory Spaces

Fri Oct 14 18:13:54 PDT 2011

On Fri, Oct 14, 2011 at 10:40 AM, Peter Collingbourne <peter at pcc.me.uk>wrote:

> On Thu, Oct 13, 2011 at 04:14:09PM -0400, Justin Holewinski wrote:
> > On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <peter at pcc.me.uk
> >wrote:
> >
> > > Hi Justin,
> > >
> > > Thanks for bringing this up, I think it's important to discuss
> > > these issues here.
> > >
> > > On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:
> > > > It is becoming increasingly clear to me that LLVM address spaces are
> not
> > > the
> > > > general solution to OpenCL/CUDA memory spaces. They are a convenient
> hack
> > > to
> > > > get things working in the short term, but I think a more long-term
> > > approach
> > > > should be discussed and decided upon now before the OpenCL and CUDA
> > > > implementations in Clang/LLVM get too mature. To be clear, I am not
> > > > advocating that *targets* change to a different method for
> representing
> > > > device memory spaces. The current use of address spaces to represent
> > > > different types of device memory is perfectly valid, IMHO. However,
> this
> > > > knowledge should not be encoded in front-ends and pre-SelectionDAG
> > > > optimization passes.
> > >
> > > I disagree.  The targets should expose all the address spaces they
> > > provide, and the frontend should know about the various address spaces
> > > it needs to know about.  It is incumbent on the frontend to deliver
> > > a valid IR for a particular language implementation, and part of
> > > that involves knowing about the ABI requirements for the language
> > > implementation (which may involve using specific address spaces)
> > > and the capabilities of each target (including the capabilities of
> > > the target's address spaces), together with the language semantics.
> > > It is not the job of the optimisers or backend to know the semantics
> > > for a specific language, a specific implementation of that language
> > > or a specific ABI.
> > >
> >
> > But this is assuming that a target's address spaces have a valid 1 to 1
> > mapping between OpenCL memory spaces and back-end address spaces.  What
> > happens for a target such as x86?  Do we introduce pseudo address spaces
> > into the back-end just to satisfy the front-end OpenCL requirements?
>
> I don't see how anything I wrote implies that.  For x86, there would
> presumably be a many-to-one mapping.
>
> > > This presupposes that we need a way of representing OpenCL address
> > > spaces in IR targeting X86 (and targets which lack GPU-like address
> > > spaces).  As far as I can tell, the only real representations of
> > > OpenCL address spaces on such targets that we need are a way of
> > > distinguishing the different address spaces for alias analysis
> > > and a representation for __local variables allocated on the stack.
> > > TBAA metadata would solve the first problem, and we already have
> > > mechanisms in the frontend that could be used to solve the second.
> > >
> >
> > Which mechanisms could be used to differentiate between thread-private
> and
> > __local data?
>
> In OpenCL C, it is illegal to declare a variable with static storage
> duration in the __private address space (section 6.5: "All program
> scope variables must be declared in the __constant address space.";
> section 6.8g: "The extern, static, auto and register storage-class
> specifiers are not supported.").  This implies that there is no way
> for pointers to the __private address space to be usefully shared
> between work-items without invoking undefined behaviour, so the
> question is moot (i.e. __private does not need to be implemented using
> thread-local storage).
>
> It is possible to write OpenCL C code which shares pointers to
> __private memory using barrier synchronisation, but since there is no
> way to queue a memory fence across __private memory (only __local and
> __global), any access to that memory would invoke undefined behaviour.
> For example, consider the following (2 work-items in a work-group):
>
> __kernel void foo() {
>  int x = 0;
>  int *__local p;
>  if (get_local_id(0) == 0) p = &x;
>  barrier(CLK_LOCAL_MEM_FENCE);
>  if (get_local_id(0) == 1) *p = 1;
>  barrier(CLK_LOCAL_MEM_FENCE);
>  // what is the value of x in work-item 0 here?
> }
>
> The value of x at the comment is undefined, because no fence across
> __private memory was queued.
>
> Perhaps more straightforwardly, referring to the following passage
> in section 3.3 ("memory model") of the OpenCL specification:
>
> "Private Memory: A region of memory private to a work-item. Variables
> defined in one work-item's private memory are not visible to another
> work-item."
>
> We can interpret the term "not visible" here as meaning that accesses
> across work-items invoke undefined behaviour, so in the example above,
> the write to x via p would itself be undefined.
>

I was referring more to the front-end aspects here.  Let's say we have:

__kernel void foo() {
  float privateBuffer[8];
  __local float localBuffer[8];
}

What mechanisms, other than address spaces, can we use to tell the X86
back-end that privateBuffer is private to the thread, and localBuffer is
shared among all threads in a work-group?

>
> Thanks,
> --
> Peter
>

-- 

Thanks,

Justin Holewinski
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111014/065489af/attachment.html>