[cfe-dev] RFC: Representation of OpenCL Memory Spaces

Peter Collingbourne peter at pcc.me.uk
Fri Oct 14 18:34:59 PDT 2011


On Fri, Oct 14, 2011 at 06:13:54PM -0700, Justin Holewinski wrote:
> > In OpenCL C, it is illegal to declare a variable with static storage
> > duration in the __private address space (section 6.5: "All program
> > scope variables must be declared in the __constant address space.";
> > section 6.8g: "The extern, static, auto and register storage-class
> > specifiers are not supported.").  This implies that there is no way
> > for pointers to the __private address space to be usefully shared
> > between work-items without invoking undefined behaviour, so the
> > question is moot (i.e. __private does not need to be implemented using
> > thread-local storage).
> >
> > It is possible to write OpenCL C code which shares pointers to
> > __private memory using barrier synchronisation, but since there is no
> > way to queue a memory fence across __private memory (only __local and
> > __global), any access to that memory would invoke undefined behaviour.
> > For example, consider the following (2 work-items in a work-group):
> >
> > __kernel void foo() {
> >  int x = 0;
> >  int *__local p;
> >  if (get_local_id(0) == 0) p = &x;
> >  barrier(CLK_LOCAL_MEM_FENCE);
> >  if (get_local_id(0) == 1) *p = 1;
> >  barrier(CLK_LOCAL_MEM_FENCE);
> >  // what is the value of x in work-item 0 here?
> > }
> >
> > The value of x at the comment is undefined, because no fence across
> > __private memory was queued.
> >
> > Perhaps more straightforwardly, referring to the following passage
> > in section 3.3 ("memory model") of the OpenCL specification:
> >
> > "Private Memory: A region of memory private to a work-item. Variables
> > defined in one work-item's private memory are not visible to another
> > work-item."
> >
> > We can interpret the term "not visible" here as meaning that accesses
> > across work-items invoke undefined behaviour, so in the example above,
> > the write to x via p would itself be undefined.
> >
> 
> I was referring more to the front-end aspects here.  Let's say we have:
> 
> __kernel void foo() {
>   float privateBuffer[8];
>   __local float localBuffer[8];
> }
> 
> What mechanisms, other than address spaces, can we use to tell the X86
> back-end that privateBuffer is private to the thread, and localBuffer is
> shared among all threads in a work-group?

There is no need to tell the x86 backend that privateBuffer is
private to the thread.  For the reasons I explained, there is no
way for work-items to usefully get pointers to other work-items'
privateBuffer objects, so as long as privateBuffer is allocated
as an automatic variable (i.e. on the stack), there is no other
special treatment required.

As for localBuffer, the IR generator would emit accesses to __local
variables in an implementation-specific way, and the IR generator
already contains a mechanism for doing so.   In this mailing list
post I explained in more detail the CGOpenCLRuntime class that is
used to do this:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110815/045187.html

The "hidden pointer argument" technique is the one most suited to x86,
but this has not actually been implemented.

Thanks,
-- 
Peter



More information about the cfe-dev mailing list