<div class="gmail_quote">On Fri, Oct 14, 2011 at 10:40 AM, Peter Collingbourne <span dir="ltr"><<a href="mailto:peter@pcc.me.uk">peter@pcc.me.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div><div></div><div class="h5">On Thu, Oct 13, 2011 at 04:14:09PM -0400, Justin Holewinski wrote:<br>

> On Thu, Oct 13, 2011 at 11:57 AM, Peter Collingbourne <<a href="mailto:peter@pcc.me.uk">peter@pcc.me.uk</a>>wrote:<br>

><br>

> > Hi Justin,<br>

> ><br>

> > Thanks for bringing this up, I think it's important to discuss<br>

> > these issues here.<br>

> ><br>

> > On Thu, Oct 13, 2011 at 09:46:28AM -0400, Justin Holewinski wrote:<br>

> > > It is becoming increasingly clear to me that LLVM address spaces are not<br>

> > the<br>

> > > general solution to OpenCL/CUDA memory spaces. They are a convenient hack<br>

> > to<br>

> > > get things working in the short term, but I think a more long-term<br>

> > approach<br>

> > > should be discussed and decided upon now before the OpenCL and CUDA<br>

> > > implementations in Clang/LLVM get too mature. To be clear, I am not<br>

> > > advocating that *targets* change to a different method for representing<br>

> > > device memory spaces. The current use of address spaces to represent<br>

> > > different types of device memory is perfectly valid, IMHO. However, this<br>

> > > knowledge should not be encoded in front-ends and pre-SelectionDAG<br>

> > > optimization passes.<br>

> ><br>

> > I disagree.  The targets should expose all the address spaces they<br>

> > provide, and the frontend should know about the various address spaces<br>

> > it needs to know about.  It is incumbent on the frontend to deliver<br>

> > a valid IR for a particular language implementation, and part of<br>

> > that involves knowing about the ABI requirements for the language<br>

> > implementation (which may involve using specific address spaces)<br>

> > and the capabilities of each target (including the capabilities of<br>

> > the target's address spaces), together with the language semantics.<br>

> > It is not the job of the optimisers or backend to know the semantics<br>

> > for a specific language, a specific implementation of that language<br>

> > or a specific ABI.<br>

> ><br>

><br>

> But this is assuming that a target's address spaces have a valid 1 to 1<br>

> mapping between OpenCL memory spaces and back-end address spaces.  What<br>

> happens for a target such as x86?  Do we introduce pseudo address spaces<br>

> into the back-end just to satisfy the front-end OpenCL requirements?<br>

<br>

</div></div>I don't see how anything I wrote implies that.  For x86, there would<br>

presumably be a many-to-one mapping.<br>

<div class="im"><br>

> > This presupposes that we need a way of representing OpenCL address<br>

> > spaces in IR targeting X86 (and targets which lack GPU-like address<br>

> > spaces).  As far as I can tell, the only real representations of<br>

> > OpenCL address spaces on such targets that we need are a way of<br>

> > distinguishing the different address spaces for alias analysis<br>

> > and a representation for __local variables allocated on the stack.<br>

> > TBAA metadata would solve the first problem, and we already have<br>

> > mechanisms in the frontend that could be used to solve the second.<br>

> ><br>

><br>

> Which mechanisms could be used to differentiate between thread-private and<br>

> __local data?<br>

<br>

</div>In OpenCL C, it is illegal to declare a variable with static storage<br>

duration in the __private address space (section 6.5: "All program<br>

scope variables must be declared in the __constant address space.";<br>

section 6.8g: "The extern, static, auto and register storage-class<br>

specifiers are not supported.").  This implies that there is no way<br>

for pointers to the __private address space to be usefully shared<br>

between work-items without invoking undefined behaviour, so the<br>

question is moot (i.e. __private does not need to be implemented using<br>

thread-local storage).<br>

<br>

It is possible to write OpenCL C code which shares pointers to<br>

__private memory using barrier synchronisation, but since there is no<br>

way to queue a memory fence across __private memory (only __local and<br>

__global), any access to that memory would invoke undefined behaviour.<br>

For example, consider the following (2 work-items in a work-group):<br>

<br>

__kernel void foo() {<br>

  int x = 0;<br>

  int *__local p;<br>

  if (get_local_id(0) == 0) p = &x;<br>

  barrier(CLK_LOCAL_MEM_FENCE);<br>

  if (get_local_id(0) == 1) *p = 1;<br>

  barrier(CLK_LOCAL_MEM_FENCE);<br>

  // what is the value of x in work-item 0 here?<br>

}<br>

<br>

The value of x at the comment is undefined, because no fence across<br>

__private memory was queued.<br>

<br>

Perhaps more straightforwardly, referring to the following passage<br>

in section 3.3 ("memory model") of the OpenCL specification:<br>

<br>

"Private Memory: A region of memory private to a work-item. Variables<br>

defined in one work-item's private memory are not visible to another<br>

work-item."<br>

<br>

We can interpret the term "not visible" here as meaning that accesses<br>

across work-items invoke undefined behaviour, so in the example above,<br>

the write to x via p would itself be undefined.<br></blockquote><div><br></div><div>I was referring more to the front-end aspects here.  Let's say we have:</div><div><br></div><div>__kernel void foo() {</div><div>  float privateBuffer[8];</div>

<div>  __local float localBuffer[8];</div><div>}</div><div><br></div><div>What mechanisms, other than address spaces, can we use to tell the X86 back-end that privateBuffer is private to the thread, and localBuffer is shared among all threads in a work-group?</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

Thanks,<br>

<font color="#888888">--<br>

Peter<br>

</font></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>