Hi,<br><br>I don't fully understand your problem description.<br><br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
...is caused by LLVM/Clang thinking<br>they are buffers with a constant base which they eventually won't be in<br>a parallel WG implementation. This triggers an issue I'm currently working on pocl: <a href="https://bugs.launchpad.net/pocl/+bug/1032203">https://bugs.launchpad.net/pocl/+bug/1032203</a> because Clang generates<br>
constant GEPs for the local buffer accesses (even though in a parallel<br>thread-safe implementation the local variables cannot be stored to<br>constant locations).</blockquote><div><br></div><div>Surely, if the pointers you pass in to the kernel function differ per work-group, then a GEP at a constant offset from those pointers is perfectly safe. Why would a constant GEP from a per-work-group base be a problem?</div>
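<div><br></div><div>To pin down the distinction, here is a rough C sketch of the two lowerings being discussed. It is an illustration only, not pocl or Clang code: every name in it (<code>static_local</code>, <code>write_phase_arg</code>, <code>LOCAL_SIZE</code> and so on) is hypothetical, and two work-groups running in parallel are simulated by simply interleaving their write and read phases in one thread.</div>

```c
#include <assert.h>

enum { LOCAL_SIZE = 4 };

/* Model A (assumed): the automatic __local variable becomes a
 * function-static, i.e. one module-global buffer shared by every
 * work-group. Accesses use a constant offset from a constant base. */
static int static_local[LOCAL_SIZE];

void write_phase_static(int value) { static_local[1] = value; }
int read_phase_static(void) { return static_local[1]; }

/* Model B (assumed): the buffer is an extra kernel argument. The offset
 * is still constant, but the base pointer differs per work-group. */
void write_phase_arg(int *local_buf, int value) {
    assert(local_buf != 0);
    local_buf[1] = value;
}
int read_phase_arg(const int *local_buf) {
    assert(local_buf != 0);
    return local_buf[1];
}

/* Group 0 writes 10, group 1 writes 20, then group 0 reads back. */
int group0_reads_back_static(void) {
    write_phase_static(10);
    write_phase_static(20);
    return read_phase_static(); /* sees group 1's value */
}

int group0_reads_back_arg(void) {
    int buf0[LOCAL_SIZE] = {0}; /* per-work-group storage */
    int buf1[LOCAL_SIZE] = {0};
    write_phase_arg(buf0, 10);
    write_phase_arg(buf1, 20);
    return read_phase_arg(buf0); /* still group 0's own value */
}
```

<div>In this sketch the constant offset <code>[1]</code> is harmless in Model B because the base is a per-work-group argument. Only Model A, where the base itself is a constant global address, lets one group overwrite another's data.</div>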
<div><br></div>I'm sure there's something I've misunderstood about your solution...<div><br></div><div>Cheers,</div><div><br></div><div>
James</div><div><br>On 24 September 2012 12:41, Pekka Jääskeläinen &lt;<a href="mailto:pekka.jaaskelainen@tut.fi">pekka.jaaskelainen@tut.fi</a>&gt; wrote:<br>> Hi all,<br>><br>> Another OpenCL C implementation issue I'm currently fighting with is how<br>
> to best implement the automatic __local variables. Seems SPIR enforces<br>> the current Clang implementation of them that converts the automatic<br>> locals to C function static variables (thus, in practice global variables).<br>
><br>> Clearly, this is not a thread safe "final implementation", thus works as is<br>> only when multiple work groups of the same kernel are not executed in<br>> parallel. Therefore, some other compiler pass is assumed to convert those<br>
> function static (module global variables) to some other storage where the<br>> local buffers are allocated per work group thread.<br>><br>> The pocl implementation is what was suggested some time ago in this list:<br>
> the locals are converted to local arguments to the kernel function which<br>> are then passed per-thread buffers when the work group is executed. Thus,<br>> pocl needs to convert the references to these dummy globals to local<br>
> buffer pointers at the end of the kernel function argument list.<br>><br>> The problem from the use of the "semantically inadequate" 'function<br>> static' variables for the local buffers is caused by LLVM/Clang thinking<br>
> they are buffers with a constant base which they eventually won't be in<br>> a parallel WG implementation. This triggers an issue I'm currently working<br>> on pocl: <a href="https://bugs.launchpad.net/pocl/+bug/1032203">https://bugs.launchpad.net/pocl/+bug/1032203</a> because Clang<br>
> generates<br>> constant GEPs for the local buffer accesses (even though in a parallel<br>> thread-safe implementation the local variables cannot be stored to<br>> constant locations).<br>><br>> So, I wonder if this piece of SPIR specs might cause other similar<br>
> problems (LLVM optimizing incorrectly due to the slightly wrong semantics)<br>> in the future and should be improved. The minimal fix would be<br>> to add some kind of attribute to the function static global that prevents<br>
> Clang/LLVM thinking the address is constant and apply optimizations that<br>> rely<br>> on that. Semantically the local buffer is actually a thread-local variable.<br>> Are thread locals somehow supported in LLVM IR?<br>
><br>><br>> On 09/13/2012 12:19 PM, Pekka Jääskeläinen wrote:<br>>><br>>> For what it's worth, this issue manifests itself in an unsolved pocl<br>>> bug: <a href="https://bugs.launchpad.net/pocl/+bug/987905">https://bugs.launchpad.net/pocl/+bug/987905</a><br>
>><br>>> It would be simpler to implement a portable implementation for calling the<br>>> kernel from the host if we could assume the kernel calling convention<br>>> mapped<br>>> each OpenCL setArg arg to a single kernel function arg (and preferably all<br>
>> arg data in memory). For the non-kernel functions it should not matter and<br>>> could be target-specific.<br>>><br>><br>><br>> --<br>> Pekka<br></div>