[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Sun Oct 8 12:23:30 PDT 2017


> On 8 Oct 2017, at 21:19, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> 
> On Sun, 2017-10-08 at 20:18 +0200, Jeroen Ketema via Libclc-dev wrote:
>> This generates a "bar.sync 0" instruction, which not only causes the
>> threads to wait, but also acts as a memory fence, as required by
>> OpenCL. The fence does not differentiate between local and global
>> memory.
> 
> This sounds a bit heavy. There's no need for a memory fence if the flags
> == 0. Is there a PTX instruction that only synchronizes thread progress
> without a memory fence?
> It's not wrong to insert an extra fence so:
> Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>
> 
> I was just wondering if there is a lighter implementation.

PTX only has this, unfortunately.

It's good that you remarked on this, though, because it does apply to the
fence patch. There the fence call can be wrapped in a check on the flags.

Jeroen

> 
> Jan
> 
>> 
>> Index: ptx-nvidiacl/lib/synchronization/barrier.cl
>> ===================================================================
>> --- ptx-nvidiacl/lib/synchronization/barrier.cl	(revision 315170)
>> +++ ptx-nvidiacl/lib/synchronization/barrier.cl	(working copy)
>> @@ -1,8 +1,6 @@
>> #include <clc/clc.h>
>> 
>> _CLC_DEF void barrier(cl_mem_fence_flags flags) {
>> -  if (flags & CLK_LOCAL_MEM_FENCE) {
>> -    __syncthreads();
>> -  }
>> +  __syncthreads();
>> }
>> 
>> _______________________________________________
>> Libclc-dev mailing list
>> Libclc-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> 
> -- 
> Jan Vesely <jan.vesely at rutgers.edu>
