[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags
Jeroen Ketema via Libclc-dev
libclc-dev at lists.llvm.org
Sun Oct 8 12:23:30 PDT 2017
> On 8 Oct 2017, at 21:19, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>
> On Sun, 2017-10-08 at 20:18 +0200, Jeroen Ketema via Libclc-dev wrote:
>> This generates a "bar.sync 0" instruction, which not only causes the
>> threads to wait, but also acts as a memory fence, as required by
>> OpenCL. The fence does not differentiate between local and global
>> memory.
>
> This sounds a bit heavy. There's no need for a memory fence if the flags
> == 0. Is there a PTX instruction that only synchronizes thread progress
> without a memory fence?
> It's not wrong to insert an extra fence so:
> Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>
>
> I was just wondering if there is a lighter implementation.
PTX only has this, unfortunately.
It’s good that you remark on this though, because it does apply to the fence patch.
There the call can be wrapped in a check on the flags.
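
To make the wrapping idea for the fence patch concrete, here is a minimal sketch, not the actual patch. It is written as host-side C so it can be compiled anywhere; in libclc it would be OpenCL C. The names hypothetical_ptx_fence, fence_calls and mem_fence_sketch are made up for illustration, and the flag types and values are defined locally as assumptions rather than taken from the libclc headers.

```c
/* Local stand-ins for the OpenCL types and flags (assumed for illustration). */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical stand-in for whatever fence the PTX target provides.
 * It only counts calls so that the effect of the wrapper is observable. */
int fence_calls = 0;
void hypothetical_ptx_fence(void) { fence_calls++; }

/* Unlike barrier(), a fence can be skipped entirely when no address
 * space is requested, so the call is wrapped in a check on the flags. */
void mem_fence_sketch(cl_mem_fence_flags flags) {
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)) {
        hypothetical_ptx_fence();
    }
}
```

With this shape, mem_fence_sketch(0) issues no fence, while either flag (or both) issues exactly one, which matches the point that the PTX fence does not differentiate between local and global memory.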
Jeroen
>
> Jan
>
>>
>> Index: ptx-nvidiacl/lib/synchronization/barrier.cl
>> ===================================================================
>> --- ptx-nvidiacl/lib/synchronization/barrier.cl (revision 315170)
>> +++ ptx-nvidiacl/lib/synchronization/barrier.cl (working copy)
>> @@ -1,8 +1,6 @@
>> #include <clc/clc.h>
>>
>> _CLC_DEF void barrier(cl_mem_fence_flags flags) {
>> - if (flags & CLK_LOCAL_MEM_FENCE) {
>> - __syncthreads();
>> - }
>> + __syncthreads();
>> }
>>
>
> --
> Jan Vesely <jan.vesely at rutgers.edu>