[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags

Jan Vesely via Libclc-dev libclc-dev at lists.llvm.org
Sun Oct 8 12:19:28 PDT 2017


On Sun, 2017-10-08 at 20:18 +0200, Jeroen Ketema via Libclc-dev wrote:
> This generates a "bar.sync 0" instruction, which not only causes the
> threads to wait, but also acts as a memory fence, as required by
> OpenCL. The fence does not differentiate between local and global
> memory.

This sounds a bit heavy. There's no need for a memory fence if
flags == 0. Is there a PTX instruction that only synchronizes thread
progress without a memory fence?
It's not wrong to insert an extra fence, so:
Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>

I was just wondering if there is a lighter implementation.
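
For reference, the behavioural change in the patch can be modelled on the
host in plain C. This is only a sketch: the typedef, the flag values, the
stub_syncthreads() counter and the barrier_old/barrier_new/count_syncs
names are hypothetical stand-ins, not libclc code. It only counts how often
the synchronization primitive would be reached for a given flags value.

```c
/* Host-side model only. Everything here is a stand-in so the logic can be
 * compiled as plain C; none of it is the real libclc or PTX code. */

typedef unsigned int cl_mem_fence_flags;  /* stand-in for the OpenCL type */
#define CLK_LOCAL_MEM_FENCE  0x1u         /* assumed flag values */
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Counts how many times the (stubbed) barrier primitive was reached. */
static int sync_calls = 0;
static void stub_syncthreads(void) { sync_calls++; }

/* Behaviour before the patch: synchronize only for the local fence flag. */
static void barrier_old(cl_mem_fence_flags flags) {
  if (flags & CLK_LOCAL_MEM_FENCE) {
    stub_syncthreads();
  }
}

/* Behaviour after the patch: always synchronize, whatever the flags are. */
static void barrier_new(cl_mem_fence_flags flags) {
  (void)flags;  /* flags no longer influence the generated code */
  stub_syncthreads();
}

/* Helper: run one barrier variant and report how many syncs it issued. */
static int count_syncs(void (*barrier_fn)(cl_mem_fence_flags),
                       cl_mem_fence_flags flags) {
  sync_calls = 0;
  barrier_fn(flags);
  return sync_calls;
}
```

In this model, count_syncs(barrier_old, CLK_GLOBAL_MEM_FENCE) and
count_syncs(barrier_old, 0) both report no synchronization at all, while
barrier_new synchronizes once for every flags value.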

Jan

> 
> Index: ptx-nvidiacl/lib/synchronization/barrier.cl
> ===================================================================
> --- ptx-nvidiacl/lib/synchronization/barrier.cl	(revision 315170)
> +++ ptx-nvidiacl/lib/synchronization/barrier.cl	(working copy)
> @@ -1,8 +1,6 @@
>  #include <clc/clc.h>
>  
>  _CLC_DEF void barrier(cl_mem_fence_flags flags) {
> -  if (flags & CLK_LOCAL_MEM_FENCE) {
> -    __syncthreads();
> -  }
> +  __syncthreads();
>  }

-- 
Jan Vesely <jan.vesely at rutgers.edu>