[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags
Jan Vesely via Libclc-dev
libclc-dev at lists.llvm.org
Sun Oct 8 12:19:28 PDT 2017
On Sun, 2017-10-08 at 20:18 +0200, Jeroen Ketema via Libclc-dev wrote:
> This generates a "bar.sync 0" instruction, which not only causes the
> threads to wait, but also acts as a memory fence, as required by
> OpenCL. The fence does not differentiate between local and global
> memory.
This sounds a bit heavy. There's no need for a memory fence if
flags == 0. Is there a PTX instruction that only synchronizes thread
progress, without a memory fence?
It's not wrong to insert an extra fence, so:
Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>
I was just wondering if there is a lighter implementation.
Jan
>
> Index: ptx-nvidiacl/lib/synchronization/barrier.cl
> ===================================================================
> --- ptx-nvidiacl/lib/synchronization/barrier.cl (revision 315170)
> +++ ptx-nvidiacl/lib/synchronization/barrier.cl (working copy)
> @@ -1,8 +1,6 @@
> #include <clc/clc.h>
>
> _CLC_DEF void barrier(cl_mem_fence_flags flags) {
> - if (flags & CLK_LOCAL_MEM_FENCE) {
> - __syncthreads();
> - }
> + __syncthreads();
> }
>
> _______________________________________________
> Libclc-dev mailing list
> Libclc-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
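For reference, the behavioural change in the patch can be modelled in plain
C. This is only a sketch, not libclc code: the flag values are assumed for
illustration (the real definitions come from <clc/clc.h>), and the helper
names old_barrier_syncs and new_barrier_syncs are hypothetical. Each helper
reports whether the __syncthreads() call would be reached for a given flags
argument.

```c
#include <stdbool.h>

/* Assumed flag values, for illustration only; the real ones come from <clc/clc.h>. */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical model of the pre-patch barrier():
 * __syncthreads() is reached only when the local fence flag is set. */
static bool old_barrier_syncs(cl_mem_fence_flags flags) {
    return (flags & CLK_LOCAL_MEM_FENCE) != 0;
}

/* Hypothetical model of the patched barrier():
 * __syncthreads() is reached regardless of the flags. */
static bool new_barrier_syncs(cl_mem_fence_flags flags) {
    (void)flags; /* flags no longer influence the decision */
    return true;
}
```

Under this model, old_barrier_syncs(CLK_GLOBAL_MEM_FENCE) and
old_barrier_syncs(0) both return false, so before the patch such a call
would not have synchronized the threads at all, while new_barrier_syncs
returns true for every flags value.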
--
Jan Vesely <jan.vesely at rutgers.edu>