[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Sun Oct 8 11:18:49 PDT 2017


With this patch, barrier() always calls __syncthreads(), whatever
flags are passed. This generates a "bar.sync 0" instruction, which
not only causes the threads to wait but also acts as a memory fence,
as required by OpenCL. The fence does not differentiate between local
and global memory.
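
For illustration, here is a sketch of the kind of kernel affected by
the old behaviour. It is not part of the patch; the kernel name and
the buffer arguments are made up. Each work-item writes to global
memory and then reads a value written by another work-item in the same
work-group, passing only CLK_GLOBAL_MEM_FENCE. With the old code the
flag test fails for this call, so no __syncthreads() is emitted and
the read may race with the write.

```c
/* Hypothetical example, not taken from libclc. */
__kernel void rotate_in_group(__global int *buf, __global int *out) {
  size_t gid = get_global_id(0);
  size_t lid = get_local_id(0);
  size_t lsz = get_local_size(0);
  size_t base = gid - lid; /* global id of the first work-item in this group */

  /* Each work-item publishes a value in global memory. */
  buf[gid] = (int)gid;

  /* Only the global flag is set: before the patch this was a no-op on
     the ptx target, after the patch it emits bar.sync 0. */
  barrier(CLK_GLOBAL_MEM_FENCE);

  /* Read the value written by the next work-item in the same group. */
  out[gid] = buf[base + (lid + 1) % lsz];
}
```

The same reasoning applies to any barrier() call whose flags do not
include CLK_LOCAL_MEM_FENCE.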

Index: ptx-nvidiacl/lib/synchronization/barrier.cl
===================================================================
--- ptx-nvidiacl/lib/synchronization/barrier.cl	(revision 315170)
+++ ptx-nvidiacl/lib/synchronization/barrier.cl	(working copy)
@@ -1,8 +1,6 @@
 #include <clc/clc.h>
 
 _CLC_DEF void barrier(cl_mem_fence_flags flags) {
-  if (flags & CLK_LOCAL_MEM_FENCE) {
-    __syncthreads();
-  }
+  __syncthreads();
 }


