[Libclc-dev] [PATCH] Make ptx barrier work irrespective of the cl_mem_fence_flags
Jeroen Ketema via Libclc-dev
libclc-dev at lists.llvm.org
Sun Oct 8 11:18:49 PDT 2017
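Until now, barrier() only synchronized when CLK_LOCAL_MEM_FENCE was
set, so barrier(CLK_GLOBAL_MEM_FENCE) did nothing. With this patch the
call always generates a "bar.sync 0" instruction, which not only causes
the threads to wait, but also acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory.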
Index: ptx-nvidiacl/lib/synchronization/barrier.cl
===================================================================
--- ptx-nvidiacl/lib/synchronization/barrier.cl (revision 315170)
+++ ptx-nvidiacl/lib/synchronization/barrier.cl (working copy)
@@ -1,8 +1,6 @@
#include <clc/clc.h>
_CLC_DEF void barrier(cl_mem_fence_flags flags) {
- if (flags & CLK_LOCAL_MEM_FENCE) {
- __syncthreads();
- }
+ __syncthreads();
}
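
For illustration, here is a minimal sketch of a kernel that depends on
the behaviour fixed above. It is not part of the patch: the kernel name
and its arguments are hypothetical, and it assumes a 1-D NDRange with no
global offset whose global size is a multiple of the work-group size.
Each work-item writes to global memory and then reads its neighbour's
slot, so it only gives well-defined results if
barrier(CLK_GLOBAL_MEM_FENCE) really synchronizes the work-group.

```c
// Hypothetical test kernel (not part of the patch).
__kernel void rotate_in_group(__global int *buf, __global int *out) {
    size_t lid  = get_local_id(0);
    size_t lsz  = get_local_size(0);
    size_t base = get_group_id(0) * lsz;  // first slot of this work-group

    // Publish this work-item's local id to global memory.
    buf[base + lid] = (int)lid;

    // Only the global flag is passed. With the old flag check this call
    // emitted no instruction at all on the ptx target.
    barrier(CLK_GLOBAL_MEM_FENCE);

    // Read the value written by the next work-item in the same group.
    out[base + lid] = buf[base + (lid + 1) % lsz];
}
```

With a working barrier, out[base + lid] should equal (lid + 1) % lsz
for every work-item. Without it, a work-item may read its neighbour's
slot before that slot has been written.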