[Libclc-dev] [PATCH] Implement mem_fence on ptx

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Sun Oct 8 11:23:27 PDT 2017

PTX does not differentiate between read and write fences. Hence, these are
lowered to a mem_fence call. The mem_fence function compiles to the
“membar.cta” instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored.
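
As an illustration of what this means for callers, here is a minimal sketch
of a kernel that publishes a value and then raises a flag. The kernel name
"publish" and its arguments are hypothetical and not part of the patch. With
the lowering described above, the write_mem_fence call should end up as the
same instruction whichever flag is passed, and the ordering is only expected
to hold for work-items in the same work-group. The stand-ins under #ifndef
are assumptions added so the snippet also builds as plain C on a host; an
OpenCL compiler skips them.

```c
/* Sketch only: publish() is a hypothetical kernel, not part of the patch. */
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins (assumptions) so the file also compiles as plain C. */
#include <stddef.h>
#define __kernel
#define __global
#define CLK_GLOBAL_MEM_FENCE 2u
typedef unsigned int cl_mem_fence_flags;
static void write_mem_fence(cl_mem_fence_flags flags) { (void)flags; }
static size_t get_local_id(unsigned int dim) { (void)dim; return 0; }
#endif

__kernel void publish(__global int *data, volatile __global int *flag) {
  if (get_local_id(0) == 0) {
    data[0] = 42;                          /* payload write */
    /* Under this patch the flags argument is ignored on ptx; the call
       forwards to mem_fence. */
    write_mem_fence(CLK_GLOBAL_MEM_FENCE);
    flag[0] = 1;                           /* raise the flag after the fence */
  }
}
```

A reader in the same work-group that observes flag[0] == 1 and then issues
read_mem_fence before loading data[0] would rely on the same fence.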

Index: ptx-nvidiacl/lib/SOURCES
--- ptx-nvidiacl/lib/SOURCES	(revision 315170)
+++ ptx-nvidiacl/lib/SOURCES	(working copy)
@@ -1,3 +1,4 @@
Index: ptx-nvidiacl/lib/mem_fence/fence.cl
--- ptx-nvidiacl/lib/mem_fence/fence.cl	(nonexistent)
+++ ptx-nvidiacl/lib/mem_fence/fence.cl	(working copy)
@@ -0,0 +1,14 @@
+#include <clc/clc.h>
+
+_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
+  __nvvm_membar_cta();
+}
+
+// We do not have separate mechanism for read and write fences
+_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
+
+_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
