[Libclc-dev] [PATCH] Implement mem_fence on ptx

Jeroen Ketema via Libclc-dev libclc-dev at lists.llvm.org
Mon Oct 9 12:21:10 PDT 2017


This is an updated version of the patch. It no longer emits a fence when
neither CLK_GLOBAL_MEM_FENCE nor CLK_LOCAL_MEM_FENCE is set in the flags
parameter.

Index: ptx-nvidiacl/lib/SOURCES
===================================================================
--- ptx-nvidiacl/lib/SOURCES	(revision 315193)
+++ ptx-nvidiacl/lib/SOURCES	(working copy)
@@ -1,3 +1,4 @@
+mem_fence/fence.cl
 synchronization/barrier.cl
 workitem/get_global_id.cl
 workitem/get_group_id.cl
Index: ptx-nvidiacl/lib/mem_fence/fence.cl
===================================================================
--- ptx-nvidiacl/lib/mem_fence/fence.cl	(nonexistent)
+++ ptx-nvidiacl/lib/mem_fence/fence.cl	(working copy)
@@ -0,0 +1,15 @@
+#include <clc/clc.h>
+
+_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
+  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
+    __nvvm_membar_cta();
+}
+
+// We do not have separate mechanisms for read and write fences.
+_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
+
+_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
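
The flag handling above can be checked off-device with a small host-side C
model. This is only a sketch and not part of the patch: the numeric flag
values are assumed for illustration, __nvvm_membar_cta() is replaced by a
counting stub, and the names stub_membar_cta, model_mem_fence and
fences_emitted_for are hypothetical.

```c
/* Host-side model of the flag check in mem_fence(). Not device code. */

typedef unsigned int cl_mem_fence_flags;

/* Assumed values for illustration; the real definitions come from clc.h. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Counts how often the stand-in for __nvvm_membar_cta() is called. */
static int fence_count = 0;

static void stub_membar_cta(void) {
  ++fence_count;
}

/* Same control flow as mem_fence() in the patch, with the builtin stubbed. */
void model_mem_fence(cl_mem_fence_flags flags) {
  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
    stub_membar_cta();
}

/* Returns the number of fences the model emits for a given flags value. */
int fences_emitted_for(cl_mem_fence_flags flags) {
  fence_count = 0;
  model_mem_fence(flags);
  return fence_count;
}
```

In this model a flags value of 0 yields no fence, while either flag alone or
both combined yield exactly one, since a single membar.cta covers both
address spaces.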

> On 8 Oct 2017, at 20:23, Jeroen Ketema via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
> 
> PTX does not differentiate between read and write fences. Hence, these are
> lowered to a mem_fence call. The mem_fence function compiles to the
> “membar.cta” instruction, which commits all outstanding reads and writes
> of a thread such that these become visible to all other threads in the same
> CTA (i.e., work-group). The instruction does not differentiate between
> global and local memory. Hence, the flags parameter is ignored.
> 
> Index: ptx-nvidiacl/lib/SOURCES
> ===================================================================
> --- ptx-nvidiacl/lib/SOURCES	(revision 315170)
> +++ ptx-nvidiacl/lib/SOURCES	(working copy)
> @@ -1,3 +1,4 @@
> +mem_fence/fence.cl
> synchronization/barrier.cl
> workitem/get_global_id.cl
> workitem/get_group_id.cl
> Index: ptx-nvidiacl/lib/mem_fence/fence.cl
> ===================================================================
> --- ptx-nvidiacl/lib/mem_fence/fence.cl	(nonexistent)
> +++ ptx-nvidiacl/lib/mem_fence/fence.cl	(working copy)
> @@ -0,0 +1,14 @@
> +#include <clc/clc.h>
> +
> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
> +  __nvvm_membar_cta();
> +}
> +
> +// We do not have separate mechanisms for read and write fences.
> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
> +  mem_fence(flags);
> +}
> +
> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
> +  mem_fence(flags);
> +}