[Libclc-dev] [PATCH] Implement mem_fence on ptx

Mon Oct 9 12:38:37 PDT 2017

> On 9 Oct 2017, at 21:36, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> 
> On Mon, 2017-10-09 at 21:21 +0200, Jeroen Ketema via Libclc-dev wrote:
>> Updated version of the patch, which does not emit a fence if neither
>> CLK_GLOBAL_MEM_FENCE nor CLK_LOCAL_MEM_FENCE is
>> passed via the flags parameter.
> 
> Can you include the explanation/description from v1 in the commit
> message?

I will.

Jeroen

> Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>
> 
> Jan
> 
>> 
>> Index: ptx-nvidiacl/lib/SOURCES
>> ===================================================================
>> --- ptx-nvidiacl/lib/SOURCES	(revision 315193)
>> +++ ptx-nvidiacl/lib/SOURCES	(working copy)
>> @@ -1,3 +1,4 @@
>> +mem_fence/fence.cl
>> synchronization/barrier.cl
>> workitem/get_global_id.cl
>> workitem/get_group_id.cl
>> Index: ptx-nvidiacl/lib/mem_fence/fence.cl
>> ===================================================================
>> --- ptx-nvidiacl/lib/mem_fence/fence.cl	(nonexistent)
>> +++ ptx-nvidiacl/lib/mem_fence/fence.cl	(working copy)
>> @@ -0,0 +1,15 @@
>> +#include <clc/clc.h>
>> +
>> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
>> +  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
>> +    __nvvm_membar_cta();
>> +}
>> +
>> +// We do not have a separate mechanism for read and write fences.
>> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
>> +  mem_fence(flags);
>> +}
>> +
>> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
>> +  mem_fence(flags);
>> +}
>> 
>>> On 8 Oct 2017, at 20:23, Jeroen Ketema via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
>>> 
>>> PTX does not differentiate between read and write fences. Hence, these are
>>> lowered to a mem_fence call. The mem_fence function compiles to the
>>> “membar.cta” instruction, which commits all outstanding reads and writes
>>> of a thread such that these become visible to all other threads in the same
>>> CTA (i.e., work-group). The instruction does not differentiate between
>>> global and local memory. Hence, the flags parameter is ignored.
>>> 
>>> Index: ptx-nvidiacl/lib/SOURCES
>>> ===================================================================
>>> --- ptx-nvidiacl/lib/SOURCES	(revision 315170)
>>> +++ ptx-nvidiacl/lib/SOURCES	(working copy)
>>> @@ -1,3 +1,4 @@
>>> +mem_fence/fence.cl
>>> synchronization/barrier.cl
>>> workitem/get_global_id.cl
>>> workitem/get_group_id.cl
>>> Index: ptx-nvidiacl/lib/mem_fence/fence.cl
>>> ===================================================================
>>> --- ptx-nvidiacl/lib/mem_fence/fence.cl	(nonexistent)
>>> +++ ptx-nvidiacl/lib/mem_fence/fence.cl	(working copy)
>>> @@ -0,0 +1,14 @@
>>> +#include <clc/clc.h>
>>> +
>>> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
>>> +  __nvvm_membar_cta();
>>> +}
>>> +
>>> +// We do not have a separate mechanism for read and write fences
>>> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
>>> +  mem_fence(flags);
>>> +}
>>> +
>>> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
>>> +  mem_fence(flags);
>>> +}
>>> 
>>> _______________________________________________
>>> Libclc-dev mailing list
>>> Libclc-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
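The flag handling in the v2 patch can be checked on the host with a small C model. This is a sketch under stated assumptions, not the libclc code: the bit values CLK_LOCAL_MEM_FENCE = 1 and CLK_GLOBAL_MEM_FENCE = 2 are assumed to follow libclc's definitions, and fake_membar_cta() and demo_fence_count() are hypothetical helpers. fake_membar_cta() stands in for the __nvvm_membar_cta() builtin by counting how often membar.cta would be emitted. The model cannot show memory ordering. It only shows which flag combinations reach the fence and that read_mem_fence/write_mem_fence behave exactly like mem_fence.

```c
#include <assert.h>

/* Assumed encoding of cl_mem_fence_flags, following the bit values libclc
   defines (CLK_LOCAL_MEM_FENCE = 1, CLK_GLOBAL_MEM_FENCE = 2). */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  1u
#define CLK_GLOBAL_MEM_FENCE 2u

/* Hypothetical host-side stand-in for __nvvm_membar_cta(): instead of
   emitting membar.cta it records how many fences would have been issued. */
static int membar_cta_count = 0;

static void fake_membar_cta(void) { membar_cta_count++; }

/* Same control flow as the v2 mem_fence: fence only when at least one of
   the global or local flags is set; both flags together still give one fence,
   because membar.cta does not distinguish global from local memory. */
static void mem_fence(cl_mem_fence_flags flags) {
  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
    fake_membar_cta();
}

/* PTX has no read-only or write-only fence, so both forward to mem_fence. */
static void read_mem_fence(cl_mem_fence_flags flags) { mem_fence(flags); }
static void write_mem_fence(cl_mem_fence_flags flags) { mem_fence(flags); }

/* Hypothetical driver: issues a few fence calls and returns how many
   membar.cta instructions the v2 logic would emit for them. */
static int demo_fence_count(void) {
  membar_cta_count = 0;
  mem_fence(0);                          /* neither flag: no fence */
  assert(membar_cta_count == 0);
  mem_fence(CLK_LOCAL_MEM_FENCE);        /* local only: one fence */
  read_mem_fence(CLK_GLOBAL_MEM_FENCE);  /* global only: one fence */
  write_mem_fence(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE); /* both: one fence */
  return membar_cta_count;
}
```

With these four calls demo_fence_count() returns 3: the call with no flags is skipped, as in v2, while the v1 patch would have fenced unconditionally and returned 4.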