[Libclc-dev] [PATCH] Implement mem_fence on ptx
Jeroen Ketema via Libclc-dev
libclc-dev at lists.llvm.org
Mon Oct 9 12:38:37 PDT 2017
> On 9 Oct 2017, at 21:36, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>
> On Mon, 2017-10-09 at 21:21 +0200, Jeroen Ketema via Libclc-dev wrote:
>> Updated version of the patch, which does not emit a fence if neither
>> CLK_GLOBAL_MEM_FENCE nor CLK_LOCAL_MEM_FENCE is
>> passed via the flags parameter.
>
> Can you include the explanation/description from v1 in the commit
> message?
I will.
Jeroen
> Reviewed-by: Jan Vesely <jan.vesely at rutgers.edu>
>
> Jan
>
>>
>> Index: ptx-nvidiacl/lib/SOURCES
>> ===================================================================
>> --- ptx-nvidiacl/lib/SOURCES (revision 315193)
>> +++ ptx-nvidiacl/lib/SOURCES (working copy)
>> @@ -1,3 +1,4 @@
>> +mem_fence/fence.cl
>> synchronization/barrier.cl
>> workitem/get_global_id.cl
>> workitem/get_group_id.cl
>> Index: ptx-nvidiacl/lib/mem_fence/fence.cl
>> ===================================================================
>> --- ptx-nvidiacl/lib/mem_fence/fence.cl (nonexistent)
>> +++ ptx-nvidiacl/lib/mem_fence/fence.cl (working copy)
>> @@ -0,0 +1,15 @@
>> +#include <clc/clc.h>
>> +
>> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
>> +  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
>> +    __nvvm_membar_cta();
>> +}
>> +
>> +// We do not have separate mechanism for read and write fences.
>> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
>> +  mem_fence(flags);
>> +}
>> +
>> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
>> +  mem_fence(flags);
>> +}
>>
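For readers who want to check the flag handling of the updated patch without a PTX toolchain, the following is a minimal host-side sketch in plain C. It is not part of the patch. The flag values, the `fake_membar_cta` counter, and the `model_*` and `fences_emitted_for` names are assumptions made for illustration only; in libclc the flags come from <clc/clc.h> and the fence is the `__nvvm_membar_cta()` builtin.

```c
#include <assert.h>

/* Assumed flag values for illustration; the real ones come from <clc/clc.h>. */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical stand-in for __nvvm_membar_cta(): it only counts calls. */
static int fence_count = 0;
static void fake_membar_cta(void) { ++fence_count; }

/* Model of the updated mem_fence: fence only if a known flag is set. */
static void model_mem_fence(cl_mem_fence_flags flags) {
  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
    fake_membar_cta();
}

/* Read and write fences simply forward to the full fence, as in the patch. */
static void model_read_mem_fence(cl_mem_fence_flags flags) {
  model_mem_fence(flags);
}

static void model_write_mem_fence(cl_mem_fence_flags flags) {
  model_mem_fence(flags);
}

/* Helper: number of fences a single mem_fence(flags) call would emit. */
static int fences_emitted_for(cl_mem_fence_flags flags) {
  fence_count = 0;
  model_mem_fence(flags);
  return fence_count;
}

/* Small self-check covering the no-flag case and the forwarding wrappers. */
static void model_self_check(void) {
  assert(fences_emitted_for(0) == 0);
  fence_count = 0;
  model_read_mem_fence(CLK_LOCAL_MEM_FENCE);
  model_write_mem_fence(CLK_GLOBAL_MEM_FENCE);
  assert(fence_count == 2);
}
```

Under these assumptions, a call with no recognised flag emits nothing, while either flag (or both) results in exactly one fence, which matches the behaviour described for the updated patch.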
>>> On 8 Oct 2017, at 20:23, Jeroen Ketema via Libclc-dev <libclc-dev at lists.llvm.org> wrote:
>>>
>>> PTX does not differentiate between read and write fences. Hence, these are
>>> lowered to a mem_fence call. The mem_fence function compiles to the
>>> “membar.cta” instruction, which commits all outstanding reads and writes
>>> of a thread such that these become visible to all other threads in the same
>>> CTA (i.e., work-group). The instruction does not differentiate between
>>> global and local memory. Hence, the flags parameter is ignored.
>>>
>>> Index: ptx-nvidiacl/lib/SOURCES
>>> ===================================================================
>>> --- ptx-nvidiacl/lib/SOURCES (revision 315170)
>>> +++ ptx-nvidiacl/lib/SOURCES (working copy)
>>> @@ -1,3 +1,4 @@
>>> +mem_fence/fence.cl
>>> synchronization/barrier.cl
>>> workitem/get_global_id.cl
>>> workitem/get_group_id.cl
>>> Index: ptx-nvidiacl/lib/mem_fence/fence.cl
>>> ===================================================================
>>> --- ptx-nvidiacl/lib/mem_fence/fence.cl (nonexistent)
>>> +++ ptx-nvidiacl/lib/mem_fence/fence.cl (working copy)
>>> @@ -0,0 +1,14 @@
>>> +#include <clc/clc.h>
>>> +
>>> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
>>> +  __nvvm_membar_cta();
>>> +}
>>> +
>>> +// We do not have separate mechanism for read and write fences
>>> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
>>> +  mem_fence(flags);
>>> +}
>>> +
>>> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
>>> +  mem_fence(flags);
>>> +}
>>>
>>> _______________________________________________
>>> Libclc-dev mailing list
>>> Libclc-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev