[Libclc-dev] [PATCH 1/2] amdgcn: Implement {read_, write_, }mem_fence builtin
Aaron Watry via Libclc-dev
libclc-dev at lists.llvm.org
Sat Aug 12 11:18:05 PDT 2017
On Fri, Aug 11, 2017 at 8:59 PM, Jan Vesely via Libclc-dev
<libclc-dev at lists.llvm.org> wrote:
> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> ---
>
> The __builtin_amdgcn_s_waitcnt path is compile-tested only. I currently
> don't have a machine with GCN hw and LLVM > 4
Tested on GCN 1.0 (PITCAIRN) with LLVM 6.0 svn (from sometime last week).
I ran the local-memory/global-memory piglit tests, and the conformance
basic/test_basic vload*, vstore*, and barrier tests. No change in
pass-rate.
The only ones that fail are vload_private with char/uchar/short/ushort
data types, but those failed before your series.
See below for one question/clarification.
>
> Jan
>
> amdgcn/lib/SOURCES | 2 ++
> amdgcn/lib/mem_fence/fence.cl | 32 ++++++++++++++++++++++
> amdgcn/lib/mem_fence/waitcnt.ll | 11 ++++++++
> generic/include/clc/clc.h | 3 ++
> .../clc/explicit_fence/explicit_memory_fence.h | 3 ++
> 5 files changed, 51 insertions(+)
> create mode 100644 amdgcn/lib/mem_fence/fence.cl
> create mode 100644 amdgcn/lib/mem_fence/waitcnt.ll
> create mode 100644 generic/include/clc/explicit_fence/explicit_memory_fence.h
>
> diff --git a/amdgcn/lib/SOURCES b/amdgcn/lib/SOURCES
> index 1ff5fd1..24f5949 100644
> --- a/amdgcn/lib/SOURCES
> +++ b/amdgcn/lib/SOURCES
> @@ -1,4 +1,6 @@
> math/ldexp.cl
> +mem_fence/fence.cl
> +mem_fence/waitcnt.ll
> synchronization/barrier_impl.ll
> workitem/get_global_offset.cl
> workitem/get_group_id.cl
> diff --git a/amdgcn/lib/mem_fence/fence.cl b/amdgcn/lib/mem_fence/fence.cl
> new file mode 100644
> index 0000000..f64c6e2
> --- /dev/null
> +++ b/amdgcn/lib/mem_fence/fence.cl
> @@ -0,0 +1,32 @@
> +#include <clc/clc.h>
> +
> +void __clc_amdgcn_s_waitcnt(unsigned flags);
> +
> +// Newer clang supports __builtin_amdgcn_s_waitcnt
> +#if __clang_major__ >= 5
> +# define __waitcnt(x) __builtin_amdgcn_s_waitcnt(x)
> +#else
> +# define __waitcnt(x) __clc_amdgcn_s_waitcnt(x)
> +#endif
> +
> +_CLC_DEF void mem_fence(cl_mem_fence_flags flags)
> +{
> + if (flags & CLK_GLOBAL_MEM_FENCE) {
> + // scalar mem is counted with LGKM but we don't know whether
> + // the compiler turned any loads/stores to scalar
> + __waitcnt(0);
Silly question: does waitcnt(0) also synchronize local memory?
It's possible for someone to call write_mem_fence(CLK_GLOBAL_MEM_FENCE
| CLK_LOCAL_MEM_FENCE), and I want to make sure that doing an else-if
here is acceptable.
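To make the question concrete, here is a small host-side C sketch of how
I read the two immediates used in fence.cl. The field layout is an
assumption: LGKM at [12:8] is taken from the comment in the patch, while
VM at [3:0] and EXP at [6:4] are my reading of the GCN encoding. The
names decode_waitcnt and check_fence_immediates are made up for this
mail and are not part of libclc.

```c
#include <assert.h>

/* Counters packed into the s_waitcnt immediate. Bit positions are an
 * assumption: LGKM at [12:8] comes from the comment in the patch, VM at
 * [3:0] and EXP at [6:4] are my reading of the GCN encoding. */
struct waitcnt_fields {
    unsigned vmcnt;   /* vector memory counter */
    unsigned expcnt;  /* export counter */
    unsigned lgkmcnt; /* LDS, GDS, constant and message counter */
};

/* Hypothetical helper: split an immediate into its three counters. */
static struct waitcnt_fields decode_waitcnt(unsigned imm)
{
    struct waitcnt_fields f;
    f.vmcnt   = imm & 0xfu;          /* bits [3:0]  */
    f.expcnt  = (imm >> 4) & 0x7u;   /* bits [6:4]  */
    f.lgkmcnt = (imm >> 8) & 0x1fu;  /* bits [12:8] */
    return f;
}

/* Sanity check of the two immediates used in fence.cl. */
void check_fence_immediates(void)
{
    struct waitcnt_fields both  = decode_waitcnt(0);
    struct waitcnt_fields local = decode_waitcnt(0xff);

    /* __waitcnt(0): every counter has to drain to zero, LGKM included. */
    assert(both.vmcnt == 0 && both.expcnt == 0 && both.lgkmcnt == 0);

    /* __waitcnt(0xff): VM and EXP stay at their maximum (no wait),
     * only LGKM has to reach zero. */
    assert(local.vmcnt == 15 && local.expcnt == 7 && local.lgkmcnt == 0);
}
```

If that reading is right, the global branch already waits on LGKM, so
the else-if would not skip anything when both CLK_GLOBAL_MEM_FENCE and
CLK_LOCAL_MEM_FENCE are set.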
If the waitcnt(0) works to synchronize both global/local, then this series is:
Reviewed-By: Aaron Watry <awatry at gmail.com>
Tested-By: Aaron Watry <awatry at gmail.com>
Also, thanks for taking care of this. Both the barrier and mem_fence
functions have been bugging me for a while, and I've never gotten
around to cleaning them up.
--Aaron
> + } else if (flags & CLK_LOCAL_MEM_FENCE)
> + __waitcnt(0xff); // LGKM is [12:8]
> +}
> +#undef __waitcnt
> +
> +// We don't have separate mechanism for read and write fences
> +_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags)
> +{
> + mem_fence(flags);
> +}
> +
> +_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags)
> +{
> + mem_fence(flags);
> +}
> diff --git a/amdgcn/lib/mem_fence/waitcnt.ll b/amdgcn/lib/mem_fence/waitcnt.ll
> new file mode 100644
> index 0000000..8be7f18
> --- /dev/null
> +++ b/amdgcn/lib/mem_fence/waitcnt.ll
> @@ -0,0 +1,11 @@
> +declare void @llvm.amdgcn.s.waitcnt(i32) #0
> +
> +; Export waitcnt intrinsic for clang < 5
> +define void @__clc_amdgcn_s_waitcnt(i32 %flags) #1 {
> +entry:
> + tail call void @llvm.amdgcn.s.waitcnt(i32 %flags)
> + ret void
> +}
> +
> +attributes #0 = { nounwind }
> +attributes #1 = { nounwind alwaysinline }
> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> index 5130632..deb9d70 100644
> --- a/generic/include/clc/clc.h
> +++ b/generic/include/clc/clc.h
> @@ -179,6 +179,9 @@
> #include <clc/synchronization/cl_mem_fence_flags.h>
> #include <clc/synchronization/barrier.h>
>
> +/* 6.11.9 Explicit Memory Fence Functions */
> +#include <clc/explicit_fence/explicit_memory_fence.h>
> +
> /* 6.11.10 Async Copy and Prefetch Functions */
> #include <clc/async/async_work_group_copy.h>
> #include <clc/async/async_work_group_strided_copy.h>
> diff --git a/generic/include/clc/explicit_fence/explicit_memory_fence.h b/generic/include/clc/explicit_fence/explicit_memory_fence.h
> new file mode 100644
> index 0000000..8e046b1
> --- /dev/null
> +++ b/generic/include/clc/explicit_fence/explicit_memory_fence.h
> @@ -0,0 +1,3 @@
> +_CLC_DECL void mem_fence(cl_mem_fence_flags flags);
> +_CLC_DECL void read_mem_fence(cl_mem_fence_flags flags);
> +_CLC_DECL void write_mem_fence(cl_mem_fence_flags flags);
> --
> 2.9.4
>