[Libclc-dev] [PATCH] R600: Set the noduplicate attribute on barrier() intrinsics
Tom Stellard
tom at stellard.net
Tue Oct 15 08:53:54 PDT 2013
On Tue, Oct 15, 2013 at 08:27:36AM -0500, Aaron Watry wrote:
> I may be mis-understanding the LLVM Language Reference, but can't we
> specify noduplicate with alwaysinline? Or would that produce
> undesired behavior?
>
> From the language ref (http://llvm.org/docs/LangRef.html):
>
> [quote]
> alwaysinline
> This attribute indicates that the inliner should attempt to inline
> this function into callers whenever possible, ignoring any active
> inlining size threshold for this
> caller.
>
> ...snip...
>
> noduplicate
>
> This attribute indicates that calls to the function cannot be
> duplicated. A call to a noduplicate function may be moved within its
> parent function, but may not be duplicated within its parent function.
>
> A function containing a noduplicate call may still be an inlining
> candidate, provided that the call is not duplicated by inlining. That
> implies that the function has internal linkage and only has one call
> site, so the original call is dead after inlining.
> [/quote]
>
Are you asking about adding alwaysinline to the intrinsic calls?
The patch is already setting alwaysinline on the barrier function.
> If that isn't an option, then there's also inlinehint... I'm not
> NAK'ing this patch, just asking for clarification (if you happen to
> know the answer).
>
> Also, is there an upcoming patch to implement the intrinsic for
> llvm.AMDGPU.barrier.global? I scanned recent llvm-commits posts, but
> if there's one there, I must've missed it. If there's nothing yet,
> let me know and I'll take a crack at it.... I had started looking at
> the necessary plumbing for this a while ago, but I hadn't written any
> code except a few tests.
>
I have not been working on this, and I don't have plans to work on it in
the near future.
-Tom
> --Aaron
>
> On Fri, Oct 11, 2013 at 12:35 PM, Tom Stellard <tom at stellard.net> wrote:
> > From: Tom Stellard <thomas.stellard at amd.com>
> >
> > This will prevent LLVM optimization passes from creating illegal uses
> > of the barrier() intrinsic (e.g. calling barrier() from a conditional
> > that is not executed by all threads).
> > ---
> > r600/lib/SOURCES | 1 -
> > r600/lib/synchronization/barrier.cl | 15 +++++----------
> > r600/lib/synchronization/barrier_impl.ll | 33 ++++++++++++++++++++++++--------
> > 3 files changed, 30 insertions(+), 19 deletions(-)
> >
> > diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
> > index aac6d8f..d9fc897 100644
> > --- a/r600/lib/SOURCES
> > +++ b/r600/lib/SOURCES
> > @@ -8,4 +8,3 @@ workitem/get_global_size.ll
> > synchronization/barrier.cl
> > synchronization/barrier_impl.ll
> > shared/vload.cl
> > -shared/vstore.cl
> > \ No newline at end of file
> > diff --git a/r600/lib/synchronization/barrier.cl b/r600/lib/synchronization/barrier.cl
> > index ac0b4b3..6f2900b 100644
> > --- a/r600/lib/synchronization/barrier.cl
> > +++ b/r600/lib/synchronization/barrier.cl
> > @@ -1,15 +1,10 @@
> >
> > #include <clc/clc.h>
> >
> > -void barrier_local(void);
> > -void barrier_global(void);
> > -
> > -void barrier(cl_mem_fence_flags flags) {
> > - if (flags & CLK_LOCAL_MEM_FENCE) {
> > - barrier_local();
> > - }
> > +_CLC_DEF int __clc_clk_local_mem_fence() {
> > + return CLK_LOCAL_MEM_FENCE;
> > +}
> >
> > - if (flags & CLK_GLOBAL_MEM_FENCE) {
> > - barrier_global();
> > - }
> > +_CLC_DEF int __clc_clk_global_mem_fence() {
> > + return CLK_GLOBAL_MEM_FENCE;
> > }
> > diff --git a/r600/lib/synchronization/barrier_impl.ll b/r600/lib/synchronization/barrier_impl.ll
> > index 99ac018..3d8ee66 100644
> > --- a/r600/lib/synchronization/barrier_impl.ll
> > +++ b/r600/lib/synchronization/barrier_impl.ll
> > @@ -1,12 +1,29 @@
> > -declare void @llvm.AMDGPU.barrier.local() nounwind
> > -declare void @llvm.AMDGPU.barrier.global() nounwind
> > +declare i32 @__clc_clk_local_mem_fence() nounwind alwaysinline
> > +declare i32 @__clc_clk_global_mem_fence() nounwind alwaysinline
> > +declare void @llvm.AMDGPU.barrier.local() nounwind noduplicate
> > +declare void @llvm.AMDGPU.barrier.global() nounwind noduplicate
> >
> > -define void @barrier_local() nounwind alwaysinline {
> > - call void @llvm.AMDGPU.barrier.local()
> > - ret void
> > -}
> > +define void @barrier(i32 %flags) nounwind noduplicate alwaysinline {
> > +barrier_local_test:
> > + %CLK_LOCAL_MEM_FENCE = call i32 @__clc_clk_local_mem_fence()
> > + %0 = and i32 %flags, %CLK_LOCAL_MEM_FENCE
> > + %1 = icmp ne i32 %0, 0
> > + br i1 %1, label %barrier_local, label %barrier_global_test
> > +
> > +barrier_local:
> > + call void @llvm.AMDGPU.barrier.local() noduplicate
> > + br label %barrier_global_test
> > +
> > +barrier_global_test:
> > + %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
> > + %2 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
> > + %3 = icmp ne i32 %2, 0
> > + br i1 %3, label %barrier_global, label %done
> > +
> > +barrier_global:
> > + call void @llvm.AMDGPU.barrier.global() noduplicate
> > + br label %done
> >
> > -define void @barrier_global() nounwind alwaysinline {
> > - call void @llvm.AMDGPU.barrier.global()
> > +done:
> > ret void
> > }
> > --
> > 1.7.11.4
> >
> >
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at pcc.me.uk
> > http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
More information about the Libclc-dev
mailing list