[Libclc-dev] [PATCH] R600: Use new barrier intrinsic

Aaron Watry via Libclc-dev libclc-dev at lists.llvm.org
Fri Jul 15 11:06:04 PDT 2016


On Fri, Jul 15, 2016 at 11:03 AM, Jan Vesely <jan.vesely at rutgers.edu> wrote:

> On Thu, 2016-07-14 at 23:48 -0700, Matt Arsenault via Libclc-dev wrote:
> > >
> > > On Jul 14, 2016, at 20:03, Aaron Watry <awatry at gmail.com> wrote:
> > >
> > > Re-adding libclc to CC list.... oops.
> > >
> > > On Thu, Jul 14, 2016 at 10:03 PM, Aaron Watry <awatry at gmail.com
> > > <mailto:awatry at gmail.com>> wrote:
> > > From 02c998f017107153c6dc78b8eb58192bfc338d80 Mon Sep 17 00:00:00
> > > 2001
> > > From: Matt Arsenault <matthew.arsenault at Amd.com>
> > > Date: Wed, 13 Jul 2016 21:07:53 -0700
> > > Subject: [PATCH] R600: Use new barrier intrinsic
> > >
> > > ---
> > >  r600/lib/synchronization/barrier_impl.ll | 7 +++----
> > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/r600/lib/synchronization/barrier_impl.ll
> > > b/r600/lib/synchronization/barrier_impl.ll
> > > index 825b2eb..9b8fefb 100644
> > > --- a/r600/lib/synchronization/barrier_impl.ll
> > > +++ b/r600/lib/synchronization/barrier_impl.ll
> > > @@ -1,7 +1,6 @@
> > >  declare i32 @__clc_clk_local_mem_fence() #1
> > >  declare i32 @__clc_clk_global_mem_fence() #1
> > > -declare void @llvm.AMDGPU.barrier.local() #0
> > > -declare void @llvm.AMDGPU.barrier.global() #0
> > > +declare void @llvm.r600.group.barrier() #0
> > >
> > >  define void @barrier(i32 %flags) #2 {
> > >  barrier_local_test:
> > > @@ -11,7 +10,7 @@ barrier_local_test:
> > >    br i1 %1, label %barrier_local, label %barrier_global_test
> > >
> > >  barrier_local:
> > > -  call void @llvm.AMDGPU.barrier.local()
> > > +  call void @llvm.r600.group.barrier()
> > >
> > > Please pardon my ignorance, but does the new intrinsic handle
> > > synchronizing writes/reads to local/global memory along with thread
> > > synchronization?
> > >
> > > If that is the case, could we modify the barrier_local portion to
> > > jump straight to done, so that we don't end up inserting two
> > > barriers if someone calls
> > > barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
> > >
> > > I don't have my NI card installed at the moment, but could if
> > > necessary to test things out.
> > >
> > > --Aaron
> > >
> >
> > No, but the old intrinsics didn’t either. There is only one
> > synchronization instruction, the memory fence is a different problem.
> > I don’t think it actually matters with the constraint that global
> > memory is only consistent within a workgroup, although I”m less sure
> > how the old hardware caches worked
>
> EG/NI both reload/flush L1 (VC and TC) caches at the beginning and end
> of every fetch clause. We can make the barrier instruction wait for
> previous CF instructions (bit 31 CF_WORD0), but afaik it only waits for
> instruction to complete (i.e I don't think it waits for the flush
> complete).
>
> There are _ACK versions of VC/TC fetch clauses, and WAIT_ACK
> instruction as fence, although I haven't found decisive info on whether
> these do wait for flushes to complete (or how they differ from using
> barrier bit).
>
> I agree with the point about issuing two barrier instructions. IMO the
> correct implementation of this functions for r600 would be:
>
> void barrier(cl_mem_fence_flags flags) {
>   mem_fence(flags);
>   __builtin_r600_barrier();
> }
>
> the current implementation also fails to issue any barrier if flags ==
> 0, so reducing it to:
>
> define void @barrier(i32 %flags) #2 {
>   call void @llvm.r600.group.barrier()
>   ret void
> }
>
> would be an improvement
>

Yeah, agreed.  I've written code in the past that attempted to do
barrier(0) to synchronize threads in a group without requiring
synchronization of local/global memory...  I'm not saying it was good code,
but the barrier was required in that case.

I'd be fine with changing to the implementation you suggested.  I know it's
not the first time we've discussed fixing our barrier implementation.
Hopefully we actually finish it this time :)

--Aaron


>
>
> jan
> >
> > -Matt
> >
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/libclc-dev
> --
> Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20160715/541757ca/attachment-0001.html>


More information about the Libclc-dev mailing list