[Libclc-dev] [PATCH 1/1] r600: Add fence implementation, rework barrier

Jan Vesely jan.vesely at rutgers.edu
Wed Apr 30 09:00:02 PDT 2014


On Tue, 2014-04-29 at 20:50 -0500, Aaron Watry wrote:
> On Sun, Apr 27, 2014 at 1:26 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > Hw flushes caches after each mem clause (Ch 8.2 EG manual),
> > so we need to pass the information that indicates requested
> > clause boundaries.
> > Use existing llvm fence instruction for that.
> >
> > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > ---
> >
> > I'm not sure whether using 'acquire' ordering for reads and 'release'
> > for writes is the correct thing with respect to OpenCL specs.
> > It seems to get read-write-relation correct, but I', not sure about
> > read ordering, or write ordering. Making everything use the strongest mem_fence
> > might be safer.
> >
> > This patch changes global-memory.cl piglit crash from:
> > i32 = GlobalAddress<void ()* @llvm.AMDGPU.barrier.global> 0 [ORD=11]
> > Undefined function
> > to:
> > LLVM ERROR: Cannot select: 0x24b9f58: i32,ch = load 0x2090108, 0x24b9718, 0x24b9c40<LD1[@barrier_test_local_int.test(addrspace=3)], zext from i1> [ORD=4] [ID=13]
> >   0x24b9718: i32 = Constant<0> [ID=7]
> >   0x24b9c40: i32 = undef [ID=2]
> > In function: barrier_test_local_int
> >
> > Without the fences the piglit just fails.
> >
> >
> >  r600/lib/SOURCES                         |  1 +
> >  r600/lib/synchronization/barrier_impl.ll | 25 ++----------------
> >  r600/lib/synchronization/fence_impl.ll   | 44 ++++++++++++++++++++++++++++++++
> >  3 files changed, 47 insertions(+), 23 deletions(-)
> >  create mode 100644 r600/lib/synchronization/fence_impl.ll
> >
> > diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
> > index d9fc897..d93ec0f 100644
> > --- a/r600/lib/SOURCES
> > +++ b/r600/lib/SOURCES
> > @@ -7,4 +7,5 @@ workitem/get_local_id.ll
> >  workitem/get_global_size.ll
> >  synchronization/barrier.cl
> >  synchronization/barrier_impl.ll
> > +synchronization/fence_impl.ll
> >  shared/vload.cl
> > diff --git a/r600/lib/synchronization/barrier_impl.ll b/r600/lib/synchronization/barrier_impl.ll
> > index 3d8ee66..2f2d865 100644
> > --- a/r600/lib/synchronization/barrier_impl.ll
> > +++ b/r600/lib/synchronization/barrier_impl.ll
> > @@ -1,29 +1,8 @@
> > -declare i32 @__clc_clk_local_mem_fence() nounwind alwaysinline
> > -declare i32 @__clc_clk_global_mem_fence() nounwind alwaysinline
> > +declare void @mem_fence(i32 %flags) nounwind alwaysinline
> >  declare void @llvm.AMDGPU.barrier.local() nounwind noduplicate
> > -declare void @llvm.AMDGPU.barrier.global() nounwind noduplicate
> >
> >  define void @barrier(i32 %flags) nounwind noduplicate alwaysinline {
> > -barrier_local_test:
> > -  %CLK_LOCAL_MEM_FENCE = call i32 @__clc_clk_local_mem_fence()
> > -  %0 = and i32 %flags, %CLK_LOCAL_MEM_FENCE
> > -  %1 = icmp ne i32 %0, 0
> > -  br i1 %1, label %barrier_local, label %barrier_global_test
> > -
> > -barrier_local:
> > +  call void @mem_fence(i32 %flags)
> >    call void @llvm.AMDGPU.barrier.local() noduplicate
> > -  br label %barrier_global_test
> > -
> > -barrier_global_test:
> > -  %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
> > -  %2 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
> > -  %3 = icmp ne i32 %2, 0
> > -  br i1 %3, label %barrier_global, label %done
> > -
> > -barrier_global:
> > -  call void @llvm.AMDGPU.barrier.global() noduplicate
> > -  br label %done
> > -
> > -done:
> >    ret void
> >  }
> > diff --git a/r600/lib/synchronization/fence_impl.ll b/r600/lib/synchronization/fence_impl.ll
> > new file mode 100644
> > index 0000000..a72e70e
> > --- /dev/null
> > +++ b/r600/lib/synchronization/fence_impl.ll
> > @@ -0,0 +1,44 @@
> > +declare i32 @__clc_clk_local_mem_fence() nounwind alwaysinline
> > +declare i32 @__clc_clk_global_mem_fence() nounwind alwaysinline
> > +
> > +define void @mem_fence(i32 %flags) nounwind noduplicate alwaysinline {
> > +  %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
> > +  %1 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
> > +  %2 = icmp ne i32 %1, 0
> > +  br i1 %2, label %global_fence, label %exit_fence
> > +
> > +global_fence:
> > +  fence seq_cst
> > +  br label %exit_fence
> > +
> > +exit_fence:
> > +  ret void
> > +}
> > +
> > +define void @read_mem_fence(i32 %flags) nounwind noduplicate alwaysinline {
> > +  %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
> > +  %1 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
> > +  %2 = icmp ne i32 %1, 0
> > +  br i1 %2, label %global_fence, label %exit_fence
> > +
> > +global_fence:
> 
> The label global_fence is defined three times in this file.  That may
> be ok (compiler isn't yelling about it, probably because they're in
> different functions), but it does rub me the wrong way...
> 
> I'll echo Tom's comments.  For now, we might end up needing to use
> seq_cst, but if we can get clarification, that'd be good.  I also get
> the same AtomicFence selection error on my machines as Tom's
> getting... but that could be that we're just running a different LLVM
> version than you are (maybe?).

That's the intended behavior for now (see comments section of the
patch).
I don't have the llvm part yet, it will take me some time to understand
how memory instructions are put into clauses, and how to use fences in
the process.

The reason I posted the patch was to get an some feedback on whether it
makes sense to use existing llvm fence intrinsics to pass the
information to llvm, and whether it makes sense to use separate
intrinsics for read/write fences, since we'll need to treat them the
same anyway (at least that's my understanding of the memory part of isa
manual).
I should have marked the patch as RFC/WIP.

My original approach implemented read/write fences by just calling
mem_fence(). That should take care of both using only seq_cst and
duplicate labels. I can post is as a v2 if you are ok with a patch that
is not really useful on its own.

regards,
Jan

> 
> Once we get your changes in, I have a bit of refactoring of most of
> barrier/[write|read]_mem_fence that I want to get finished up (given
> that working mem fences are sort of required for barriers, I was
> trying to implement those standalone functions first).
> 
> Other than the label issue, and Tom's comments, this patch looks ok to me.
> 
> --Aaron
> 
> > +  fence acquire
> > +  br label %exit_fence
> > +
> > +exit_fence:
> > +  ret void
> > +}
> > +
> > +define void @write_mem_fence(i32 %flags) nounwind noduplicate alwaysinline {
> > +  %CLK_GLOBAL_MEM_FENCE = call i32 @__clc_clk_global_mem_fence()
> > +  %1 = and i32 %flags, %CLK_GLOBAL_MEM_FENCE
> > +  %2 = icmp ne i32 %1, 0
> > +  br i1 %2, label %global_fence, label %exit_fence
> > +
> > +global_fence:
> > +  fence release
> > +  br label %exit_fence
> > +
> > +exit_fence:
> > +  ret void
> > +}
> > --
> > 1.9.0
> >
> >
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at pcc.me.uk
> > http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev

-- 
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/libclc-dev/attachments/20140430/12e471df/attachment.sig>


More information about the Libclc-dev mailing list