[Libclc-dev] [PATCH] Add atomic_inc and atoimic_add builtins
Aaron Watry
awatry at gmail.com
Fri Aug 30 08:13:00 PDT 2013
Alright, I finally got a chance to test with that additional LDS size
calculation patch.
With current mesa, current llvm, current clang, and the following
patches applied, the piglit tests for these builtins now work
correctly:
Mesa Patch:
LDS Size calculation patch
LLVM:
R600/SI: Don't emit S_WQM_B64 instruction for compute shaders
R600: Fix incorrect LDS size calculation
R600: Expand SELECT nodes rather than custom lowering them
R600: Add support for local memory atomic add
So, I've now managed to test on both Cedar and Pitcairn without any issues.
Reviewed-By: Aaron Watry <awatry at gmail.com>
--Aaron
On Thu, Aug 22, 2013 at 3:46 PM, Aaron Watry <awatry at gmail.com> wrote:
> In that case, I probably haven't managed to test SI yet... :)
>
> I did the testing yesterday, which is before that patch hit the list.
> So if SI won't work without it, then I probably only tested evergreen
> (it's been a rough few weeks). I won't be in a situation where I can
> test my SI card before Tuesday, so from my perspective feel free to
> commit this libclc change based on the successful testing on
> evergreen. I'm also assuming that you have an SI available to test
> this on yourself.
>
> --Aaron
>
> On Thu, Aug 22, 2013 at 3:07 PM, Tom Stellard <tom at stellard.net> wrote:
>> On Thu, Aug 22, 2013 at 01:52:46PM -0500, Aaron Watry wrote:
>>> Followup: With the patches that you sent to mesa-dev yesterday, I have
>>> successfully tested this on CEDAR, and possibly on PITCAIRN. I can't
>>> quite remember if I tested PITCAIRN yesterday, and my ssh port
>>> forwarding seem to be down for the moment so I can't (re)test at this
>>> time.
>>>
>>
>> PITCAIRN probably won't work without this patch:
>> http://lists.freedesktop.org/archives/mesa-dev/2013-August/043686.html
>> Did you apply this patch when you tested?
>>
>> -Tom
>>
>>> --Aaron
>>>
>>> On Wed, Aug 21, 2013 at 11:26 AM, Aaron Watry <awatry at gmail.com> wrote:
>>> > In the Subject: s/atoimic_add/atomic_add/
>>> >
>>> > On Tue, Aug 20, 2013 at 6:21 PM, Tom Stellard <tom at stellard.net> wrote:
>>> >> From: Tom Stellard <thomas.stellard at amd.com>
>>> >>
>>> >> ---
>>> >> generic/include/clc/atomic/atomic_add.h | 3 +++
>>> >> generic/include/clc/atomic/atomic_decl.inc | 10 ++++++++++
>>> >> generic/include/clc/atomic/atomic_inc.h | 1 +
>>> >> generic/include/clc/clc.h | 4 ++++
>>> >> generic/lib/SOURCES | 1 +
>>> >> generic/lib/atomic/atomic_impl.ll | 11 +++++++++++
>>> >> r600/lib/SOURCES | 1 +
>>> >> r600/lib/atomic/atomic.cl | 20 ++++++++++++++++++++
>>> >> 8 files changed, 51 insertions(+)
>>> >> create mode 100644 generic/include/clc/atomic/atomic_add.h
>>> >> create mode 100644 generic/include/clc/atomic/atomic_decl.inc
>>> >> create mode 100644 generic/include/clc/atomic/atomic_inc.h
>>> >> create mode 100644 generic/lib/atomic/atomic_impl.ll
>>> >> create mode 100644 r600/lib/atomic/atomic.cl
>>> >>
>>> >> diff --git a/generic/include/clc/atomic/atomic_add.h b/generic/include/clc/atomic/atomic_add.h
>>> >> new file mode 100644
>>> >> index 0000000..66d8978
>>> >> --- /dev/null
>>> >> +++ b/generic/include/clc/atomic/atomic_add.h
>>> >> @@ -0,0 +1,3 @@
>>> >> +#define __CLC_FUNCTION atomic_add
>>> >> +#include <clc/atomic/atomic_decl.inc>
>>> >> +#undef __CLC_FUNCTION
>>> >> diff --git a/generic/include/clc/atomic/atomic_decl.inc b/generic/include/clc/atomic/atomic_decl.inc
>>> >> new file mode 100644
>>> >> index 0000000..03d01aa
>>> >> --- /dev/null
>>> >> +++ b/generic/include/clc/atomic/atomic_decl.inc
>>> >> @@ -0,0 +1,10 @@
>>> >> +
>>> >> +#define __CLC_DECLARE_ATOMIC(ADDRSPACE, TYPE) \
>>> >> + _CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE);
>>> >> +
>>> >> +#define __CLC_DECLARE_ATOMIC_ADDRSPACE(TYPE) \
>>> >> + __CLC_DECLARE_ATOMIC(global, TYPE); \
>>> >> + __CLC_DECLARE_ATOMIC(local, TYPE);
>>> >> +
>>> >> +__CLC_DECLARE_ATOMIC_ADDRSPACE(int);
>>> >> +__CLC_DECLARE_ATOMIC_ADDRSPACE(uint);
>>> >> diff --git a/generic/include/clc/atomic/atomic_inc.h b/generic/include/clc/atomic/atomic_inc.h
>>> >> new file mode 100644
>>> >> index 0000000..2137391
>>> >> --- /dev/null
>>> >> +++ b/generic/include/clc/atomic/atomic_inc.h
>>> >> @@ -0,0 +1 @@
>>> >> +#define atomic_inc(p) atomic_add(p, 1);
>>> >> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>> >> index b906245..679d73a 100644
>>> >> --- a/generic/include/clc/clc.h
>>> >> +++ b/generic/include/clc/clc.h
>>> >> @@ -96,4 +96,8 @@
>>> >> #include <clc/synchronization/cl_mem_fence_flags.h>
>>> >> #include <clc/synchronization/barrier.h>
>>> >>
>>> >> +/* 6.11.11 Atomic Functins */
>>> >> +#include <clc/atomic/atomic_add.h>
>>> >> +#include <clc/atomic/atomic_inc.h>
>>> >> +
>>> >> #pragma OPENCL EXTENSION all : disable
>>> >> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>> >> index 9ac08bd..2d84e02 100644
>>> >> --- a/generic/lib/SOURCES
>>> >> +++ b/generic/lib/SOURCES
>>> >> @@ -1,3 +1,4 @@
>>> >> +atomic/atomic_impl.ll
>>> >> convert.cl
>>> >> geometric/cross.cl
>>> >> geometric/dot.cl
>>> >> diff --git a/generic/lib/atomic/atomic_impl.ll b/generic/lib/atomic/atomic_impl.ll
>>> >> new file mode 100644
>>> >> index 0000000..c11383b
>>> >> --- /dev/null
>>> >> +++ b/generic/lib/atomic/atomic_impl.ll
>>> >> @@ -0,0 +1,11 @@
>>> >> +define i32 @__clc_atomic_add_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
>>> >> +entry:
>>> >> + %0 = atomicrmw volatile add i32 addrspace(1)* %ptr, i32 %value seq_cst
>>> >> + ret i32 %0
>>> >
>>> > I had this nice long response about how the function is supposed to
>>> > return the old value, and then I finally looked up the atomicrmw
>>> > instruction and realized that the LLVM reference stated that this
>>> > instruction does exactly what we need. Would you mind changing the
>>> > name of '%0' to '%old' just to prevent future readers from making the
>>> > mistake I almost made?
>>> >
>>> >> +}
>>> >> +
>>> >> +define i32 @__clc_atomic_add_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
>>> >> +entry:
>>> >> + %0 = atomicrmw volatile add i32 addrspace(3)* %ptr, i32 %value seq_cst
>>> >> + ret i32 %0
>>> >> +}
>>> >> diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
>>> >> index 38919ab..de30f6e 100644
>>> >> --- a/r600/lib/SOURCES
>>> >> +++ b/r600/lib/SOURCES
>>> >> @@ -1,3 +1,4 @@
>>> >> +atomic/atomic.cl
>>> >> workitem/get_num_groups.ll
>>> >> workitem/get_group_id.ll
>>> >> workitem/get_local_size.ll
>>> >> diff --git a/r600/lib/atomic/atomic.cl b/r600/lib/atomic/atomic.cl
>>> >> new file mode 100644
>>> >> index 0000000..e80180c
>>> >> --- /dev/null
>>> >> +++ b/r600/lib/atomic/atomic.cl
>>> >> @@ -0,0 +1,20 @@
>>> >> +#include <clc/clc.h>
>>> >> +
>>> >> +#define ATOMIC_FUNC_TYPE(SIGN, TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
>>> >> +_CLC_OVERLOAD _CLC_DEF SIGN TYPE FUNCTION (volatile CL_ADDRSPACE SIGN TYPE *p, SIGN TYPE val) { \
>>> >> + return (SIGN TYPE)__clc_##FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE signed TYPE*)p, (signed TYPE)val); \
>>> >
>>> > The presence of SIGN and signed in these functions makes me nervous...
>>> > Is that actually intended? Are we intentionally casting all unsigned
>>> > to signed in order to reduce assembly function counts? If it's
>>> > intentional, then fine. Just wanted to make sure that you meant to do
>>> > this.
>>> >
>>> > I'm assuming that there's also associated LLVM R600 back-end changes
>>> > to go with this patch-set, since my CEDAR and PITCAIRN both give me
>>> > the following when I actually try to run the new piglit tests:
>>> > LLVM ERROR: Cannot select: 0x1b33de0: i32,ch = AtomicLoadAdd
>>> > 0x1b33ce0, 0x1b32c60, 0x1b33260<Volatile LDST4[%mem]> [ORD=2] [ID=19]
>>> >
>>> > With the subject spelling correction, %0 -> %old change, and
>>> > clarification on the SIGN/signed questions:
>>> > Reviewed-by: Aaron Watry <awatry at gmail.com>
>>> >
>>> >
>>> >> +}
>>> >> +
>>> >> +#define ATOMIC_FUNC_SIGN(TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
>>> >> + _CLC_DECL signed TYPE __clc_##FUNCTION##_addr##LLVM_ADDRSPACE(volatile CL_ADDRSPACE signed TYPE*, signed TYPE); \
>>> >> + ATOMIC_FUNC_TYPE(signed, TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
>>> >> + ATOMIC_FUNC_TYPE(unsigned, TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE)
>>> >> +
>>> >> +#define ATOMIC_FUNC_ADDRSPACE(TYPE, FUNCTION) \
>>> >> + ATOMIC_FUNC_SIGN(TYPE, FUNCTION, global, 1) \
>>> >> + ATOMIC_FUNC_SIGN(TYPE, FUNCTION, local, 3)
>>> >> +
>>> >> +#define ATOMIC_FUNC(FUNCTION) \
>>> >> + ATOMIC_FUNC_ADDRSPACE(int, FUNCTION)
>>> >> +
>>> >> +ATOMIC_FUNC(atomic_add)
>>> >> --
>>> >> 1.7.11.4
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Libclc-dev mailing list
>>> >> Libclc-dev at pcc.me.uk
>>> >> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
More information about the Libclc-dev
mailing list