[Libclc-dev] [PATCH 2/3] Implement async_work_group_copy builtin
Tom Stellard
tom at stellard.net
Thu Aug 21 07:33:21 PDT 2014
On Sat, Aug 09, 2014 at 09:42:44PM +0100, Jeroen Ketema wrote:
>
> On 08 Aug 2014, at 22:40, Tom Stellard <thomas.stellard at amd.com> wrote:
>
> > This is a simple implementation which just copies data synchronously.
> > ---
> > generic/include/clc/async/async_work_group_copy.h | 15 +++++++++++++++
> > generic/include/clc/async/async_work_group_copy.inc | 5 +++++
> > generic/include/clc/clc.h | 1 +
> > generic/lib/SOURCES | 1 +
> > generic/lib/async/async_work_group_copy.cl | 21 +++++++++++++++++++++
> > generic/lib/async/async_work_group_copy.inc | 16 ++++++++++++++++
> > 6 files changed, 59 insertions(+)
> > create mode 100644 generic/include/clc/async/async_work_group_copy.h
> > create mode 100644 generic/include/clc/async/async_work_group_copy.inc
> > create mode 100644 generic/lib/async/async_work_group_copy.cl
> > create mode 100644 generic/lib/async/async_work_group_copy.inc
> >
> > diff --git a/generic/include/clc/async/async_work_group_copy.h b/generic/include/clc/async/async_work_group_copy.h
> > new file mode 100644
> > index 0000000..39c637b
> > --- /dev/null
> > +++ b/generic/include/clc/async/async_work_group_copy.h
> > @@ -0,0 +1,15 @@
> > +#define __CLC_DST_ADDR_SPACE local
> > +#define __CLC_SRC_ADDR_SPACE global
> > +#define __CLC_BODY <clc/async/async_work_group_copy.inc>
> > +#include <clc/async/gentype.inc>
> > +#undef __CLC_DST_ADDR_SPACE
> > +#undef __CLC_SRC_ADDR_SPACE
> > +#undef __CLC_BODY
> > +
> > +#define __CLC_DST_ADDR_SPACE global
> > +#define __CLC_SRC_ADDR_SPACE local
> > +#define __CLC_BODY <clc/async/async_work_group_copy.inc>
> > +#include <clc/async/gentype.inc>
> > +#undef __CLC_DST_ADDR_SPACE
> > +#undef __CLC_SRC_ADDR_SPACE
> > +#undef __CLC_BODY
> > diff --git a/generic/include/clc/async/async_work_group_copy.inc b/generic/include/clc/async/async_work_group_copy.inc
> > new file mode 100644
> > index 0000000..d85df6c
> > --- /dev/null
> > +++ b/generic/include/clc/async/async_work_group_copy.inc
> > @@ -0,0 +1,5 @@
> > +_CLC_OVERLOAD _CLC_DECL event_t async_work_group_copy(
> > + __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
> > + const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
> > + size_t num_gentypes,
> > + event_t event);
> > diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> > index f499e6d..ed741b1 100644
> > --- a/generic/include/clc/clc.h
> > +++ b/generic/include/clc/clc.h
> > @@ -125,6 +125,7 @@
> > #include <clc/synchronization/barrier.h>
> >
> > /* 6.11.10 Async Copy and Prefetch Functions */
> > +#include <clc/async/async_work_group_copy.h>
> > #include <clc/async/prefetch.h>
> > #include <clc/async/wait_group_events.h>
> >
> > diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
> > index 3e847fe..e7dbca5 100644
> > --- a/generic/lib/SOURCES
> > +++ b/generic/lib/SOURCES
> > @@ -1,3 +1,4 @@
> > +async/async_work_group_copy.cl
> > async/prefetch.cl
> > async/wait_group_events.cl
> > atomic/atomic_impl.ll
> > diff --git a/generic/lib/async/async_work_group_copy.cl b/generic/lib/async/async_work_group_copy.cl
> > new file mode 100644
> > index 0000000..31c71d6
> > --- /dev/null
> > +++ b/generic/lib/async/async_work_group_copy.cl
> > @@ -0,0 +1,21 @@
> > +#include <clc/clc.h>
> > +
> > +#ifdef cl_khr_fp64
> > +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
> > +#endif
> > +
> > +#define __CLC_DST_ADDR_SPACE local
> > +#define __CLC_SRC_ADDR_SPACE global
> > +#define __CLC_BODY <async_work_group_copy.inc>
> > +#include <clc/async/gentype.inc>
> > +#undef __CLC_DST_ADDR_SPACE
> > +#undef __CLC_SRC_ADDR_SPACE
> > +#undef __CLC_BODY
> > +
> > +#define __CLC_DST_ADDR_SPACE global
> > +#define __CLC_SRC_ADDR_SPACE local
> > +#define __CLC_BODY <async_work_group_copy.inc>
> > +#include <clc/async/gentype.inc>
> > +#undef __CLC_DST_ADDR_SPACE
> > +#undef __CLC_SRC_ADDR_SPACE
> > +#undef __CLC_BODY
> > diff --git a/generic/lib/async/async_work_group_copy.inc b/generic/lib/async/async_work_group_copy.inc
> > new file mode 100644
> > index 0000000..dd3db3f
> > --- /dev/null
> > +++ b/generic/lib/async/async_work_group_copy.inc
> > @@ -0,0 +1,16 @@
> > +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy(
> > + __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
> > + const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
> > + size_t num_gentypes,
> > + event_t event) {
> > +
> > + // __builtin_memcpy doesn't work with address spaces, so we need to
> > + // implement the copy using a loop.
> > +
> > + unsigned i;
> > + for (i = 0; i < num_gentypes; ++i) {
> > + dst[i] = src[i];
> > + }
>
> If I understand this correctly, this lets every thread in the workgroup do the copy.
> So this code has a data races if executed by more than one thread. OpenCL 1.2/1.1
> does not say anything about the behaviour of racy code, so I’m not sure whether
> the behaviour of this code is properly defined. If the intention is to eventually support
> OpenCL 2.0, then the behaviour of is definitely undefined (due to the data races).
>
This isn't a bug in the implementation though, right? Isn't
it up the user to make sure the memory being written by each
thread doesn't overlap?
-Tom
> Jeroen
>
> > +
> > + return event;
> > +}
> > --
> > 1.8.1.5
> >
> >
> > _______________________________________________
> > Libclc-dev mailing list
> > Libclc-dev at pcc.me.uk
> > http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
>
>
> _______________________________________________
> Libclc-dev mailing list
> Libclc-dev at pcc.me.uk
> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
More information about the Libclc-dev
mailing list