[Libclc-dev] [PATCH 2/3] Implement async_work_group_copy builtin

Jeroen Ketema j.ketema at imperial.ac.uk
Thu Aug 21 08:11:45 PDT 2014


On 21 Aug 2014, at 15:33, Tom Stellard <tom at stellard.net> wrote:

> On Sat, Aug 09, 2014 at 09:42:44PM +0100, Jeroen Ketema wrote:
>> 
>> On 08 Aug 2014, at 22:40, Tom Stellard <thomas.stellard at amd.com> wrote:
>> 
>>> This is a simple implementation which just copies data synchronously.
>>> ---
>>> generic/include/clc/async/async_work_group_copy.h   | 15 +++++++++++++++
>>> generic/include/clc/async/async_work_group_copy.inc |  5 +++++
>>> generic/include/clc/clc.h                           |  1 +
>>> generic/lib/SOURCES                                 |  1 +
>>> generic/lib/async/async_work_group_copy.cl          | 21 +++++++++++++++++++++
>>> generic/lib/async/async_work_group_copy.inc         | 16 ++++++++++++++++
>>> 6 files changed, 59 insertions(+)
>>> create mode 100644 generic/include/clc/async/async_work_group_copy.h
>>> create mode 100644 generic/include/clc/async/async_work_group_copy.inc
>>> create mode 100644 generic/lib/async/async_work_group_copy.cl
>>> create mode 100644 generic/lib/async/async_work_group_copy.inc
>>> 
>>> diff --git a/generic/include/clc/async/async_work_group_copy.h b/generic/include/clc/async/async_work_group_copy.h
>>> new file mode 100644
>>> index 0000000..39c637b
>>> --- /dev/null
>>> +++ b/generic/include/clc/async/async_work_group_copy.h
>>> @@ -0,0 +1,15 @@
>>> +#define __CLC_DST_ADDR_SPACE local
>>> +#define __CLC_SRC_ADDR_SPACE global
>>> +#define __CLC_BODY <clc/async/async_work_group_copy.inc>
>>> +#include <clc/async/gentype.inc>
>>> +#undef __CLC_DST_ADDR_SPACE
>>> +#undef __CLC_SRC_ADDR_SPACE
>>> +#undef __CLC_BODY
>>> +
>>> +#define __CLC_DST_ADDR_SPACE global
>>> +#define __CLC_SRC_ADDR_SPACE local
>>> +#define __CLC_BODY <clc/async/async_work_group_copy.inc>
>>> +#include <clc/async/gentype.inc>
>>> +#undef __CLC_DST_ADDR_SPACE
>>> +#undef __CLC_SRC_ADDR_SPACE
>>> +#undef __CLC_BODY
>>> diff --git a/generic/include/clc/async/async_work_group_copy.inc b/generic/include/clc/async/async_work_group_copy.inc
>>> new file mode 100644
>>> index 0000000..d85df6c
>>> --- /dev/null
>>> +++ b/generic/include/clc/async/async_work_group_copy.inc
>>> @@ -0,0 +1,5 @@
>>> +_CLC_OVERLOAD _CLC_DECL event_t async_work_group_copy(
>>> +  __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
>>> +  const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
>>> +  size_t num_gentypes,
>>> +  event_t event);
>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>> index f499e6d..ed741b1 100644
>>> --- a/generic/include/clc/clc.h
>>> +++ b/generic/include/clc/clc.h
>>> @@ -125,6 +125,7 @@
>>> #include <clc/synchronization/barrier.h>
>>> 
>>> /* 6.11.10 Async Copy and Prefetch Functions */
>>> +#include <clc/async/async_work_group_copy.h>
>>> #include <clc/async/prefetch.h>
>>> #include <clc/async/wait_group_events.h>
>>> 
>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>> index 3e847fe..e7dbca5 100644
>>> --- a/generic/lib/SOURCES
>>> +++ b/generic/lib/SOURCES
>>> @@ -1,3 +1,4 @@
>>> +async/async_work_group_copy.cl
>>> async/prefetch.cl
>>> async/wait_group_events.cl
>>> atomic/atomic_impl.ll
>>> diff --git a/generic/lib/async/async_work_group_copy.cl b/generic/lib/async/async_work_group_copy.cl
>>> new file mode 100644
>>> index 0000000..31c71d6
>>> --- /dev/null
>>> +++ b/generic/lib/async/async_work_group_copy.cl
>>> @@ -0,0 +1,21 @@
>>> +#include <clc/clc.h>
>>> +
>>> +#ifdef cl_khr_fp64
>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>>> +#endif
>>> +
>>> +#define __CLC_DST_ADDR_SPACE local
>>> +#define __CLC_SRC_ADDR_SPACE global
>>> +#define __CLC_BODY <async_work_group_copy.inc>
>>> +#include <clc/async/gentype.inc>
>>> +#undef __CLC_DST_ADDR_SPACE
>>> +#undef __CLC_SRC_ADDR_SPACE
>>> +#undef __CLC_BODY
>>> +
>>> +#define __CLC_DST_ADDR_SPACE global
>>> +#define __CLC_SRC_ADDR_SPACE local
>>> +#define __CLC_BODY <async_work_group_copy.inc>
>>> +#include <clc/async/gentype.inc>
>>> +#undef __CLC_DST_ADDR_SPACE
>>> +#undef __CLC_SRC_ADDR_SPACE
>>> +#undef __CLC_BODY
>>> diff --git a/generic/lib/async/async_work_group_copy.inc b/generic/lib/async/async_work_group_copy.inc
>>> new file mode 100644
>>> index 0000000..dd3db3f
>>> --- /dev/null
>>> +++ b/generic/lib/async/async_work_group_copy.inc
>>> @@ -0,0 +1,16 @@
>>> +_CLC_OVERLOAD _CLC_DEF event_t async_work_group_copy(
>>> +    __CLC_DST_ADDR_SPACE __CLC_GENTYPE *dst,
>>> +    const __CLC_SRC_ADDR_SPACE __CLC_GENTYPE *src,
>>> +    size_t num_gentypes,
>>> +    event_t event) {
>>> +
>>> +  // __builtin_memcpy doesn't work with address spaces, so we need to
>>> +  // implement the copy using a loop.
>>> +
>>> +  unsigned i;
>>> +  for (i = 0; i < num_gentypes; ++i) {
>>> +    dst[i] = src[i];
>>> +  }
>> 
>> If I understand this correctly, this lets every thread in the workgroup do the copy.
>> So this code has data races if executed by more than one thread. OpenCL 1.2/1.1
>> does not say anything about the behaviour of racy code, so I’m not sure whether
>> the behaviour of this code is properly defined. If the intention is to eventually support
>> OpenCL 2.0, then the behaviour is definitely undefined (due to the data races).
>> 
> 
> This isn't a bug in the implementation though, right?  Isn't
> it up to the user to make sure the memory being written by each
> thread doesn't overlap?

I think it is a bug in the implementation. All work-items in a work-group
need to reach the function call with exactly the same arguments:

"The async copy is performed by all work-items in a work-group and this
built-in function must therefore be encountered by all work-items in a
workgroup executing the kernel with the same argument values; otherwise
the results are undefined."

With the code you propose, this would mean that all work-items in a work-group
will execute exactly the same copy, which means the code is racy.
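
One way to avoid the race would be to let each work-item copy a disjoint,
strided slice of the buffer. The following is only a sketch of that idea in
plain C, simulating a one-dimensional work-group on the host; it is not
tested libclc code. The names strided_copy and work_group_copy_is_race_free
are hypothetical, and local_id / local_size stand in for get_local_id(0) and
get_local_size(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one work-item's share of the copy.
 * Work-item local_id copies elements local_id, local_id + local_size, ...
 * so no two work-items ever write the same destination element.
 * writes[] (optional, may be NULL) counts stores per element. */
static void strided_copy(int *dst, const int *src, size_t num_elements,
                         size_t local_id, size_t local_size,
                         unsigned *writes) {
  assert(local_size > 0 && local_id < local_size);
  for (size_t i = local_id; i < num_elements; i += local_size) {
    dst[i] = src[i];
    if (writes)
      writes[i]++;
  }
}

/* Runs every simulated work-item of the group in turn and returns 1 if
 * each destination element was written exactly once, 0 otherwise. */
static int work_group_copy_is_race_free(int *dst, const int *src,
                                        unsigned *writes,
                                        size_t num_elements,
                                        size_t local_size) {
  for (size_t i = 0; i < num_elements; ++i)
    writes[i] = 0;

  for (size_t id = 0; id < local_size; ++id)
    strided_copy(dst, src, num_elements, id, local_size, writes);

  for (size_t i = 0; i < num_elements; ++i)
    if (writes[i] != 1)
      return 0;
  return 1;
}
```

With such a split, each work-item only sees the complete copy after the
work-group has synchronised, so I would expect wait_group_events to have to
provide a barrier in that case.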

Jeroen

> 
> -Tom
> 
>> Jeroen
>> 
>>> +
>>> +  return event;
>>> +}
>>> -- 
>>> 1.8.1.5
>>> 
>>> 
>>> _______________________________________________
>>> Libclc-dev mailing list
>>> Libclc-dev at pcc.me.uk
>>> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
>> 
>> 