[Libclc-dev] [PATCH 1/3] Implement wait_group_events builtin v2

Tom Stellard tom at stellard.net
Tue Sep 30 07:15:15 PDT 2014


On Mon, Sep 29, 2014 at 04:56:04PM +0100, Jeroen Ketema wrote:
> 
> On 25 Sep 2014, at 16:02, Tom Stellard <tom at stellard.net> wrote:
> 
> > On Wed, Sep 24, 2014 at 11:00:27AM +0100, Jeroen Ketema wrote:
> >> 
> >> This looks good, but I think it suffices to use a barrier with an empty set of flags, as the function only requires the copy to have completed, not that every memory operation arising from the copy has been committed, as I wrote before.
> >> 
> > 
> > I don't quite understand the difference here. Can you give me an
> > example?
> 
> See the top of this page:
> 
>  https://www.khronos.org/message_boards/showthread.php/5875-Profiling-Code/page2
> 
> Thinking a bit on how this might impact the assembly generated for SI, I must admit I’m not quite sure. First I was thinking that this implied that I could write something like the following in pseudo-SI assembly when copying from local to global memory (for brevity I’m omitting any loop that might be needed):
> 
>   ds_read
>   s_waitcnt lgkmcnt(0)
>   buffer_store
>   s_barrier
> 
> but you might argue that a
> 
>   s_waitcnt vmcnt(0)
> 
> is needed before the s_barrier, because only then is the copy complete (in the sense that the VGPRs it used can be reused for something else). In any case, barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE); is definitely the safe thing to do.
> 

Ok, thanks for the explanation.  This is a generic implementation, so
I think I'll keep the barrier() call as is, since it is the safest thing
to do.  We can look at adding an optimized version for SI later.
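
For context, here is a minimal sketch of how a kernel might rely on this
builtin. It assumes async_work_group_copy is available (it is part of
OpenCL C, but it is not added by this patch). The kernel name scale_tile,
the TILE size and the buffer names are hypothetical and only serve the
illustration.

```c
// Hypothetical kernel: stage one tile of the input in local memory, then
// scale it. Assumes a 1-D launch with a work-group size of exactly TILE
// and a global size that is a multiple of TILE.
#define TILE 64

__kernel void scale_tile(__global const float *in,
                         __global float *out,
                         float factor)
{
    __local float tile[TILE];
    size_t base = get_group_id(0) * TILE;

    // Every work-item in the group must reach this call with the same
    // arguments; the group performs the copy cooperatively.
    event_t ev = async_work_group_copy(tile, in + base, TILE, 0);

    // With the generic implementation in this patch, this is simply
    // barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE).
    wait_group_events(1, &ev);

    // After the wait, every work-item can safely read any element of tile.
    size_t lid = get_local_id(0);
    out[base + lid] = tile[lid] * factor;
}
```

Whatever a target-specific version ends up doing, the local reads after
wait_group_events() have to observe the copied data, which the full
barrier guarantees.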

-Tom

> Jeroen
> 
> > 
> >> The above holds provided that a barrier call is not removed completely when it is given an empty set of flags, which I recall is currently the case for R600. I think there was a patch for that at some point; not sure what happened to that patch though.
> >> 
> >> Jeroen
> >> 
> >> On 24 Sep 2014, at 01:42, Tom Stellard <thomas.stellard at amd.com> wrote:
> >> 
> >>> This is a simple default implementation which just calls barrier().
> >>> 
> >>> v2:
> >>> - Only call barrier() once.
> >>> ---
> >>> generic/include/clc/async/wait_group_events.h | 1 +
> >>> generic/include/clc/clc.h                     | 1 +
> >>> generic/lib/SOURCES                           | 1 +
> >>> generic/lib/async/wait_group_events.cl        | 5 +++++
> >>> 4 files changed, 8 insertions(+)
> >>> create mode 100644 generic/include/clc/async/wait_group_events.h
> >>> create mode 100644 generic/lib/async/wait_group_events.cl
> >>> 
> >>> diff --git a/generic/include/clc/async/wait_group_events.h b/generic/include/clc/async/wait_group_events.h
> >>> new file mode 100644
> >>> index 0000000..799efa0
> >>> --- /dev/null
> >>> +++ b/generic/include/clc/async/wait_group_events.h
> >>> @@ -0,0 +1 @@
> >>> +void wait_group_events(int num_events, event_t *event_list);
> >>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> >>> index b8c1cb9..0dccf53 100644
> >>> --- a/generic/include/clc/clc.h
> >>> +++ b/generic/include/clc/clc.h
> >>> @@ -138,6 +138,7 @@
> >>> 
> >>> /* 6.11.10 Async Copy and Prefetch Functions */
> >>> #include <clc/async/prefetch.h>
> >>> +#include <clc/async/wait_group_events.h>
> >>> 
> >>> /* 6.11.11 Atomic Functions */
> >>> #include <clc/atomic/atomic_add.h>
> >>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
> >>> index e4ba1d1..cefef94 100644
> >>> --- a/generic/lib/SOURCES
> >>> +++ b/generic/lib/SOURCES
> >>> @@ -1,4 +1,5 @@
> >>> async/prefetch.cl
> >>> +async/wait_group_events.cl
> >>> atomic/atomic_impl.ll
> >>> cl_khr_global_int32_base_atomics/atom_add.cl
> >>> cl_khr_global_int32_base_atomics/atom_dec.cl
> >>> diff --git a/generic/lib/async/wait_group_events.cl b/generic/lib/async/wait_group_events.cl
> >>> new file mode 100644
> >>> index 0000000..05c9d58
> >>> --- /dev/null
> >>> +++ b/generic/lib/async/wait_group_events.cl
> >>> @@ -0,0 +1,5 @@
> >>> +#include <clc/clc.h>
> >>> +
> >>> +_CLC_DEF void wait_group_events(int num_events, event_t *event_list) {
> >>> +  barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
> >>> +}
> >>> -- 
> >>> 1.8.5.5
> >>> 
> >>> 
> >>> _______________________________________________
> >>> Libclc-dev mailing list
> >>> Libclc-dev at pcc.me.uk
> >>> http://www.pcc.me.uk/cgi-bin/mailman/listinfo/libclc-dev
> >> 
> >> 
> 