[Libclc-dev] [PATCH 1/4] Fix and improvements to barrier() for R600 targets

Sat Aug 23 11:37:58 PDT 2014

On 23/08/2014 20:30, Matt Arsenault wrote:
>
>
>     - I have read the spec and my conclusion is that a barrier is a
>     work-group sync point, whatever the flags are. So I think that we
>     must have a barrier nofence() call.
>
> I would agree, though the spec is ambiguous. I would make it fence all
> address spaces as the fallback (else) case for a non-compile-time
> constant. (I remember finding that a non-constant was not allowed, but
> I have never re-found where in the spec that is specified. It should be
> a frontend warning anyway.)
I have seen that when flags is 0, the closed driver queues a memory
fence for both local and global. So I think we should do as you say. I
will change that.
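
To make the agreed fallback concrete, here is a minimal sketch in plain C of how the set of fences could be chosen. The flag values, the helper name fences_for_barrier() and the flags_is_constant parameter are assumptions for illustration only; they are not taken from the patch or from libclc.

```c
#include <assert.h>

/* Assumed flag values, for illustration only. */
enum {
    CLK_LOCAL_MEM_FENCE  = 1,
    CLK_GLOBAL_MEM_FENCE = 2,
    ALL_FENCES = CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE
};

/*
 * Hypothetical helper: returns the address spaces that a barrier(flags)
 * call should fence before the work-group sync point.
 * flags_is_constant models whether the compiler could see the flag value.
 */
unsigned fences_for_barrier(unsigned flags, int flags_is_constant)
{
    /* Non-compile-time-constant flags: fall back to fencing everything. */
    if (!flags_is_constant)
        return ALL_FENCES;

    /* In this sketch, unknown bits in a constant are treated as a caller bug. */
    assert((flags & ~(unsigned)ALL_FENCES) == 0);

    /* flags == 0: fence local and global, as the closed driver appears to do. */
    if (flags == 0)
        return ALL_FENCES;

    return flags;
}
```

With these assumptions, fences_for_barrier(0, 1) and fences_for_barrier(CLK_LOCAL_MEM_FENCE, 0) both select local and global, while fences_for_barrier(CLK_LOCAL_MEM_FENCE, 1) selects local only.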

>     - The localglobal() stuff used everywhere is there to mimic what
>     the closed driver seems to do. In their IR output we can see
>     that they have chosen to use different pseudo-instructions for all
>     the possibilities: barriers and memory fences seem to have
>     different intrinsics depending on the flags.
>
>
> This is because in AMDIL the same fence instruction with different 
> modifiers implements all of the variations of barrier and mem_fence. 
> LLVM is not aware of the hardware details of how it works and does not 
> do any real scheduling
Ok, it is certainly a better way of doing this.
>
>     So I thought that maybe it would be interesting to do the same.
>     Thanks to that, it is really easy to lower the intrinsics correctly,
>     and we would have nothing to change if someday some hardware has a
>     special instruction for every combination (very unrealistic, however).
>     But I can change that if you want.
>
>     - I have considered making a very simple implementation of
>     barriers with a call to mem_fence and the actual barrier
>     intrinsic. But the closed driver has special intrinsics so... ^^
>
>
> As mentioned in the LLVM thread, barrier can't be used to implement a 
> mem_fence
Yeah, I know. It was a wrong first implementation for sure. But barriers
queue memory fences, right? So could a barrier be implemented as a
memory fence followed by a sync point?
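
To show the ordering I am asking about, here is a small sketch in plain C. It only records the order of operations; the op names, the flag bits and barrier_sketch() are hypothetical stand-ins, not real R600 intrinsics, and it does not claim that this lowering is sufficient on the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for target operations; they are only recorded. */
enum op { OP_FENCE_LOCAL, OP_FENCE_GLOBAL, OP_SYNC };

#define MAX_OPS 4

struct trace {
    enum op ops[MAX_OPS];
    size_t n;
};

/* Assumed flag bits, for illustration only. */
#define LOCAL_FENCE_FLAG  1u
#define GLOBAL_FENCE_FLAG 2u

/* Append one operation to the trace. */
static void emit(struct trace *t, enum op o)
{
    assert(t->n < MAX_OPS);
    t->ops[t->n++] = o;
}

/* Barrier modelled as: requested memory fences first, then the sync point. */
void barrier_sketch(struct trace *t, unsigned flags)
{
    if (flags & LOCAL_FENCE_FLAG)
        emit(t, OP_FENCE_LOCAL);
    if (flags & GLOBAL_FENCE_FLAG)
        emit(t, OP_FENCE_GLOBAL);
    emit(t, OP_SYNC); /* the work-group sync point always comes last */
}
```

Note that the reverse does not hold in this model either: the fences are just a prefix of the barrier, so a mem_fence cannot be built from it.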