[Libclc-dev] [PATCH 1/4] Fix and improvements to barrier() for R600 targets
damien.hilloulin at epfl.ch
Sat Aug 23 11:37:58 PDT 2014
On 23/08/2014 20:30, Matt Arsenault wrote:
> > - I have read the spec and my conclusion is that a barrier is a
> > work-group sync point, whatever the flags are. So I think that we
> > must have a barrier nofence() call.
> I would agree, though the spec is ambiguous. I would make it fence all
> address spaces as the fallback else case for a non-compile-time
> constant (though I remember finding that this was not allowed, I've
> never found again where the spec says so; it should be a
> frontend warning anyway).
I have seen that when flags is 0, the closed driver queues a memory
fence for both local and global memory. So I think we should do as you say. I will
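A minimal sketch of the policy discussed above, written as a plain C model rather than libclc/OpenCL code: every barrier variant is still a work-group sync point, flags of 0 (or both bits set) queue fences for local and global memory, and non-constant flags fall back to fencing both. The CLK_* values, the barrier_variant names, and the flags_is_constant parameter (standing in for what the frontend knows at compile time) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for OpenCL's cl_mem_fence_flags bits; the numeric values are
 * assumed for this sketch. */
#define CLK_LOCAL_MEM_FENCE  (1u << 0)
#define CLK_GLOBAL_MEM_FENCE (1u << 1)

/* Hypothetical barrier variants. Each one is a work-group sync point; they
 * differ only in which memory fences are queued alongside it. */
typedef enum {
    BARRIER_LOCAL,       /* sync + fence __local memory       */
    BARRIER_GLOBAL,      /* sync + fence __global memory      */
    BARRIER_LOCALGLOBAL  /* sync + fence __local and __global */
} barrier_variant;

/* flags_is_constant models whether the frontend saw a compile-time
 * constant; in real code this is known when compiling, not at run time. */
static barrier_variant select_barrier(unsigned flags, bool flags_is_constant) {
    /* Non-constant flags: fall back to fencing all (here: both) address spaces. */
    if (!flags_is_constant)
        return BARRIER_LOCALGLOBAL;

    /* Only the two fence bits are modelled. */
    assert((flags & ~(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)) == 0);

    bool local = (flags & CLK_LOCAL_MEM_FENCE) != 0;
    bool global = (flags & CLK_GLOBAL_MEM_FENCE) != 0;
    if (local && !global)
        return BARRIER_LOCAL;
    if (global && !local)
        return BARRIER_GLOBAL;
    /* Both bits, or flags == 0: still a sync point, and fence local and
     * global memory, matching what the closed driver was seen to do for 0. */
    return BARRIER_LOCALGLOBAL;
}
```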
> > - The localglobal() naming used everywhere mimics what the closed
> > driver seems to do. In its IR output we can see that it uses a
> > different pseudo-instruction for every possibility: barriers and
> > memory fences seem to have different intrinsics depending on the
> > flags.
> This is because in AMDIL the same fence instruction with different
> modifiers implements all of the variations of barrier and mem_fence.
> LLVM is not aware of the hardware details of how it works and does not
> do any real scheduling.
Ok, it is certainly a better way of doing this.
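To illustrate the AMDIL approach Matt describes, here is a hypothetical C model in which barrier() and mem_fence() lower to one fence instruction and differ only in a modifier word, with one intrinsic name per combination in the localglobal style. The modifier bits and the barrier_*/fence_* names are invented for this sketch; they are not the real AMDIL encoding or the closed driver's intrinsic names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical modifier bits on a single fence instruction (not the real
 * AMDIL encoding). */
enum {
    MOD_LOCAL   = 1u << 0, /* order __local accesses             */
    MOD_GLOBAL  = 1u << 1, /* order __global accesses            */
    MOD_BARRIER = 1u << 2  /* also wait for the whole work-group */
};

/* barrier() and mem_fence() both map to the same instruction; only the
 * modifier word differs. */
static unsigned encode_fence(bool is_barrier, bool local, bool global) {
    unsigned mod = 0;
    if (is_barrier) mod |= MOD_BARRIER;
    if (local)      mod |= MOD_LOCAL;
    if (global)     mod |= MOD_GLOBAL;
    return mod;
}

/* One intrinsic name per modifier combination, in the "localglobal" style,
 * e.g. "barrier_localglobal" or "fence_global". */
static void intrinsic_name(unsigned mod, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    const char *space;
    if ((mod & MOD_LOCAL) && (mod & MOD_GLOBAL)) space = "localglobal";
    else if (mod & MOD_LOCAL)                    space = "local";
    else if (mod & MOD_GLOBAL)                   space = "global";
    else                                         space = "nofence";
    snprintf(buf, len, "%s_%s", (mod & MOD_BARRIER) ? "barrier" : "fence", space);
}
```

With one name per modifier combination, lowering each intrinsic is a direct mapping to a single instruction plus modifiers.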
> > So I thought it might be interesting to do the same. Thanks to
> > that, it is really easy to lower the intrinsics correctly, and
> > nothing would need to change if some future hardware had a special
> > instruction for every combination (very unrealistic, however).
> > But I can change that if you want.
> > - I have considered a very simple implementation of barriers as a
> > call to mem_fence plus the actual barrier intrinsic. But the closed
> > driver has special intrinsics, so... ^^
> As mentioned in the LLVM thread, barrier can't be used to implement a
Yeah, I know. It was certainly a wrong first implementation. But barriers
queue memory fences, right? So it should be possible to implement a barrier
as a memory fence followed by a sync point, shouldn't it?
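The question above has a CPU analogue. The sketch below is a hypothetical C11/pthreads model, not GPU code: threads play work-items, a static array plays __local memory, and barrier() is built as a memory fence followed by a pure sync point (a relaxed arrival counter that orders nothing by itself). One caveat under C11 rules: the fence before the sync point must be a release fence, and an acquire fence is also needed after it. WG_SIZE and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define WG_SIZE 4 /* hypothetical work-group size for this CPU model */

static int local_mem[WG_SIZE];   /* stands in for __local memory           */
static int observed[WG_SIZE];    /* what each item read after the barrier  */
static atomic_uint arrivals;     /* arrival counter used as the sync point */

/* Pure sync point: wait until every work-item has arrived. It uses only
 * relaxed atomics, so on its own it orders no other memory accesses. */
static void sync_point(void) {
    atomic_fetch_add_explicit(&arrivals, 1u, memory_order_relaxed);
    while (atomic_load_explicit(&arrivals, memory_order_relaxed) < WG_SIZE) {
        /* spin */
    }
}

/* barrier = memory fence, then sync point, then acquire fence. The release
 * fence before and the acquire fence after make writes issued before the
 * barrier by one item visible to reads issued after it by every other item. */
static void barrier_model(void) {
    atomic_thread_fence(memory_order_release);
    sync_point();
    atomic_thread_fence(memory_order_acquire);
}

static void *work_item(void *arg) {
    int id = (int)(intptr_t)arg;
    assert(id >= 0 && id < WG_SIZE);
    local_mem[id] = 100 + id;                      /* write own slot        */
    barrier_model();
    observed[id] = local_mem[(id + 1) % WG_SIZE];  /* read neighbour's slot */
    return NULL;
}

/* Runs one work-group; returns true if every item saw its neighbour's write. */
static bool run_work_group(void) {
    pthread_t threads[WG_SIZE];
    for (int i = 0; i < WG_SIZE; i++) {
        local_mem[i] = -1;
        observed[i] = -1;
    }
    atomic_store(&arrivals, 0u);
    for (intptr_t i = 0; i < WG_SIZE; i++) {
        if (pthread_create(&threads[i], NULL, work_item, (void *)i) != 0)
            abort(); /* a partial work-group would spin forever */
    }
    for (int i = 0; i < WG_SIZE; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < WG_SIZE; i++) {
        if (observed[i] != 100 + (i + 1) % WG_SIZE)
            return false;
    }
    return true;
}
```

The sync point is kept deliberately free of ordering so that the two explicit fences do all the memory-ordering work, mirroring the fence-plus-sync-point split being proposed.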