[Libclc-dev] [PATCH 1/4] Fix and improvements to barrier() for R600 targets
Matt Arsenault
arsenm2 at gmail.com
Sat Aug 23 11:30:03 PDT 2014
> - I have read the spec and my conclusion is that a barrier is a work-group
> syncpoint, whatever the flags are. So I think that we must have a barrier
> nofence() call.
>
I would agree, though the spec is ambiguous. I would make fencing all
address spaces the fallback else case for a flags argument that is not a
compile-time constant. (I remember finding that this was not allowed, but
I have never re-found where in the spec that is specified. It should be a
frontend warning anyway.)
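The fallback suggested above could be sketched roughly as follows. This is plain C rather than OpenCL C so that it runs on a host, and it is only an illustration under stated assumptions: hyp_fence(), hyp_workgroup_sync(), the FENCE_* kinds and the numeric flag values are hypothetical stand-ins, not real R600 or libclc intrinsics, and the order of the fence and the sync is not taken from the patch.

```c
#include <assert.h>

/* Names mirror the OpenCL C flags; the numeric values are assumptions. */
#define CLK_LOCAL_MEM_FENCE  (1u << 0)
#define CLK_GLOBAL_MEM_FENCE (1u << 1)

/* Hypothetical fence kinds; a real target would lower these to intrinsics. */
enum { FENCE_NONE = 0, FENCE_LOCAL = 1, FENCE_GLOBAL = 2, FENCE_ALL = 3 };

/* Recorded state so the behaviour of the sketch can be observed on a host. */
static int last_fence = -1;
static int sync_count = 0;

/* Hypothetical stand-in for a memory fence over the given address spaces. */
static void hyp_fence(int kind)
{
    assert(kind >= FENCE_NONE && kind <= FENCE_ALL);
    last_fence = kind;
}

/* Hypothetical stand-in for the work-group synchronization point. */
static void hyp_workgroup_sync(void)
{
    ++sync_count;
}

/* Sketch of barrier(): always a sync point, with the fence chosen by flags. */
static void barrier_sketch(unsigned flags)
{
    if (flags == 0)
        hyp_fence(FENCE_NONE);          /* the "nofence" case: sync only */
    else if (flags == CLK_LOCAL_MEM_FENCE)
        hyp_fence(FENCE_LOCAL);
    else if (flags == CLK_GLOBAL_MEM_FENCE)
        hyp_fence(FENCE_GLOBAL);
    else
        hyp_fence(FENCE_ALL);           /* fallback: fence all address spaces */

    hyp_workgroup_sync();               /* reached whatever the flags are */
}
```

When flags is a compile-time constant, the branches would fold away after inlining and a single fence kind would remain. Only a value the compiler cannot resolve, or the combination of both flags, ends up in the conservative else case.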
> - For the localglobal() stuff used everywhere, it is used to mimic what the
> closed driver seems to do. In their IR output we can see that they have
> chosen to use different pseudo-instructions for all the possibilities:
> barriers and memory fences seem to have different intrinsics according to
> the different flags and all.
This is because in AMDIL the same fence instruction with different
modifiers implements all of the variations of barrier and mem_fence. LLVM
is not aware of the hardware details of how it works and does not do any
real scheduling.
> So I thought that maybe it would be interesting to do the same.
> Thanks to that, it is really easy to lower the intrinsics correctly, and we
> have no changes to make if someday some hardware has a special instruction for
> every combination (very unrealistic, however).
> But I can change that if you want.
>
> - I have considered making a very simple implementation of barriers with a
> call to mem_fence and the actual barrier intrinsic. But the closed driver
> has special intrinsics so... ^^
As mentioned in the LLVM thread, barrier can't be used to implement a
mem_fence.