[Libclc-dev] [PATCH 1/1] r600: Add fence implementation, rework barrier

Wed Apr 30 18:12:56 PDT 2014

On Wed, 2014-04-30 at 22:27 +0100, Jeroen Ketema wrote:
> Hi,
> 
> >> My original approach implemented read/write fences by just calling
> >> mem_fence(). That should take care of both using only seq_cst and
> >> duplicate labels. I can post it as a v2 if you are ok with a patch that
> >> is not really useful on its own.
> >>
> >> regards,
> >> Jan
> >>
> > You can't use the LLVM atomic fence instruction for this. For reasons I do
> > not understand, the atomic fence LLVM instruction only impacts the
> > ordering of other atomic instructions, but OpenCL mem_fence is for all
> > memory accesses
> 
> I don’t think this is totally accurate, as the LLVM atomic orderings —
> and the fence takes an ordering argument — are intended to implement
> C++11 atomic orderings and those put requirements on the writes to
> non-atomic memory locations. The IR language reference heavily leans on
> the C++11 spec for this reason [0]. However, the OpenCL mem_fence also
> has a requirement on the reads that were issued before the fence, which
> is something that is not required by any of the C++11 orderings. I’m
> not totally confident in stating this, but because of this read “issue”
> it seems that the llvm fence cannot be used to implement OpenCL’s
> mem_fence.

The memory model in [1] defines a "happens-before" relation that uses both
program order (PO) and the "synchronizes-with" relation used in the
definition of fence semantics [2].
Using the fence therefore introduces ordering between instructions across
multiple execution threads, i.e. every instruction A that was executed in
any thread before (PO) the fence 'happens before' every instruction B in
any thread executed after (PO) the fence.

According to the rules in [1], a READ after (PO) the fence sees the last
WRITE to the memory location from every thread (earlier writes are
blocked by happens-before within a single thread's PO). It may also see any
WRITE after (PO) the fence that it is not in a 'happens-before' relation
with, i.e. any from other threads or from the same thread (before the read
in PO).

This matches the requirement that stores before the fence are committed
before the loads.
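
To make the store side concrete, here is a minimal C++11 sketch of the same
reasoning (the LangRef orderings are modelled on C++11, so
std::atomic_thread_fence is used as a stand-in for the LLVM fence). All names
(payload, ready, producer, consumer, run_message_passing) are made up for
illustration and are not part of the patch. Note the assumption that the two
fences synchronize through an atomic flag; without such an atomic access
there is no "synchronizes-with" edge.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // atomic flag the two fences synchronize through

void producer() {
    payload = 42;                                         // non-atomic WRITE before (PO) the fence
    std::atomic_thread_fence(std::memory_order_release);  // fence in the writing thread
    ready.store(true, std::memory_order_relaxed);         // atomic write after (PO) the fence
}

int consumer() {
    // Atomic read before (PO) the fence; spin until the flag is observed.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // fence in the reading thread
    return payload;                                       // READ after (PO) the fence
}

// Runs both threads once and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread writer(producer);
    std::thread reader([&seen] { seen = consumer(); });
    writer.join();
    reader.join();
    return seen;
}
```

Once the consumer has observed the flag, the write to payload happens-before
the read of payload, so the read cannot return the stale value.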

The situation with reads is a bit simpler. All reads before (PO) the
fence are in a 'happens-before' relation with all writes after (PO) the
fence, hence they are not allowed to see them. This matches the
requirement that all reads before the fence must be complete before any
write after (PO) the fence.
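
The read side can be sketched as a small "load buffering" test in C++11.
Again this is only an illustration under assumptions: the names (x, y,
LbResult, run_load_buffering) are hypothetical, and the locations are
relaxed atomics because C++ does not allow racing plain accesses. If one
thread's read before its fence sees the other thread's write after its
fence, the two fences synchronize, and the other thread's read then
happens-before the first thread's write, so both reads can never return 1.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

struct LbResult {
    int r1;  // value thread A read from x before its fence
    int r2;  // value thread B read from y before its fence
};

LbResult run_load_buffering() {
    x.store(0);
    y.store(0);
    LbResult out{-1, -1};

    std::thread a([&out] {
        out.r1 = x.load(std::memory_order_relaxed);           // read before (PO) the fence
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        y.store(1, std::memory_order_relaxed);                // write after (PO) the fence
    });
    std::thread b([&out] {
        out.r2 = y.load(std::memory_order_relaxed);           // read before (PO) the fence
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        x.store(1, std::memory_order_relaxed);                // write after (PO) the fence
    });

    a.join();
    b.join();
    return out;
}
```

The outcome r1 == 1 && r2 == 1 would require each read to see a write that
it happens-before, which the memory model forbids; the other three outcomes
remain possible.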

I think using acq_rel or seq_cst should be ok for OpenCL mem_fence. If
all the loads had acquire semantics and all the stores had release
semantics (on respective memory locations) it'd be ok to use only
acquire/release fences for write_mem_fence()/read_mem_fence()
respectively.
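
As a rough sketch of the conservative variant (read/write fences simply
forwarding to the full fence, as in my original approach), written in C++11
rather than as the actual libclc code: the namespace, the kFenceOrder
constant and the ignored flags parameter are assumptions made for
illustration, not the patch itself.

```cpp
#include <atomic>

namespace sketch {

// Assumed ordering for the full fence; acq_rel would be the other candidate.
constexpr std::memory_order kFenceOrder = std::memory_order_seq_cst;

// Full fence. The flags argument (local/global in OpenCL) is ignored here,
// i.e. the address space is implicit.
inline void mem_fence(unsigned /*flags*/) {
    std::atomic_thread_fence(kFenceOrder);
}

// Conservative read/write fences: stronger than strictly required,
// they just forward to the full fence.
inline void read_mem_fence(unsigned flags) { mem_fence(flags); }
inline void write_mem_fence(unsigned flags) { mem_fence(flags); }

}  // namespace sketch
```

Forwarding to the full fence trades some performance for not having to
argue about which weaker ordering is sufficient for each variant.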

Matt, is no AS argument a problem? OpenCL allows fences only on local
and global AS, and since we don't need any for LDS, I thought having
implicit global AS was ok.

regards,
Jan

[1] http://llvm.org/docs/LangRef.html#memmodel
[2] http://llvm.org/docs/LangRef.html#i-fence

> 
> Jeroen
> 
> [0] http://llvm.org/docs/LangRef.html#ordering

-- 
Jan Vesely <jan.vesely at rutgers.edu>