<div dir="ltr"><div dir="ltr">On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div>I think we want option 2.: keep volatile memcpy, and implement it as touching each byte exactly once. That’s unlikely to be particularly useful for every direct-to-hardware uses, but it behaves intuitively enough that I think it’s desirable.<br></div><div></div></div></div></blockquote><div><br></div><div>As Eli pointed out, that precludes lowering a volatile memcpy into a call to the memcpy library function. The usual "memcpy" library function may well use the same overlapping-memory trick, and there is no "volatile_memcpy" libc function which would provide a guarantee of not touching bytes multiple times. Perhaps it's okay to just always emit an inline loop instead of falling back to a memcpy call.</div><div><br></div><div>But, possibly option 3 would be better. Maybe it's better to force people/compiler-frontends to emit the raw load/store operations, so that it's clearer exactly what semantics are desired.</div><div></div><div><br></div><div>The fundamental issue to me is that for reasonable usages of volatile, the operand size and number of memory instructions generated for a given operation actually <i>matters</i>. Certainly, this is a somewhat unfortunate situation, since the C standard explicitly doesn't forbid implementing any volatile access with smaller memory operations. (Which, among other issues, allows tearing as your wg21 doc nicely points out.)
Nevertheless, it <i>is</i> an important property -- required by POSIX for accesses of a volatile sig_atomic_t, even -- and is a property which LLVM/Clang does provide when dealing with volatile accesses of target-appropriate sizes and alignments.</div><div><br></div><div>But, what does that mean for volatile memcpy? What size should it use? Always a byte-by-byte copy? May it do larger-sized reads/writes as well? <i>Must</i> it do so? Does it have to read/write the data in order? Or can it do so in reverse order? Can it use the CPU's block-copy instructions (e.g. rep movsb on x86) which may sometimes cause effectively-arbitrarily-sized memory-ops, in arbitrary order, in hardware?</div><div><br></div><div><div>If we're going to keep volatile memcpy support, possibly those other questions ought to be answered too?<br></div><div></div></div><div><br></div><div>I dunno...</div></div></div>
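<div dir="ltr"><div>To make the "touching each byte exactly once" question concrete, here is a rough C sketch. It assumes the overlapping-memory trick means covering a length that is not a multiple of the access width with two wide accesses whose ranges overlap. All names here (volatile_copy_bytewise, touches_overlapping, touches_bytewise, CHUNK) are hypothetical, not LLVM or libc APIs, and the two touches_* functions only model which bytes would be stored to; they do not perform wide accesses.</div>
<pre>
```c
/* Hypothetical helper, not an LLVM or libc API: copies n bytes using
   one volatile load and one volatile store per byte, in ascending
   address order. */
static void volatile_copy_bytewise(volatile unsigned char *dst,
                                   const volatile unsigned char *src,
                                   unsigned long n)
{
    for (unsigned long i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Assumed access width for the model below. */
enum { CHUNK = 8 };

/* Model only: counts how often each destination byte would be stored
   to by a two-chunk overlapping copy of n bytes, for
   CHUNK <= n <= 2 * CHUNK. */
static void touches_overlapping(unsigned long n, unsigned touches[])
{
    for (unsigned long i = 0; i < CHUNK; ++i)
        touches[i] += 1;              /* first chunk: bytes [0, CHUNK) */
    for (unsigned long i = n - CHUNK; i < n; ++i)
        touches[i] += 1;              /* second chunk: bytes [n - CHUNK, n) */
}

/* Model only: a byte-by-byte copy stores to every byte exactly once. */
static void touches_bytewise(unsigned long n, unsigned touches[])
{
    for (unsigned long i = 0; i < n; ++i)
        touches[i] += 1;
}
```
</pre>
<div>In this model, a 12-byte copy with 8-byte chunks stores to bytes 4 through 7 twice, whereas the byte-wise loop stores to each byte once. That is the guarantee an ordinary memcpy call would not give.</div></div>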