<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto" role="textbox" aria-label="Message Body"><div dir="ltr"><br></div><div dir="ltr"><br><blockquote type="cite">On Jun 12, 2019, at 9:38 PM, James Y Knight &lt;jyknight@google.com&gt; wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div dir="ltr">On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div>I think we want option 2: keep volatile memcpy, and implement it as touching each byte exactly once. That’s unlikely to be particularly useful for every direct-to-hardware use, but it behaves intuitively enough that I think it’s desirable.<br></div><div></div></div></div></blockquote><div><br></div><div>As Eli pointed out, that precludes lowering a volatile memcpy into a call to the memcpy library function. The usual "memcpy" library function may well use the same overlapping-memory trick, and there is no "volatile_memcpy" libc function which would provide a guarantee of not touching bytes multiple times. Perhaps it's okay to just always emit an inline loop instead of falling back to a memcpy call.</div></div></div></div></blockquote><div><br></div><div>In which circumstances does this matter?</div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>But, possibly option 3 would be better. 
Maybe it's better to force people/compiler-frontends to emit the raw load/store operations, so that it's more clear exactly what semantics are desired.</div><div></div><div><br></div><div>The fundamental issue to me is that for reasonable usages of volatile, the operand size and number of memory instructions generated for a given operation actually <i>matters</i>. Certainly, this is a somewhat unfortunate situation, since the C standard explicitly doesn't forbid implementing any volatile access with smaller memory operations. (Which, among other issues, allows tearing as your WG21 doc nicely points out.) Nevertheless, it _is_ an important property -- required by POSIX for accesses of a volatile sig_atomic_t, even -- and is a property which LLVM/Clang does provide when dealing with volatile accesses of target-specific appropriate sizes and alignments.</div><div><br></div><div>But, what does that mean for volatile memcpy? What size should it use?</div></div></div></div></blockquote><div><br></div><div>Any size that makes sense to the hardware. </div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>Always a byte-by-byte copy?</div></div></div></div></blockquote><div><br></div><div>It can. </div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>May it do larger-sized reads/writes as well?</div></div></div></div></blockquote><div><br></div><div>Any size, but no larger than the size parameter passed to memcpy. </div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div> <i>Must</i> it do so?</div></div></div></div></blockquote><div><br></div><div>No, but it has to be sensible (whatever that means). </div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>Does it have to read/write the data in order?</div></div></div></div></blockquote><div><br></div><div>No. 
</div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div> Or can it do so in reverse order?</div></div></div></div></blockquote><div><br></div><div>Yes. </div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div> Can it use the CPU's block-copy instructions (e.g. rep movsb on x86) which may sometimes cause effectively-arbitrarily-sized memory-ops, in arbitrary order, in hardware?</div></div></div></div></blockquote><div><br></div><div>Sure. </div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div><div>If we're going to keep volatile memcpy support, possibly those other questions ought to be answered too?</div></div></div></div></div></blockquote><div><br></div><div>Paul McKenney has a follow-on paper (linked from R2 of mine) which, I think, addresses some of your questions. LLVM can do what it wants for now since there’s no standard, but there’s likely to be one eventually, and we should probably match what it’s likely to be. </div><div><br></div><div><br></div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>I dunno...</div></div></div>

</div></blockquote></body></html>