<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jun 13, 2019, at 8:08 AM, James Y Knight <<a href="mailto:jyknight@google.com" class="">jyknight@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">On Thu, Jun 13, 2019 at 12:54 AM JF Bastien <<a href="mailto:jfbastien@apple.com" target="_blank" class="">jfbastien@apple.com</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto" class=""><div dir="ltr" class=""><br class=""></div><div dir="ltr" class=""><br class=""><blockquote type="cite" class="">On Jun 12, 2019, at 9:38 PM, James Y Knight <<a href="mailto:jyknight@google.com" target="_blank" class="">jyknight@google.com</a>> wrote:<br class=""><br class=""></blockquote></div><blockquote type="cite" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class="">On Tue, Jun 11, 2019 at 12:08 PM JF Bastien via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><div class="">I think we want option 2.: keep volatile memcpy, and implement it as touching each byte exactly once. That’s unlikely to be particularly useful for every direct-to-hardware uses, but it behaves intuitively enough that I think it’s desirable.<br class=""></div><div class=""></div></div></div></blockquote><div class=""><br class=""></div><div class="">As Eli pointed out, that precludes lowering a volatile memcpy into a call the memcpy library function. The usual "memcpy" library function may well use the same overlapping-memory trick, and there is no "volatile_memcpy" libc function which would provide a guarantee of not touching bytes multiple times. Perhaps it's okay to just always emit an inline loop instead of falling back to a memcpy call.</div></div></div></div></blockquote><div class=""><br class=""></div><div class="">In which circumstances does this matter?</div></div></blockquote><div class=""><br class=""></div><div class="">If it's problematic to touch a byte multiple times when emitting inlined instructions for a "volatile memcpy", surely it's also problematic to emit a library function call which does the same thing?</div><div class=""><br class=""></div><div class="">But -- I don't know of any realistic circumstance where either one would be important to actual users. Someone would need to have a situation where doing 2 overlapping 4-byte writes to implement a 7-byte memcpy is problematic, but where it doesn't matter to them what permutation of non-overlapping memory read/write sizes is used -- and furthermore, where the order doesn't matter. That seems extremely unlikely to ever be the case.</div></div></div></div></blockquote><div><br class=""></div><div>Agreed, but that’s the stated intended behavior of volatile. Makes sense for hardware, weird otherwise, but we don’t need to do it any other way. I could construct a case where volatile is used in a signal handler, and where a partial result with overlap breaks expectations, but… I agree it’s unlikely.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto" class=""><div class="">Paul McKenney has a follow on paper (linked from R2 of mine) which addresses some of your questions I think. LLVM can do what it wants for now since there’s no standard, but there’s likely to be one eventually and we probably should match what it’s likely to be. </div></div></blockquote><div class=""><br class=""></div><div class="">I agree, Paul's paper describes the actually-required (vs C-standard-required) semantics for volatile loads and stores today -- that they must use non-tearing operations for sizes/alignments where the hardware provides such. (IMO, any usage of volatile where that cannot be done is extremely questionable). Of course, that doesn't say anything about memcpy, since volatile memcpy isn't part of C, just part of LLVM.</div></div></div></div></blockquote><div><br class=""></div><div>Indeed. After we standardize Paul’s paper I expect to also do something like volatile memcpy based on it.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Always a byte-by-byte copy?</blockquote></div></blockquote><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It can. </blockquote><div class=""> </div><div class="">So, why does llvm even provide a volatile memcpy intrinsic? One possible answer is that it was needed in order to implement volatile aggregate copies generated by the C frontend. So, given the real world requirement to use single instructions where possible...what about this code:</div><div class=""><br class=""></div><div class="">struct X {int n;}; <br class="">void foo(volatile struct X *n) {<br class="">    n[0] = n[1];<br class="">}<br class=""></div><div class=""><br class=""></div><div class="">Clang implements it by creating a volatile llvm.memcpy call. Which currently is generally lowered as a 32-bit read/write. Maybe it should be _required_ to always emit a 32-bit read/write instruction, just as if you were directly operating on a 'volatile int *n'? (Assuming a 32-bit platform which has such instructions, of course)?</div></div></div></div></blockquote><div><br class=""></div><div>In general, volatile instructions should behave as expected. Of course, we can disagree on expectations ;-)</div><div>Interpreting a non-specification is wonderful.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">Or -- maybe memcpy is actually not a reasonable thing to use for copying a volatile struct at all. Perhaps a volatile struct copy should do volatile element-wise copies of each fundamentally-typed field? That might make some sense. (But...unions?).</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto" class=""><blockquote type="cite" class=""><div dir="ltr" class="">

</div></blockquote></div></blockquote></div></div>

</div></blockquote><br class=""></div><div>I think copying a volatile struct is the unreasonable part, but that’s not super relevant to how we implement this silly thing :-)</div>Don’t get me started on volatile unions (and bitfields).<br class=""><div class=""><br class=""></div></body></html>