<div dir="ltr">Hi Peter,<div><br></div><div>thank you for your concern and advice. Since we both write the compiler and design the language, we are not particularly</div><div>bound by any pre-existing spec. The concerns about multi-threaded data races are very relevant of course</div><div>and we're well aware of the implications. In the particular case where this comes up, language semantics<br></div><div>generally guarantee that this is unobservable both in single-threaded and multi-threaded contexts (though</div><div>we generally do allow the user to shoot themselves in the foot if they want to, the primary concern here</div><div>is not really observability, but what the programmer expects from the semantics of the language). For what</div><div>it's worth, this isn't exactly CICO. Our calling convention is generally by reference. However, we do have</div><div>notions of semantic immutability, which is where this particular pattern arises (in cases where a new immutable</div><div>gets created by taking an existing field and modifying it in one place only). Because of these semantic</div><div>guarantees, we know that there's no aliasing of the kind that would be problematic (and expose this</div><div>information to LLVM through the various AA mechanisms). Now, similar issues of course arise with mutable</div><div>memory locations as well. However, in such cases the data race would be explicitly present in the source</div><div>program, so we don't have a problem with the compiler making this optimization. FWIW, our multi-threading</div><div>programming model is in the early stages, and we're considering various language-level constraints</div><div>on concurrent data modification to mostly disallow that situation unless explicitly opted in to by the user,</div><div>but that's still a ways off.
</div><div><br></div><div><div style="font-size:12.8px">From my perspective, I don't see a reason why GVN shouldn't be doing this (which is why I sent the original</div></div><div style="font-size:12.8px">email in the first place). It would of course be very possible for us to write our own pass that pattern matches</div><div style="font-size:12.8px">this and performs the transformation that we want. However, we generally tend to prefer working with the</div><div style="font-size:12.8px">community to put the optimizations in the best possible place such that others may automatically take advantage.</div><div style="font-size:12.8px">It sounds like the community consensus is that GVN should be able to do this kind of optimization (and thanks</div><div style="font-size:12.8px">to Daniel for providing some guidance on implementation!). If people feel strongly that memcpyopt (or a new pass)</div><div style="font-size:12.8px">with appropriate pattern matching would be a better place, I'd be happy to go that way as well of course.</div><div style="font-size:12.8px"><br></div><div style="font-size:12.8px">Keno</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 17, 2017 at 1:28 PM, Peter Lawrence <span dir="ltr"><<a href="mailto:peterl95124@sbcglobal.net" target="_blank">peterl95124@sbcglobal.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Keno,<div>         Hmmm, seems like you are saying “copy-in-copy-out” argument semantics are required by the language,</div><div>Otherwise you would be using “by-reference” argument semantics,</div><div>And that CICO is easiest for you to represent with memcpy.</div><div><br></div><div>Usually there are some very subtle issues with CICO and the memory model,</div><div>Typically the original object isn’t supposed to be modified until the function returns,</div><div>i.e. multiple stores, 
especially of different values, to a field in the original object should not be visible, only the final store,</div><div>This is clearly “observable” in multithreaded programs, but can also be observable in a single-threaded program</div><div>If the same object is visible from within the called function for example as a global variable</div><div>Which would be seen to have its internal values change multiple times, even though the</div><div>Intent of the language using CICO is to try to ensure all-at-once state changes (at least for single-threaded programs)</div><div><br></div><div><br></div><div>My advice, if the above applies to you, is to add a new pass to the compiler that figures out if</div><div>The transformation from memcpy to explicit multiple load/store is actually legal (won’t produce intermediate</div><div>State changes before the exit of the function which would violate the strict CICO calling convention),</div><div>And also profitable (I don’t view the code explosion of [1000000 x double] as profitable!),</div><div>Or if the transformation from “CICO” to pure “by-reference” is both legal, and profitable.</div><div><br></div><div>(Also, don’t forget to check what the language spec says about this function passing the object,</div><div>Or parts of it, to other functions before or after making modifications)</div><div><br></div><div><br></div><div>My advice regarding teaching GVN about memcpy is not to. It would be one thing if the memcpy</div><div>Were copying in/out a single variable, in that case the memcpy can and should be viewed as a load / store pair,</div><div>But in your case it isn’t being used that way, it is being used to copy multiple values, and the only</div><div>Logical thing that GVN could do is expand those out to multiple individual loads and stores. GVN should not</div><div>Be doing this, instead your new pass (that first checks to see if it is legal w.r.t. 
calling convention) is</div><div>The place to do this, or it should convert to pure “by-reference” if legal, which also shouldn’t be done in GVN.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div><br></div><div>—Peter Lawrence.</div></font></span><div><div class="h5"><div><br></div><div><br></div><div><br><div><blockquote type="cite"><div>On May 17, 2017, at 8:55 AM, Keno Fischer <<a href="mailto:keno@juliacomputing.com" target="_blank">keno@juliacomputing.com</a>> wrote:</div><br class="m_-7043513410783469941Apple-interchange-newline"><div><div dir="ltr">Well, mostly I want to hoist the store to the stack and transform it into a store to the heap. After that the memcpys are essentially trivially dead, so instcombine or dse will delete them for me. If the memcpys were made of individual stores instead, there'd have to be some sort of exponential search somewhere in the compiler to figure that out. For the extreme case consider [100000000 x double]. The same optimization can apply here, but if it tried to do 100000000 stores instead, I wouldn't expect the compiler to really figure that out. 
What I meant was that I think the memcpys are the correct representation of this from the frontend, it's just that I'd like more optimization to happen here.<div><br></div><div><br><div class="gmail_extra"><div class="gmail_quote">On Wed, May 17, 2017 at 11:48 AM, Peter Lawrence <span dir="ltr"><<a href="mailto:peterl95124@sbcglobal.net" target="_blank">peterl95124@sbcglobal.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">Keno,<div>          “No, I very much want the memcpys there” seems like a contradiction,</div><div>Aren’t you trying to optimize away the memcpys?</div><span class="m_-7043513410783469941gmail-HOEnZb"><font color="#888888"><div>Peter Lawrence</div></font></span><div><div class="m_-7043513410783469941gmail-h5"><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br><div><blockquote type="cite"><div>On May 17, 2017, at 8:22 AM, Keno Fischer <<a href="mailto:keno@juliacomputing.com" target="_blank">keno@juliacomputing.com</a>> wrote:</div><br class="m_-7043513410783469941gmail-m_703276034275331539Apple-interchange-newline"><div><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 17, 2017 at 12:09 AM, Peter Lawrence via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Keno,</div><div>          Perhaps you can view the problem to be the memcpys themselves,</div><div>We humans can look at the memcpys and see loads and stores</div><div>but to almost all optimizer passes they aren’t what those passes are looking for,</div><div>They instead see function calls which they mostly don’t touch,</div><div><br></div><div>If these memcpys were inlined into plain old loads and 
stores</div><div>The redundant loads and stores should be deleted by existing opts</div><div><br></div><div>A question I have for you is, because this looks like “copy-in-copy-out” argument semantics,</div><div>Which to me looks more like Ada than C, what was the source language?</div><div><br></div><div><br></div><div>Peter Lawrence.</div></blockquote></div><br>No, I very much want the memcpys there. With individual stores I'd give up hope that the optimizer can figure out what's going on here, esp. if it gets beyond a few bytes, but with memcpys it does seem doable. As for which frontend produced this, we're considering adding language semantics that would produce lots of code like this to Julia, so we're looking into getting the optimizer to fold the extra copies away.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Keno</div></div>

</div></blockquote></div><br></div></div></div></div></blockquote></div><br></div></div></div>

</div></blockquote></div><br></div></div></div></div></blockquote></div><br></div>
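<div>For readers following the thread, the pattern under discussion (copy an immutable value to the stack, store to one field, copy the result to the heap) can be sketched in C rather than LLVM IR. This is only an illustration of one reading of the messages above: the type <code>Tuple</code>, the constant <code>NFIELDS</code>, and both function names are hypothetical and do not come from the thread or from any frontend.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NFIELDS 4  /* hypothetical size; the thread mentions far larger arrays */

/* Hypothetical immutable aggregate, standing in for a frontend value type. */
typedef struct { double vals[NFIELDS]; } Tuple;

/* Shape a frontend might emit: memcpy to a stack temporary, one store,
 * then memcpy of the temporary to fresh heap memory. */
Tuple *with_field_naive(const Tuple *old, size_t i, double v) {
    assert(i < NFIELDS);
    Tuple tmp;
    memcpy(&tmp, old, sizeof tmp);      /* copy in to the stack */
    tmp.vals[i] = v;                    /* the single modification */
    Tuple *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) return NULL;
    memcpy(fresh, &tmp, sizeof tmp);    /* copy out to the heap */
    return fresh;
}

/* Shape one might hope the optimizer reaches: the store is applied to the
 * heap object directly, so the stack temporary and one copy are gone. */
Tuple *with_field_folded(const Tuple *old, size_t i, double v) {
    assert(i < NFIELDS);
    Tuple *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) return NULL;
    memcpy(fresh, old, sizeof *fresh);  /* one bulk copy */
    fresh->vals[i] = v;                 /* store moved to the heap object */
    return fresh;
}
```

<div>Both functions leave the original object untouched and return equal results, which is why the rewrite is unobservable when nothing else aliases the temporary. Keeping the bulk copies as <code>memcpy</code> calls means the rewrite stays a small pattern match however large the aggregate is.</div>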