<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Keno,<div class="">          I suspect that if you, Daniel B., and I were to have an in-person meeting, it would take 5 to 10 minutes</div><div class="">for everyone to (in terms D.B. will appreciate) “converge to a fixed point” (:-)! of understanding. Meanwhile we</div><div class="">are stuck with limited email bandwidth, but a meet-up at the next LLVM Bay Area social might be a good idea.</div><div class=""><br class=""></div><div class="">Whether GVN is an appropriate place for this optimization is going to hinge on the precise details of the calling</div><div class="">convention, which you are still being vague about. You are simultaneously saying</div><div class="">1) the memcpys are necessary</div><div class="">2) the memcpys can and should be optimized away</div><div class="">But these two statements are mutually exclusive, so you need to be more precise about the calling convention.</div><div class=""><br class=""></div><div class="">Why is this important? Because even if Daniel B. 
can enhance GVN to “look through” the memcpys</div><div class="">and optimize the loads from the local stack copy into loads through the original pointer argument,</div><div class="">and optimize the stores into the local stack copy into stores through the original pointer argument,</div><div class="">there is still the issue of deleting the memcpys themselves, which is the actual performance problem.</div><div class=""><br class=""></div><div class="">But the rules of the C/C++ programming language aren’t typically going to allow these deletions,</div><div class="">for example if the original pointer argument is passed to another function, or the address of any</div><div class="">or all of the local stack copy is passed to another function, or simply if *any* function is called, because</div><div class="">it could, by the C/C++ rules, modify the original data, requiring the memcpys to be preserved.</div><div class=""><br class=""></div><div class="">The same logic also applies to the loads and stores that Daniel B. thinks he can optimize</div><div class="">in GVN: it depends on where they occur relative to calls to other functions within this function.</div><div class=""><br class=""></div><div class="">The only thing that allows the deletion of the memcpys is intimate knowledge of the Julia-specific</div><div class="">calling convention. Again, similar conclusions apply even to the loads and stores.</div><div class=""><br class=""></div><div class="">And that, IMHO, is inappropriate to include in GVN, which is otherwise a purely C/C++ optimizer,</div><div class="">so a separate Julia calling convention pass is indicated. 
</div><div class=""><br class=""></div><div class="">PS: Don’t be intimidated by writing an IR-to-IR pass. I’ve already written one, and they are easy.</div><div class="">Yours will be particularly easy (after verifying the transform is legal), as it is just a</div><div class="">“replace-all-uses-of”, which already exists, then deleting the memcpys, and finally deleting the stack object.</div><div class=""><br class=""></div><div class="">Peter Lawrence.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 18, 2017, at 8:47 AM, Keno Fischer &lt;<a href="mailto:keno@juliacomputing.com" class="">keno@juliacomputing.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi Peter,<div class=""><br class=""></div><div class="">Thank you for your concern and advice. Since we both write the compiler and design the language, we are not particularly</div><div class="">bound by any pre-existing spec. The concerns about multi-threaded data races are very relevant, of course,</div><div class="">and we're well aware of the implications. In the particular case where this comes up, language semantics<br class=""></div><div class="">generally guarantee that this is unobservable both in single-threaded and multi-threaded contexts (though</div><div class="">we generally do allow the user to shoot themselves in the foot if they want to; the primary concern here</div><div class="">is not really observability, but what the programmer expects from the semantics of the language). For what</div><div class="">it's worth, this isn't exactly CICO (copy-in, copy-out). Our calling convention is generally by reference. 
However, we do have</div><div class="">notions of semantic immutability, which is where this particular pattern arises (in cases where a new immutable</div><div class="">gets created by taking an existing field and modifying it in one place only). Because of these semantic</div><div class="">guarantees, we know that there's no aliasing of the kind that would be problematic (and we expose this</div><div class="">information to LLVM through the various AA mechanisms). Now, similar issues of course arise with mutable</div><div class="">memory locations as well. However, in such cases the data race would be explicitly present in the source</div><div class="">program, so we don't have a problem with the compiler making this optimization. FWIW, our multi-threading</div><div class="">programming model is in the early stages, and we're considering various language-level constraints</div><div class="">on concurrent data modification to mostly disallow that situation unless explicitly opted into by the user,</div><div class="">but that's still some way off. </div><div class=""><br class=""></div><div class=""><div style="font-size:12.8px" class="">From my perspective, I don't see a reason why GVN shouldn't be doing this (which is why I sent the original</div></div><div style="font-size:12.8px" class="">email in the first place). It would of course be very possible for us to write our own pass that pattern-matches</div><div style="font-size:12.8px" class="">this and performs the transformation that we want. However, we generally tend to prefer working with the</div><div style="font-size:12.8px" class="">community to put the optimizations in the best possible place such that others may automatically take advantage of them.</div><div style="font-size:12.8px" class="">It sounds like the community consensus is that GVN should be able to do this kind of optimization (and thanks</div><div style="font-size:12.8px" class="">to Daniel for providing some guidance on implementation!). 
If people feel strongly that memcpyopt (or a new pass)</div><div style="font-size:12.8px" class="">with appropriate pattern matching would be a better place, I'd be happy to go that way as well, of course.</div><div style="font-size:12.8px" class=""><br class=""></div><div style="font-size:12.8px" class="">Keno</div><div class=""><br class=""></div></div><div class="gmail_extra"><br class=""></div></div></blockquote></div><br class=""></div></body></html>