<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div><blockquote type="cite" class=""><div class="">On Mar 30, 2018, at 10:36 AM, Piotr Padlewski <<a href="mailto:piotr.padlewski@gmail.com" class="">piotr.padlewski@gmail.com</a>> wrote:</div><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">2018-03-29 18:01 GMT+02:00 John McCall <span dir="ltr" class=""><<a href="mailto:rjmccall@apple.com" target="_blank" class="">rjmccall@apple.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space" class=""><div class=""><blockquote type="cite" class=""><span class=""><div class="">On Mar 29, 2018, at 9:12 AM, Piotr Padlewski <<a href="mailto:piotr.padlewski@gmail.com" target="_blank" class="">piotr.padlewski@gmail.com</a>> wrote:</div></span><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><span class="">2018-03-28 23:23 GMT+02:00 John McCall <span dir="ltr" class=""><<a href="mailto:rjmccall@apple.com" target="_blank" class="">rjmccall@apple.com</a>></span>:</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><span class=""><div class="">On Mar 19, 2018, at 7:27 PM, Piotr Padlewski via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank" class="">cfe-dev@lists.llvm.org</a>> wrote:</div></span><span class=""><div class=""><div dir="ltr" class=""><div class=""><b style="font-weight:normal" 
id="m_5284604705455374278m_-1828198599501100078m_1414337835050969498gmail-m_7271471833428658469m_-8503527532884566477gmail-docs-internal-guid-a43ba0b5-406d-36b9-4e30-86e8c95f392a" class=""><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-align:justify" class=""><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">Note that adding calls to </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">strip</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> and </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">launder</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> related to pointer comparisons and integer<->pointer conversions will not cause any semantic information to be lost: any piece of information that could be inferred by the optimiser about some collection of variables (e.g. 
that two pointers are equal) can be inferred now about their </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">strip</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">ped versions, no matter how many </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">strip</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> and </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">launder</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> calls have been made to obtain them in the IR. 
As an example, the C++ expression </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">ptr == std::launder(ptr)</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> will be optimised to </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">true</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">, because it will compare </span><span style="font-size:11pt;font-family:"Courier New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">strip(ptr)</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""> with </span><span style="font-size:11pt;font-family:"Courier 
New";background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">strip(launder(ptr))</span><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">, which are indeed equal according to our rules.</span></div></b></div></div></div></span></blockquote><span class=""><div class=""><br class=""></div><div class="">This proposal sounds great, even if it still doesn't solve some of the problems I personally need to solve with invariant loads. :)</div><div class=""><br class=""></div><div class="">I take it that the actual devirtualization here is ultimately still done by forwarding a visible store of the v-table pointer to an invariant load, just by noticing that they occur to the same laundered pointer and therefore must involve the same value. There's no way of saying "I know what the value of the v-table pointer is even if you can't see a store" when creating a laundered pointer. 
For example, in Swift we have constructor functions that are known to return a complete object of a specific type, even if we can't necessarily see the implementation of that function; there's no way for us to say anything about that function pointer</div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><b style="font-weight:normal" id="m_5284604705455374278m_-1828198599501100078m_1414337835050969498gmail-m_7271471833428658469m_-8503527532884566477gmail-docs-internal-guid-a43ba0b5-406d-36b9-4e30-86e8c95f392a" class=""><h2 dir="ltr" style="line-height:1.38;margin-top:18pt;margin-bottom:6pt" class=""><span style="font-size:16pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class=""></span></h2></b></div></div></div></blockquote></span></div></div></blockquote><span class=""><div class="">I think we have already solved that problem with calls to llvm.assume intrinsic. 
After calling the constructor, we load the virtual pointer (with invariant group), compare it with the vtable it should point to, and then pass the result of the comparison to llvm.assume.</div><div class=""><div class=""><br class=""></div><div class=""> call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor</div><div class=""> %3 = load {...} %a ; Load vptr</div><div class=""> %4 = icmp eq %3, @_ZTV1A ; compare vptr with vtable</div><div class=""> call void @llvm.assume(i1 %4)</div></div><div class=""> </div><div class="">(from <a href="http://blog.llvm.org/2017/03/devirtualization-in-llvm-and-clang.html" target="_blank" class="">http://blog.llvm.org/201<wbr class="">7/03/devirtualization-in-llvm-<wbr class="">and-clang.html</a> )</div><div class=""><br class=""></div><div class="">If I understand it correctly, you should be able to use the same technique for the constructor-like functions in Swift :)</div></span></div></div></div></div></blockquote><div class=""><br class=""></div>Yes, I think so. Although IIRC people have had significant trouble with llvm.assume: the work that's just done for assume purposes has a nasty habit of sticking around.</div></div></blockquote><div class="">I had a problem with assume a couple of years ago, but I think it looks much better right now. 
We will see how it works now.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space" class=""><div class=""><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><b style="font-weight:normal" id="m_5284604705455374278m_-1828198599501100078m_1414337835050969498gmail-m_7271471833428658469m_-8503527532884566477gmail-docs-internal-guid-a43ba0b5-406d-36b9-4e30-86e8c95f392a" class=""><h2 dir="ltr" style="line-height:1.38;margin-top:18pt;margin-bottom:6pt" class=""><span style="font-size:16pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">LLVM</span></h2><br class=""><br class=""><div style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;text-align:justify" class=""><span style="font-size:11pt;font-family:Arial;background-color:transparent;font-weight:400;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap" class="">Because LTO between a module with devirtualization and a module without it will be invalid, we will need to break the LLVM-level ABI. This is, however, already implemented, because LTO between modules with invariant.group.barriers and without is also invalid. 
This also means that if we don’t want to break ABI between modules with and without optimizations, we will need to have invariant.barriers and fatpointer.create/strip turned on all the time. For users, this will mean that when switching to a new compiler, they will have to recompile all of the generated object files for LTO builds.</span></div></b></div></div></div></blockquote><div class=""><br class=""></div>Is there really no way to have this degrade more gracefully? I continue to be very concerned about frontend interworking here, either between different versions of a single frontend (e.g. clang 6 vs. clang 8), or between different invocations of a single frontend with different language options set (e.g. clang vs. clang++), or even between different frontends that produce IR that gets linked together (e.g. clang vs. swift).</div><div class=""><br class=""></div><div class="">How about this approach:</div><div class=""> - Instead of taking a meaningless !{} argument, invariant.group takes a string argument which identifies a metadata-dependent optimization. In your case, it would be something like !"clang.cxx_devirtualization".</div><div class=""> - Functions have a "supported optimizations" list which declares all the metadata-reliant optimizations they promise to have correct metadata for. So e.g. clang++ would list "clang.cxx_devirtualization" on every single function it compiled, regardless of whether that function actually needed any metadata. I'm pretty sure metadata are optimized so that identical lists of options like this don't take up more space just because they're added to every single function in the module.</div><div class=""> - Interprocedural optimizations (which mostly means inlining) are required to be aware of the supported-optimizations list. 
The inliner would intersect the supported-optimizations lists and then strip metadata/intrinsics that don't belong anymore.</div><div class=""><br class=""></div><div class="">But the idea that every single metadata-dependent optimization is going to create a new "IR ABI break" just seems unacceptable to me. Compiler optimization IRs are not stable things; compiler engineers constantly find new things that they want to express.</div><div class=""><br class=""></div><div class="">John.</div></div></blockquote><div class=""><br class=""></div><div class="">I haven't thought about LTO between different languages, thanks for bringing that up! </div><div class="">Can you actually use C++ objects without going through a C interface? If it is possible, then that is heavy.</div></div></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Not yet, but it's a goal. But even without that, Swift might call a C interface and the code on the other side of the C interface might be C++.</div><div class=""><br class=""></div><div class="">Even putting Swift aside, it's not atypical to have a few C files in a majority-C++ project, or vice-versa. Or, for that matter, a few files that are compiled with different optimization settings.</div><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">To clarify how it works right now: if you do LTO between IR compiled with -fstrict-vtable-pointers and IR compiled without it, the linker throws an error. I can see now that this pretty much stops you from doing any LTO between different languages.</div></div></div></div></div></blockquote><div class=""><br class=""></div></span>Yeah. 
It also creates problems for people who are trying to make LTO-able static libraries; Apple encourages people to use bitcode for some things, and we'd like to do more of that.</div><div class=""><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">The other idea that we had was to strip all the invariant.groups when linking with a module that does not have them. This, as opposed to the first idea, would let us link whatever we want, but we could silently lose some optimizations.</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">I like the idea that you proposed: it is somewhere between these two ideas, as you limit the potential losses to only some functions and at the same time you can link whatever IR you like. </div></div></div></div></div></blockquote><div class=""><br class=""></div></span>Yeah, just losing the optimization in functions where you've actually merged different information is a really nice property.</div><div class=""><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">However, if you agree that option 2 (stripping invariant.groups from whole modules) addresses all of your concerns, then I would propose to go with this idea first and then optimize it if we find a problem with it.</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class="">I feel that it might be overkill to implement it on the first go, especially since we are not even at the point of thinking about turning -fstrict-vtable-pointers on by 
default.</div></div></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote"><div class=""><br class=""></div><div class="">What do you think about that?</div></div></div></div></div></blockquote><div class=""><br class=""></div></span>I certainly think it's fine for your summer project to just get the optimization working first.</div><div class=""><br class=""></div><div class="">When it comes time to actually harden the IR linker against this, I think we should go ahead and pursue the more aggressive function-by-function solution. That's not because the whole-module solution wouldn't solve the problem; you're absolutely right, it would. But it seems to me that (1) the function-by-function solution is where we ought to end up, and (2) it's not that much more work than the whole-module solution, because the big piece of work in either case is finding and stripping the right metadata and intrinsics, and (3) crucially, it's not an extension of the whole-module solution; it relies on information being provided in a completely different way. If we implement the whole-module approach, it becomes a legacy part of the system that we're stuck with *in addition to* whatever function-by-function approach we eventually settle on, and it probably permanently complicates the function-by-function approach.</div><span class="m_5284604705455374278HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class="">John.</div><div class=""><br class=""></div></font></span></div></blockquote></div>That's a good point, let's come back to that problem as the project progresses. </div><div class="gmail_extra">Do you know of any other specific situations or metadata that would require, or would benefit from, the same solution?</div></div></div></blockquote><div><br class=""></div>Ah, sure. It's probably the right solution for TBAA metadata compatibility as well. 
Different compilers are likely to use different TBAA tag hierarchies, just because they have different rules for aliasing. In the absence of some way of officially declaring them compatible, we should assume they're incompatible and strip them during inlining. In fact, it's probably true that *most* annotation approaches are frontend-specific and should be stripped when merging information from different frontends.</div><div class=""><br class=""></div><div class="">John.</div></body></html>