<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 16, 2017, at 3:24 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" class="">chisophugis@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class="Apple-interchange-newline"><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mehdi.amini@apple.com" target="_blank" class="">mehdi.amini@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><br class=""><div class=""><span class="gmail-"><blockquote type="cite" class=""><div class="">On Jan 16, 2017, at 1:47 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank" class="">chisophugis@gmail.com</a>> wrote:</div><br class="gmail-m_-5968031970270721054Apple-interchange-newline"><div class=""><br 
class="gmail-m_-5968031970270721054Apple-interchange-newline"><br style="font-family: helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;" class=""><div class="gmail_quote" style="font-family: helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano<span class="gmail-m_-5968031970270721054Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:davide@freebsd.org" target="_blank" class="">davide@freebsd.org</a>></span><span class="gmail-m_-5968031970270721054Apple-converted-space"> </span><wbr class="">wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><span class="gmail-m_-5968031970270721054gmail-">On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev<br class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class="">> Do we have any open projects on LLD?<br class="">><br class="">> I know we usually try to avoid any big "projects" and mainly add/fix things<br class="">> in response to user needs, but just wondering if somebody has any ideas.<br class="">><br class=""><br class=""></span>I'm not particularly active in lld anymore, but the last big item I'd<br class="">like to see implemented is Pettis-Hansen layout.<br class=""><a href="http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Articles/papers15.pdf" rel="noreferrer" target="_blank" class="">http://perso.ensta-paristech.f<wbr class="">r/~bmonsuez/Cours/B6-4/Article<wbr class="">s/papers15.pdf</a><br 
class="">(mainly because it improves performance of the final executable).<br class="">GCC/gold have an implementation of the algorithm that can be used as a<br class="">base. I'll expand if anybody is interested.<br class="">Side note: I'd like to propose a couple of llvm projects as well; I'll<br class="">sit down later today and write them.<br class=""></blockquote><div class=""><br class=""></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I’m not sure; can you confirm that such layout optimization on ELF requires -ffunction-sections?</div></div></div></blockquote><div class=""><br class=""></div><div class="">In order for a standard ELF linker to safely be able to reorder sections at function granularity, -ffunction-sections would be required. This isn't a problem during LTO since the code generation is set up by the linker :)</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><div class=""><br class=""></div><div class="">Also, for clang on OSX the best layout we could get is to order functions in the order in which they get executed at runtime.</div></div></div></blockquote><div class=""><br class=""></div><div class="">What the optimal layout may be for given apps is a bit of a separate question. 
Right now we're mostly talking about how to plumb everything together so that we can do the reordering of the final executable.</div></div></div></blockquote><div><br class=""></div><div>Yes, I was raising this exactly with the idea of “we may want to try different algorithms based on different kinds of data”.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class=""><br class=""></div><div class="">In fact, standard ELF linking semantics generally require input sections to be concatenated in command-line order (this is e.g. how .init_array/.ctors build up their arrays of pointers to initializers; a crt*.o file at the beginning/end has a sentinel value and so the order matters). 
So the linker will generally need blessing from the compiler to do most sorts of reorderings as far as I'm aware.</div><div class=""><br class=""></div><div class="">Other signals besides profile info, such as a startup trace, might be useful too, and we should make sure we can plug that into the design.</div><div class=""><div class="">My understanding of the clang on OSX case is based on a comparison of the `form_by_*` functions in clang/utils/perf-training/perf-helper.py, which offer a relatively simple set of algorithms, so I think the jury is still out on the best approach. (That script also uses a data collection method that is not part of LLVM's usual instrumentation or sampling workflows for PGO, so we may not be able to provide the same signals out of the box as part of our standard offering in the compiler.)</div></div></div></div></blockquote><div><br class=""></div><div>Yes, I was thinking that some XRay-based instrumentation could be used to provide the same data.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">I think that once we have this ordering capability integrated more deeply into the compiler, we'll be able to evaluate more complicated algorithms like Pettis-Hansen, have access to signals like global profile info, do interesting call graph analyses, etc. 
to find interesting approaches.<br class=""></div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class="gmail-"><div class=""> </div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><div class="">For FullLTO it is conceptually pretty easy to get the profile data we need for this, but I'm not sure about the ThinLTO case.</div><div class=""><br class=""></div><div class="">Teresa, Mehdi,</div><div class=""><br class=""></div><div class="">Are there any plans (or things already working!) for getting profile data from ThinLTO in a format that the linker can use for code layout? I assume that profile data is being used already to guide importing, so it may just be a matter of siphoning that off.</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I’m not sure what kind of “profile information” is needed, and what makes it easier for MonolithicLTO compared to ThinLTO?</div></div></div></blockquote><div class=""><br class=""></div><div class="">For MonolithicLTO I had in mind that a simple implementation would be:</div><div class="">```</div><div class="">std::vector<std::string> Ordering;</div><div class="">auto Pass = std::make_unique<LayoutModulePass>(&Ordering);</div><div class="">addPassToLTOPipeline(std::move(Pass));</div><div class="">```</div><div class=""><br class=""></div><div class="">The module pass would just query the profile data directly on IR data structures and get the order out. 
This would require very little "plumbing".</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><span class="gmail-"><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><div class="">Or maybe that layout code should be inside LLVM; maybe part of the general LTO interface? It looks like the current gcc plugin calls back into gcc for the actual layout algorithm itself (function call find_pettis_hansen_function_<wbr class="">layout) rather than the reordering logic living in the linker: <a href="https://android.googlesource.com/toolchain/gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_plugin/function_reordering_plugin.c" target="_blank" class="">https://android.<wbr class="">googlesource.com/toolchain/<wbr class="">gcc/+/<wbr class="">3f73d6ef90458b45bbbb33ef4c2b17<wbr class="">4d4662a22d/gcc-4.6/function_<wbr class="">reordering_plugin/function_<wbr class="">reordering_plugin.c</a></div></div></div></blockquote><div class=""><br class=""></div></span><div class="">I was thinking about this: could this be done by reorganizing the module itself for LTO?</div></div></div></blockquote><div class=""><br class=""></div><div class="">For MonolithicLTO that's another simple approach.</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word;" class=""><div class=""><div class=""><br class=""></div><div 
class="">That wouldn’t help non-LTO and ThinLTO though.</div></div></div></blockquote><div class=""><br class=""></div><div class="">I think we should ideally aim for something that works uniformly for Monolithic and Thin. For example, GCC emits special sections containing the profile data and the linker just reads those sections; something analogous in LLVM would just happen in the backend and be common to Monolithic and Thin. If ThinLTO already has profile summaries in some nice form though, it may be possible to bypass this.<br class=""></div><div class=""><br class=""></div><div class="">Another advantage of using special sections in the output like GCC does is that you don't actually need LTO at all to get the function reordering. The profile data passed to the compiler during per-TU compilation can be lowered into the same kind of annotations (though LTO and function ordering are likely to go hand in hand most often for peak-performance builds).</div></div></div></blockquote><div><br class=""></div><div>Yes, I agree with all of this :)</div><div>That makes for an interesting design trade-off!</div><div> </div><div>— </div><div>Mehdi</div><div><br class=""></div></div></body></html>