<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 26, 2016, at 10:58 AM, Xinliang David Li <<a href="mailto:davidxl@google.com" class="">davidxl@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class="Apple-interchange-newline"><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Fri, Feb 26, 2016 at 10:53 AM, Mehdi Amini<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mehdi.amini@apple.com" target="_blank" class="">mehdi.amini@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><span class=""><br class="">> On Feb 26, 2016, at 10:38 AM, Teresa Johnson <<a href="mailto:tejohnson@google.com" class="">tejohnson@google.com</a>> wrote:<br class="">><br class="">> tejohnson added a comment.<br class="">><br class="">> In<span class="Apple-converted-space"> </span><a href="http://reviews.llvm.org/D17212#362864" rel="noreferrer" target="_blank" class="">http://reviews.llvm.org/D17212#362864</a>, @joker.eph 
wrote:<br class="">><br class="">>> How could we integrate accesses to global variable as part of this?<br class="">>> It turns out that to be able to benefit from the linker information on what symbol is exported during the import, this is a must.<br class="">><br class="">><br class="">> Well without it you still can see which function symbols will be exported, just not the variables, so you are running with less info and I guess need to assume that all static variables will be exposed and promote them.<br class=""><br class=""></span>The scheme I am currently setting up is:<br class=""><br class="">1) The linker gives us the list of symbols that need to be "preserved" (can't internalize)<br class="">2) Link the combined index<br class="">3) Compute the import list for every module *by just looking at the profile*<br class="">4) Do the promotion<br class=""><br class="">There is absolutely no assumption for the promotion (last step): you exactly know what will be imported by *every module*, and you can promote the optimal minimal amount of symbols.<br class=""><br class="">All of that is good and should work with your call-graph patch "as is".<br class=""><br class="">I'm looking to go two steps further during stage 3:<br class=""><br class="">1) I want to drive the importing heuristic cost to actually take into account the need for promotion.<br class="">I'll start some test on the extreme case by *forbiding* any promotion, i.e. if a function references an internal function or global, then it can't be imported in any other module. On the long term it may be interesting to include this in the importing threshold.<br class="">This can be implemented with a flag or an int in the summary "numberOfInternalGlobalsReferenced", but will not be enough for step 2 (below).<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class="">Interesting. What is the motivation for this type of tuning? 
Does it try to preserve more variable analysis (e.g., aliasing) in backend compilation?</div></div></div></blockquote><div><br class=""></div><div>Yes! LLVM optimizations give up easily on global variables, but handle "internal" variables much better. On another (platform-dependent) aspect, accessing non-internal globals requires going through the GOT, which is not the case for internal ones. Internal globals can be eliminated entirely or turned into allocas, which in turn enables further optimizations. </div><div>Similarly, the optimizer benefits from internal functions in many ways. </div><div><br class=""></div><div>Some things can be recovered with more effort (for instance, in the linker plugin for ThinLTO we're running: internalize + global opt + global DCE *before* doing any promotion). But we will be fighting assumptions in the optimizer for a long time.</div><div><br class=""></div><div>All of this is driven by our current performance tuning, of course. </div><div>We are seeing Full-LTO generate a binary almost half the size on test-suite/trunk/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp, while ThinLTO is on par with regular O3.</div><div>(This is a single-source benchmark; I wonder whether the inliner heuristic needs some tuning for O3 in general.)</div><div><br class=""></div>If you have any other ideas or input, let me know :)</div><div><br class=""></div><div>-- </div><div>Mehdi</div><div><br class="Apple-interchange-newline"> <br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; 
2) I want to benefit">
padding-left: 1ex;">2) I want to benefit from the linker information from stage 1 to internalize symbols.<br class="">This means that the fact that a function references an internal global *can't be recorded in the summary*, because the front-end does not know that the global will be internalized.<br class="">This can be implemented by having not a "call graph" but a "reference graph" (not sure about the terminology): i.e., there would be an edge for any use of one symbol by another, not only for calls.<br class=""></blockquote><div class=""><br class=""></div><div class="">Reference graph is what GCC uses :)</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><span class=""><br class=""><br class=""><br class="">> To refine that behavior for variables, yes, we'd need additional info in the summary.<br class="">><br class="">> (For davidxl or anyone else who didn't see the IRC conversation, Mehdi is looking at doing pure summary-based importing decisions in the linker step, then giving this info to the ThinLTO backends to avoid promotion of local values that aren't exported. For a distributed build, if we wanted to do it this way, the importing decisions would all be made in the plugin step and would then need to be serialized out for the distributed backends to check.)<br class="">><br class="">> Two possibilities depending on the fidelity of the info you think you need:<br class="">><br class="">> 1. One possibility is to just put a flag in the function summary if it accesses *any* local variables, and adjust the importing threshold accordingly. Then in the ThinLTO backend for the exporting module you need to check which of your own functions are on the import list, and which local variables they access, and promote accordingly.<br class="">><br class="">> 2. 
If it would be really beneficial to note exactly which local variables are accessed by which function, we'll need to broaden the edges list to include accesses to variables (I assume you only care about local variables here). E.g., the per-module summary edge list for a function would need to include value ids for any local variables referenced by that function (not sure that the other parts of the triple, the static and profile counts, are needed for that). Then in the combined VST we need to include entries for GUIDs of things that don't have a function summary, but are referenced by these edges. When accessing a function summary edge list for a candidate function to import, you could then see the GUID of any local variables accessed. You wouldn't know them by name, but that would be enough for, e.g., a heuristic like "if >=N hot import candidate functions from module M access a local variable with GUID G, go ahead and import those and let G be promoted by the backend (which, like in approach #1, needs to check which local variables are accessed by any functions on an import list)".<br class="">><br class="">> Obviously 1) is easier and cheaper space-wise. What are your thoughts?<br class=""><br class=""><br class=""></span>So 1) is cheaper, but 2) is a lot more powerful, as explained above :)<br class=""></blockquote><div class=""><br class=""></div><div class="">yes -- we may find other good uses of it.</div><div class=""><br class=""></div><div class="">David</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><span class="HOEnZb"><font color="#888888" class=""><br class=""><br class="">--<br class="">Mehdi</font></span></blockquote></div></blockquote></div><br class=""></body></html>
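A minimal C++ sketch of the promotion step (step 4 of the scheme in the thread) under the "reference graph" variant (option 2): once the import list of every module is known, a local symbol needs promotion only if some function imported into a different module references it. It also sketches the "extreme case" experiment of refusing to import any function that references an internal function or global. All names here (`SymbolSummary`, `Index`, `ImportLists`, `computePromotions`, `importableWithoutPromotion`) are hypothetical and for illustration only; they are not LLVM's ModuleSummaryIndex or function-importing API. The sketch assumes symbols are keyed by name rather than GUID, and it does not handle locals that are themselves imported transitively.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of a combined summary index with a
// "reference graph". Illustrative only: this is NOT LLVM's ModuleSummaryIndex API.
struct SymbolSummary {
  std::string Module;            // module that defines the symbol
  bool IsLocal = false;          // internal linkage in its defining module
  std::vector<std::string> Refs; // an edge for every use: calls and global-variable accesses
};

// Symbol name -> summary (assumption: a real index would key on GUIDs).
using Index = std::map<std::string, SymbolSummary>;
// Importing module -> functions it will import from other modules (output of step 3).
using ImportLists = std::map<std::string, std::set<std::string>>;

// Step 4: with every import list known, a local needs promotion only if a function
// imported into some other module references it. Returns defining module -> locals.
std::map<std::string, std::set<std::string>>
computePromotions(const Index &Idx, const ImportLists &Imports) {
  std::map<std::string, std::set<std::string>> Promote;
  for (const auto &[Importer, Funcs] : Imports) {
    for (const std::string &F : Funcs) {
      auto FIt = Idx.find(F);
      if (FIt == Idx.end())
        continue; // no summary for this function: nothing to reason about
      for (const std::string &R : FIt->second.Refs) {
        auto RIt = Idx.find(R);
        if (RIt == Idx.end())
          continue;
        const SymbolSummary &Ref = RIt->second;
        // A local defined in the importing module itself needs no promotion.
        if (Ref.IsLocal && Ref.Module != Importer)
          Promote[Ref.Module].insert(R);
      }
    }
  }
  return Promote;
}

// The "extreme case" experiment: forbid promotion entirely by refusing to
// import any function that references an internal function or global.
bool importableWithoutPromotion(const Index &Idx, const std::string &F) {
  auto FIt = Idx.find(F);
  if (FIt == Idx.end())
    return false;
  for (const std::string &R : FIt->second.Refs) {
    auto RIt = Idx.find(R);
    if (RIt != Idx.end() && RIt->second.IsLocal)
      return false;
  }
  return true;
}
```

With only the option-1 flag in the summary, `importableWithoutPromotion` would reduce to checking that single flag; `computePromotions` is what needs the per-variable edges of option 2, since it must know exactly which locals each imported function touches.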