<div dir="ltr">As Mehdi and Steven noted, regenerating the summaries on the fly will be prohibitively expensive, so it would be better to always have the summaries available and simply ignore them if the user wants full LTO. <div><br><div>However, the biggest issue will be the different pipelines. As Katya notes, with Full LTO more is done in the compile step, whereas ThinLTO exits early since aggressive optimizations can be performed in the backends, and also we avoid bloating out the code due to things like loop unrolling, etc. (which at the very least would require adjustment of the importing thresholds). Making ThinLTO use the Full LTO pipeline will reduce performance (even if we adjust all the thresholds due to the changed compile pipeline, the backend pipeline is currently more aggressive for ThinLTO). Making Full LTO use ThinLTO's pipeline will increase its compile time. You'd have to do some performance experiments to see if, for example, we could make the ThinLTO compile step optimization pipeline the same as FullLTO's for the purpose of sharing the bitcode (+summary), but then use either the Thin or Full LTO pipeline in the backend depending on the mode.</div></div><div><br></div><div>We could, as Mehdi notes, allow importing from FullLTO modules if they had summaries, without too much difficulty.</div><div><br></div><div>Teresa</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 10, 2018 at 8:37 PM, Steven Wu via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">I think for ld64, you can mix thinLTO and fullLTO files, and ld64 will compile them separately and combine the results (Mehdi can confirm). 
I think this is aligned with the fact that whether to use full or thin LTO is decided at clang invocation, not at linker invocation. I am not against either model, but I think we need to do some research before making the effort to switch models.<div><br></div><div>On the other hand, I think it should work if you feed a thinLTO object file into the FullLTO code generator (if not, it is probably easy to implement). The issue there is that thin and full LTO use different optimization pipelines. We would probably need to do some benchmarking to figure out the impact of that.</div><div><br></div><div>Mehdi is correct that recomputing summaries is expensive. You will incur either memory or disk IO overhead that might make the compile time even slower than fullLTO. If you want to pick one format to use for both, it has to be the thinLTO format with the summary. But then you need to deal with what happens if there is a legacy library with fullLTO bitcode and the user specifies thinLTO on the linker command line.</div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span><div><span class="HOEnZb"><font color="#888888">Steven</font></span><div><div class="h5"><br><div><br></div><div><br><div><br><blockquote type="cite"><div>On Apr 10, 2018, at 5:25 PM, Mehdi AMINI via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="m_9119684291464802385Apple-interchange-newline"><div><div dir="ltr">Hi,<div><br></div><div>It is non-trivial to recompute summaries (which is why we have summaries in the bitcode in the first place, by the way), because bitcode is expensive to load.<div><br></div><div><div>I think shipping two different variants of the bitcode, one with and one without summaries, doesn't provide much benefit while complicating the flow. 
We could achieve what you're looking for by revisiting the flow a little.</div></div><div><br></div><div>I would try to consider whether we can:</div><div><br></div><div>1) Always generate summaries.</div><div>2) Use the same compile-phase optimization pipeline for ThinLTO and LTO.</div><div>3) Decide at link time whether you want to do FullLTO or ThinLTO.</div><div><br></div><div>We didn't go this route 2 years ago because during the bring-up we didn't want to affect FullLTO in any way, but it may make sense now to have `clang -flto=thin` and `clang -flto=full` be identical and change the linker plugins to operate either in full-LTO mode or in ThinLTO mode, without differentiating based on the availability of the summaries.</div><div><br></div><div>A possible behavior could be:</div><div><br></div><div># The -flto flag in the compile phase does not change the produced bitcode, except for a flag that records the preference in the bitcode (FullLTO vs ThinLTO)</div><div>$ clang -c -flto=thin a.cpp<br></div><div>$ clang -c -flto=full b.cpp<br></div><div></div><div>$ clang -c -flto=full c.cpp</div><div><br></div><div># At link time the behavior depends on the -flto flag passed in.</div><div><br></div><div># No flag: use the compile-phase preference, perform ThinLTO on a.o and FullLTO on b.o/c.o, but allow ThinLTO import between the LTO group and the ThinLTO objects<br></div><div></div><div>$ clang a.o b.o c.o<br></div><div><br></div><div># Forces full LTO, merges all the objects, no cross-module importing will happen.</div><div><div>$ clang a.o b.o c.o -flto=full<br></div></div><div><br></div><div># Forces ThinLTO for all objects, FullLTO won't happen, no objects will be merged.</div><div><div>$ clang a.o b.o c.o -flto=thin<br></div></div><div><br></div><div>Cheers,</div><div><br></div><div>-- </div><div>Mehdi</div><div><br></div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Apr 10, 2018 at 15:51, via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div lang="EN-US" link="blue" vlink="purple">

<div class="m_9119684291464802385m_-8078028122406665408WordSection1"><p class="MsoNormal">Hi David,<u></u><u></u></p><p class="MsoNormal">Thank you so much for your reply!<u></u><u></u></p><p class="MsoNormal"><u></u> <u></u></p><p class="MsoNormal">>> You're dealing with a situation where you are shipped BC files offline and then do one, or multiple builds with these BC files?<br>

<span style="font-family:Arial,sans-serif">Yes, that’s exactly the case.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal">>> If the scenario was more like a naive build: Multiple BC files generated on a single (multi-core/threaded) machine (but some Thin, some

<u></u><u></u></p><p class="MsoNormal">>> Full) & then fed to the linker, I would wonder if it'd be relatively cheap for the LTO step to support this by computing summaries for

<u></u><u></u></p><p class="MsoNormal">>> FullLTO files on the fly (without a separate tool/writing the summary to disk, etc).<span style="font-family:"Arial",sans-serif;color:#1f497d"><u></u><u></u></span></p><p class="MsoNormal"><span style="font-family:"Arial",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-family:Arial,sans-serif">I think so. My understanding that for FullLTO files, it’s possible to perform name anonymous globals pass and compute summaries on the fly, which should allow to perform ThinLTO at

 link phase. <u></u><u></u></span></p><p class="MsoNormal"><span style="font-family:Arial,sans-serif"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-family:Arial,sans-serif">Katya.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-family:"Arial",sans-serif;color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>>

<br>

<b>Sent:</b> Tuesday, April 10, 2018 7:38 AM<br>

<b>To:</b> Romanova, Katya <<a href="mailto:katya.romanova@sony.com" target="_blank">katya.romanova@sony.com</a>>; Teresa Johnson <<a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a>><br>

<b>Cc:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<b>Subject:</b> Re: [llvm-dev] exploring possibilities for unifying ThinLTO and FullLTO frontend + initial optimization pipeline<u></u><u></u></span></p><p class="MsoNormal"><u></u> <u></u></p>

<div><p class="MsoNormal" style="margin-bottom:12.0pt">Hi Katya,<br>

<br>

[+Teresa since this is about ThinLTO & she's the owner there]<br>

<br>

I'm not sure how other folks feel, but terminologically I'm not sure I think of these as different formats (for example you mention the idea of stripping the summaries from ThinLTO BC files to then feed them in as FullLTO files - I would imagine it'd be reasonable

 to modify/fix/improve the linker integration to have it (perhaps optionally) /ignore/ the summaries, or use the summaries but in a non-siloed way (so that there's not that optimization boundary between ThinLTO and FullLTO))<br>

<br>

You're dealing with a situation where you are shipped BC files offline and then do one, or multiple builds with these BC files?<br>

<br>

If the scenario was more like a naive build: Multiple BC files generated on a single (multi-core/threaded) machine (but some Thin, some Full) & then fed to the linker, I would wonder if it'd be relatively cheap for the LTO step to support this by computing

 summaries for FullLTO files on the fly (without a separate tool/writing the summary to disk, etc). Though I suppose that'd produce a pretty wildly different behavior in the link when just a single ThinLTO BC file was added to an otherwise FullLTO build.<br>

<br>

Anyway - just some (admittedly fairly uninformed) thoughts. I'm sure Teresa has more informed ideas about how this might all look.<span style="color:#1f497d"><u></u><u></u></span></p>

<div>

<div><p class="MsoNormal">On Mon, Apr 9, 2018 at 12:20 PM via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<u></u><u></u></p>

</div>

<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">

<div>

<div><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Hello,</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>I am exploring the possibility of unifying the BC file generation phase for ThinLTO and FullLTO. Our third-party library providers prefer to give us only one version of the BC archives, rather than test and ship both Thin and Full

 LTO BC archives. We want to find a way to allow our users to pick either Thin or Full LTO, while having only one “unified” version of the BC archive.</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Note, I am not necessarily proposing to do this work in the upstream compiler. If there is no interest from other companies, we might have to keep this as a private patch for Sony.</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>One of the ideas (not my preference) is to mix and match files in the Thin and Full BC formats. I'm not sure how well the "mix and match" scenario works in general. Are Apple or Google doing this in production?</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>I wrote a toy example: I compiled one group of files with ThinLTO and the rest with FullLTO, and linked them with gold. I saw that irrespective of whether the Thin or Full LTO option was used at the link step, files are optimized within

 the Thin group and within the Full group separately, but they don't know about the files in the other group (which makes sense). Basically, the border between Thin and Full LTO bitcode files created an artificial "barrier" which prevented cross-border optimization.</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Obviously, I am not too fond of this idea. Even if mixing and matching ThinLTO and FullLTO bitcode files will work “as is”, I suspect we will see a non-trivial runtime performance degradation because of the "ThinLTO"/"FullLTO" border.

 Are you aware of any potential problems with this solution, other than performance?</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span> </span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Another, hopefully better, idea is to introduce a "unified" BC format, which could be FullLTO, ThinLTO, or neither (i.e., something in between).</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>If the user chooses FullLTO at the link step but some of the files are in the Thin BC format, the linker will call a special LTO API to convert these files to the Full LTO BC format (i.e., strip the module summary section and potentially run some additional optimizations from the FullLTO pass manager pipeline).</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>If the user chooses ThinLTO at the link step but some of the files are in the Full BC format, the linker will call an LTO API to convert these files to the Thin LTO bitcode format (by regenerating the module summary section dynamically for the Full LTO bitcode files). </span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>I think the most reasonable idea for unifying the Thin and Full LTO compilation pipelines is to use Full LTO as the “unified” BC format. If the user requests FullLTO, no additional work is needed: the linker will perform FullLTO as usual. If the user requests ThinLTO, the linker will call an API to regenerate the module summary section for all the files in the FullLTO format and then perform ThinLTO as usual.

</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>In reality I suspect things will be much more complicated. The pipelines for the Thin and Full LTO compilation phases are quite different. ThinLTO can afford to do much more optimization in the linking phase (since it has parallel

 backends & smaller IR compared to FullLTO), while for FullLTO we are forced to move some optimizations from linking to the compilation phase.</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>So, if we pick FullLTO as our unified format, we would increase the build time for ThinLTO (we will be doing the FullLTO initial optimization pipeline in the compile phase, which is more than what ThinLTO is currently doing, but the

 pipeline of the optimizations in the backend will stay the same). It’s not clear what will happen with the runtime performance: we might improve it (because we repeat some of the optimizations several times), or we might make it worse (because we might do

 an optimization in the early compilation phase, potentially preventing more aggressive optimization later). What are your expectations? Will this approach work in general? If so, what do you think will happen with the runtime performance?</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>I also noticed that the pass manager pipeline is different for ThinLTO+Sample PGO (the profile-use case). This might create some additional complications for unifying the Thin and Full LTO BC generation phases too, but it’s too small a detail to worry about right now. I’m more interested in choosing the right general direction for solving this problem.</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Please share your thoughts!</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Thank you!</span><u></u><u></u></p><p class="MsoNormal" style="margin-bottom:8.0pt;line-height:105%">

<span>Katya.</span><u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"> </span><u></u><u></u></p>
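<p class="MsoNormal">A minimal Python sketch of the link-step logic in the "unified" BC proposal above, assuming each input can be classified solely by whether it carries a module summary section. <code>BitcodeModule</code> and <code>plan_link_step</code> are hypothetical names for illustration, not LLVM APIs; in practice the stripping or regeneration would happen inside the LTO library, and the sketch does not model the cost of regenerating summaries or any extra FullLTO-pipeline optimizations.</p>

```python
from dataclasses import dataclass


@dataclass
class BitcodeModule:
    # Hypothetical stand-in for one bitcode input handed to the linker.
    name: str
    has_summary: bool  # True if the module summary section is present (ThinLTO-style)


def plan_link_step(modules, mode):
    """Map each module name to the conversion needed for the link-time mode.

    mode is "full" or "thin". Instead of treating Thin and Full inputs as
    separate groups, every input is brought into the format the chosen
    mode expects.
    """
    if mode not in ("full", "thin"):
        raise ValueError("mode must be 'full' or 'thin'")
    plan = {}
    for module in modules:
        if mode == "full" and module.has_summary:
            # FullLTO does not use the summary: strip the section.
            plan[module.name] = "strip summary"
        elif mode == "thin" and not module.has_summary:
            # ThinLTO needs a summary for every module: regenerate it at link time.
            plan[module.name] = "regenerate summary"
        else:
            # The module is already in the expected format.
            plan[module.name] = "use as is"
    return plan
```

<p class="MsoNormal">For example, with <code>a.o</code> carrying a summary and <code>b.o</code> not, requesting ThinLTO yields <code>{"a.o": "use as is", "b.o": "regenerate summary"}</code>. Under the FullLTO-as-unified-format option, every input takes the "regenerate summary" path when ThinLTO is requested.</p>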

</div>

</div><p class="MsoNormal">______________________________<wbr>_________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><u></u><u></u></p>

</blockquote>

</div>

</div>

</div>

</div>


</blockquote></div>
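<div>A minimal Python sketch of the link-time behavior Mehdi proposes above, assuming the only input is each object's compile-time preference recorded in the bitcode. <code>plan_link_units</code> is a hypothetical helper, and a "unit" here simply means one module handed to an LTO backend after any merging, not an LLVM data structure.</div>

```python
def plan_link_units(preferences, link_flag=None):
    """Return the link-time units, each a frozenset of object names.

    preferences maps each object name to "thin" or "full", the compile-time
    preference recorded in its bitcode. Objects in the same unit are merged
    into one module; ThinLTO importing may then happen between units.
    """
    for obj, pref in preferences.items():
        if pref not in ("thin", "full"):
            raise ValueError(f"unknown preference {pref!r} for {obj}")
    objects = sorted(preferences)
    if link_flag == "full":
        # -flto=full: merge every object into one unit; with a single unit
        # there is nothing left to import across.
        return [frozenset(objects)]
    if link_flag == "thin":
        # -flto=thin: nothing is merged; each object is its own ThinLTO unit.
        return [frozenset([obj]) for obj in objects]
    if link_flag is not None:
        raise ValueError("link_flag must be None, 'thin' or 'full'")
    # No flag: merge the objects that prefer FullLTO into one unit and keep
    # each ThinLTO object separate, so importing can cross between them.
    full = frozenset(obj for obj in objects if preferences[obj] == "full")
    units = [full] if full else []
    units += [frozenset([obj]) for obj in objects if preferences[obj] == "thin"]
    return units
```

<div>For the a.o/b.o/c.o example, calling it with no flag gives one merged unit {b.o, c.o} plus {a.o}, with importing allowed between the two; <code>-flto=full</code> gives a single unit, and <code>-flto=thin</code> gives three.</div>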

</div></blockquote></div><br></div></div></div></div></div><br>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><span style="font-family:Times;font-size:medium"><table cellspacing="0" cellpadding="0"><tbody><tr style="color:rgb(85,85,85);font-family:sans-serif;font-size:small"><td nowrap style="border-top-style:solid;border-top-color:rgb(213,15,37);border-top-width:2px">Teresa Johnson |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(51,105,232);border-top-width:2px"> Software Engineer |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(0,153,57);border-top-width:2px"> <a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a> |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(238,178,17);border-top-width:2px"> 408-460-2413</td></tr></tbody></table></span></div>

</div>