<div dir="ltr">Hi Teresa,<div><br></div><div>Very excited to see this work progressing :)<br><br><div class="gmail_quote"></div></div><div dir="ltr"><div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

The second and third implementation stages will initially be very<br>

volatile, requiring a lot of iterations and tuning with large apps to<br>

get stabilized. Therefore it will be important to do fast commits for<br>

these implementation stages.<br>

<br></blockquote><div><br></div></div></div></div><div dir="ltr"><div><div class="gmail_quote"><div>This sounds interesting. Could you describe in more detail what you think will be needed here?</div></div></div></div><div dir="ltr"><div><div class="gmail_quote"><div> </div></div></div></div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

2. Stage 2: ThinLTO Infrastructure<br>

----------------------------------------------<br>

<br>

The next set of patches adds the base implementation of the ThinLTO<br>

infrastructure, specifically those required to make ThinLTO functional<br>

and generate correct but not necessarily high-performing binaries. It<br>

also does not include the work needed to make debug support under -g efficient<br>

with ThinLTO.<br>

<br></blockquote><div><br></div><div>This is probably something we should give some more thought to up front. People will definitely want to be able to at least get decent backtraces out of their code (functions, file/line/col, arguments maybe), and leaving this as an afterthought could cause more efficiency problems down the road.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

a. Clang/LLVM/gold linker options:<br>

<br>

An early set of clang/llvm patches is needed to provide options to<br>

enable ThinLTO (off by default), so that the rest of the<br>

implementation can be disabled by default as it is added.<br>

Specifically, the clang option -fthinlto (used instead of -flto) will<br>

cause clang to invoke the phase-1 emission of LLVM bitcode and<br>

function summary/index on a compile step, and pass the appropriate<br>

option to the gold plugin on a link step. The -thinlto option will be<br>

added to the gold plugin and llvm-lto tool to launch the phase-2 thin<br>

archive step. The -thinlto option will also be added to the ‘opt’ tool<br>

to invoke it as a phase-3 parallel backend instance.<br>

<br>

<br>

b. Thin-archive linking support in Gold plugin and llvm-lto:<br>

<br>

Under the new plugin option (see above), the plugin needs to perform<br>

the phase-2 (thin archive) link which simply emits a combined function<br>

map from the linked modules, without actually performing the normal<br>

link. Corresponding support should be added to the standalone llvm-lto<br>

tool to enable testing/debugging without involving the linker and<br>

plugin.<br>

<br></blockquote><div><br></div><div>Have you described thin archives anywhere? I might have missed it, but I'm curious how you see this working.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

c. ThinLTO backend support:<br>

<br>

Support for a phase-3 backend invocation (including<br>

importing) on a module should be added to the ‘opt’ tool under the new<br>

option. The main change under the option is to instantiate a Linker<br>

object used to manage the process of linking imported functions into<br>

the module, efficiently read the combined function map, and enable<br>

the ThinLTO import pass.<br></blockquote><div><br></div><div>In general the phases that you have here sound interesting, but I'm not sure that I've seen the background describing them? Can you describe this sort of change here in more detail?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Each function available for importing from the module contains an<br>

entry in the module’s function index/summary section and in the<br>

resulting combined function map. Each function entry contains that<br>

function’s offset within the bitcode file, used to efficiently locate<br>

and quickly import just that function. The entry also contains summary<br>

information (e.g. basic information determined during parsing such as<br>

the number of instructions in the function), that will be used to help<br>

guide later import decisions. Because the contents of this section<br>

will change frequently during ThinLTO tuning, it should also be marked<br>

with a version id for backwards compatibility or version checking.<br>
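<br>
As a rough illustration of the entry described above, here is a minimal sketch in Python. The field names, the dictionary layout, and the INDEX_VERSION constant are hypothetical and are not taken from the proposal; the sketch only shows an offset plus summary data per function, guarded by a version id.<br>
<pre>
```python
from dataclasses import dataclass

# Hypothetical version id; the real format and numbering are not specified
# in the proposal.
INDEX_VERSION = 1


@dataclass(frozen=True)
class FunctionIndexEntry:
    # All field names here are illustrative assumptions.
    name: str               # symbol name of the importable function
    bitcode_offset: int     # offset of the function body in the bitcode file
    instruction_count: int  # summary info gathered while parsing


def build_index(entries):
    """Wrap the entries with a version id so readers can check compatibility."""
    return {
        "version": INDEX_VERSION,
        "functions": {entry.name: entry for entry in entries},
    }


def lookup_offset(index, name):
    """Return the bitcode offset for `name`, rejecting unknown index versions."""
    if index["version"] != INDEX_VERSION:
        raise ValueError("unsupported function index version")
    return index["functions"][name].bitcode_offset


index = build_index([
    FunctionIndexEntry("foo", 128, 12),
    FunctionIndexEntry("bar", 512, 340),
])
```
</pre>
With this layout, lookup_offset(index, "bar") returns the stored offset, which a reader could seek to directly instead of scanning the whole module.<br>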

<br></blockquote><div><br></div><div>&lt;Insert bike shed discussion of formatting, versioning, etc.&gt;</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

e. ThinLTO importing support:<br>

<br>

Support for the mechanics of importing functions from other modules,<br>

which can go in gradually as a set of patches since it will be off by<br>

default. Separate patches can include:<br>

<br>

- BitcodeReader changes to use the function index to import/deserialize<br>

a single function of interest (small changes, leverages existing lazy<br>

streamer support).<br>

<br></blockquote><div><br></div><div>Sounds like this is trying to optimize the (effectively) O(n) module scan with an AoT computation of the offset in the file. Perhaps it would be worth adding such functionality to the module itself anyhow?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">- Marking of imported functions (for use in ThinLTO-specific symbol<br>

linking and global DCE, for example). This can be in-memory initially,<br>

but IR support may be required in order to support streaming bitcode<br>

out and back in again after importing.<br>

<br></blockquote><div><br></div><div>How is this different from the existing linkage facilities?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- ModuleLinker changes to do ThinLTO-specific symbol linking and<br>

static promotion when necessary. The linkage type of imported<br>

functions changes to AvailableExternallyLinkage, for example. Statics<br>

must be promoted in certain cases, and renamed in consistent ways.<br>

<br></blockquote><div><br></div><div>Ditto.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- GlobalDCE changes to support removing imported functions that were<br>

not inlined (very small changes to existing pass logic).<br>

<br></blockquote><div><br></div><div>Ditto.</div><div><br></div><div>(I think I've seen some discussion of this already; if I should go and read those threads, just feel free to say so :)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

f. ThinLTO Import Driver SCC pass:<br>

<br>

Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via<br>

an SCC pass, enabled only under -fthinlto options. The pass includes<br>

utilizing the thin archive (global function index/summary), import<br>

decision heuristics, invocation of LTOModule/ModuleLinker routines<br>

that perform the import, and any necessary callgraph updates and<br>

verification.<br>

<br></blockquote><div><br></div><div>Instead of trying to hook some of this into clang/opt, would it be worth having a separate driver to prototype this? That way the functionality and the driver could be kept separate from the rest of the optimization pipeline, which should (I'd hope) also make it more testable.</div><div><br></div><div>We could also use that as a way to test the decision making etc., à la some of the -### stuff out of clang or -debug output. (This description is a bit of a stretch, but hopefully my point gets across.)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">3. Stage 3: ThinLTO Tuning and Enhancements<br>

----------------------------------------------------------------<br>

<br>

This refers to the patches that are not required for ThinLTO to work,<br>

but rather to improve compile time, memory, run-time performance and<br>

usability.<br>

<br>

<br>

a. Lazy Debug Metadata Linking:<br>

<br>

The prototype implementation included lazy importing of module-level<br>

metadata during the ThinLTO pass finalization (i.e. after all function<br>

importing is complete). This actually applies to all module-level<br>

metadata, not just debug, although it is the largest. This can be<br>

added as a separate set of patches. Changes to BitcodeReader,<br>

ValueMapper, ModuleLinker<br></blockquote><div><br></div><div>Can you describe more of what you've done here? We're trying to optimize a lot of these areas for normal LTO as well.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

b. Import Tuning:<br>

<br>

Tuning the import strategy will be an iterative process that will<br>

continue to be refined over time. It involves several different types<br>

of changes: adding support for recording additional metrics in the<br>

function summary, such as profile data and optional heavier-weight IPA<br>

analyses, and tuning the import heuristics based on the summary and<br>

callsite context.<br>

<br></blockquote><div><br></div><div>How is this different from the existing profile work that Diego has been doing? I.e. how are the formats etc going to communicate?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

c. Combined Function Map Pruning:<br>

<br>

The combined function map can be pruned of functions that are unlikely<br>

to benefit from being imported. For example, during the phase-2 thin<br>

archive plugin step we can safely omit large and (with profile data)<br>

cold functions, which are unlikely to benefit from being inlined.<br>

Additionally, all but one copy of comdat functions can be suppressed.<br>
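<br>
The pruning rules above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the tuple layout, the max_instructions threshold of 100, and the cold_names set standing in for profile data are all hypothetical and are not part of the proposal.<br>
<pre>
```python
def prune_function_map(entries, max_instructions=100, cold_names=None):
    """Return the (module, name) pairs kept in the combined function map.

    entries: iterable of (module, name, instruction_count, is_comdat).
    max_instructions: hypothetical size cutoff for "large" functions.
    cold_names: names considered cold by profile data; None or empty when
        no profile data is available.
    """
    cold_names = cold_names or set()
    kept = []
    seen_comdats = set()
    for module, name, instruction_count, is_comdat in entries:
        # Large functions are unlikely to benefit from being inlined.
        if instruction_count > max_instructions:
            continue
        # With profile data, cold functions are dropped as well.
        if name in cold_names:
            continue
        # Keep only the first copy of each comdat function.
        if is_comdat:
            if name in seen_comdats:
                continue
            seen_comdats.add(name)
        kept.append((module, name))
    return kept


entries = [
    ("a.o", "small", 10, False),
    ("a.o", "huge", 5000, False),
    ("a.o", "inl", 5, True),
    ("b.o", "inl", 5, True),
    ("b.o", "cold", 8, False),
]
pruned = prune_function_map(entries, cold_names={"cold"})
```
</pre>
Here pruned keeps only "small" and the first copy of "inl"; without profile data (no cold_names), "cold" would be kept as well.<br>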

<br></blockquote><div><br></div><div>The comdat function bit will happen with module linking, but perhaps an idea would be to make a first pass over the code and:</div><div><br></div><div>a) create a new module</div><div>b) move cold functions inside while leaving declarations behind</div><div>c) migrate comdat functions the same sort of way (though perhaps not out of line)</div><div><br></div><div>One random thought is that you'll need to work on the internalize pass to handle the distributed information you have.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

d. Distributed Build System Integration:<br>

<br>

For a distributed build system, the gold plugin should write the<br>

parallel backend invocations into a makefile, including the mapping<br>

from the IR file to the real object file path, and exit. Additional<br>

work needs to be done in the distributed build system itself to<br>

distribute and dispatch the parallel backend jobs to the build<br>

cluster.<br>

<br></blockquote><div><br></div><div>Hmm? I'd love to see you elaborate here, but it's probably just far enough in the future that we can hit that when we get there.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

e. Dependence Tracking and Incremental Compiles:<br>

<br>

In order to support build systems that stage from local disks or<br>

network storage, the plugin will optionally support computation of<br>

dependent sets of IR files that each module may import from. This can<br>

be computed from profile data, if it exists, or from the symbol table<br>

and heuristics if not. These dependence sets also enable support for<br>

incremental backend compiles.<br>

<br>

<br></blockquote><div><br></div><div>Ditto.</div><div><br></div><div>-eric</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

--<br>

Teresa Johnson | Software Engineer | <a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a> | 408-460-2413<br>

<br>


</blockquote></div></div></div>