<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 3, 2015 at 10:51 AM, Xinliang David Li <span dir="ltr"><<a href="mailto:xinliangli@gmail.com" target="_blank">xinliangli@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>Last I heard, llvm-ar is much faster than the GNU ar, etc.<br>

<span><font color="#888888"><br></font></span></blockquote><div><br></div></span><div>They are not fully feature compatible. For instance, support of thin archive is missing (not that it is hard to implement it though).</div></div></div></div></blockquote><div><br></div><div>It's on the LLVM roadmap to make these a drop-in replacement, so any work to improve them is not wasted and would be greatly appreciated. Don't let their current state hold you back from considering them for ThinLTO deployment; as you've said, the features are generally pretty straightforward to implement (and ones that aren't are usually looking deeply-enough into an object file that running them on bitcode-in-object-file doesn't make sense).</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="HOEnZb"><font color="#888888"><div><br></div><div>David</div></font></span><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><font color="#888888">

Alex<br>

</font></span><div><div><br>

> Teresa<br>

><br>

>><br>

>> -- Sean Silva<br>

>><br>

>>><br>

>>> And not all build systems have the plugin or<br>

>>> currently pass it to the native tools that can take a plugin for<br>

>>> handling bitcode. In those cases the bitcode support is not<br>

>>> transparently available, and our aim is to reduce the friction as much<br>

>>> as possible. And not all use LTO currently (I know we don't due to the<br>

>>> scalability issues we're trying to address with this design), and in<br>

>>> those cases the migration to bitcode-aware tools and plugins was not<br>

>>> previously required.<br>

>>><br>

>>> For Sony's linker, are you using the gold plugin or libLTO interfaces?<br>

>>> If the latter, I suppose some ThinLTO handling would have to be added<br>

>>> to your linker (e.g. to invoke the LLVM hooks to write the stage-2<br>

>>> combined function map and either launch the backend processes in<br>

>>> parallel or write out a make or other build file). The current support<br>

>>> for reading native object wrapped bitcode is baked into IRObjectFile<br>

>>> so presumably the Sony linker can handle these native object wrapped<br>

>>> bitcode files if it uses libLTO. We would similarly embed the handling<br>

>>> of the function index/summary behind an API that can handle either so<br>

>>> it is similarly transparent to the linkers. Let me know if there would<br>

>>> be additional issues that make wrapped bitcode more difficult in your<br>

>>> case, or how we could make ThinLTO usage simpler for you in general.<br>

>>><br>

>>>><br>

>>>> The only tool in the list of tools you mentioned that does not support<br>

>>>> bitcode directly is objcopy, and that's because nobody has yet written an<br>

>>>> LLVM-project implementation of it. Personally, I'd much rather you focus on<br>

>>>> making ThinLTO work by extending bitcode as needed, and we work as a<br>

>>>> community toward replacing objcopy with an LLVM-native one. It's a big<br>

>>>> missing piece of the LLVM project today and could be so much better if we<br>

>>>> could use it to replace Apple's lipo and possibly other extant object file<br>

>>>> modification tools. (Has anyone surveyed this area?)<br>

>>>><br>

>>>> That older toolchains have tried to slip non-object file data through<br>

>>>> the binary utilities isn't really proof that this is a good choice. It might<br>

>>>> simply reflect the realities of those engineering teams. I wasn't at Sun for<br>

>>>> this, but DTrace needed a linker feature that apparently the Sun linker team<br>

>>>> was unwilling or unable to provide, so dtrace(1) gained the ability to<br>

>>>> modify ELF files directly as needed. That doesn't prove that DTrace's USDT<br>

>>>> feature shouldn't have been implemented in the linker (as ld64 does directly<br>

>>>> for Apple), does it?<br>

>>><br>

>>> I'd argue that the realities being addressed by using native object<br>

>>> format in those cases still exist.<br>

>>><br>

>>>><br>

>>>> If in the end using native object-wrapped bitcode is the best solution,<br>

>>>> so be it. However, I think it is largely orthogonal to ThinLTO's needs for<br>

>>>> transporting symtab data alongside the existing bitcode format.<br>

>>><br>

>>> That's certainly true, ThinLTO can be implemented using either format,<br>

>>> and bitcode only support can certainly be implemented. It is a matter<br>

>>> of prioritizing which format to implement first. I had added some<br>

>>> description to the updated RFC on how the function index/summary can<br>

>>> be represented, etc in bitcode. Prioritizing the native object format<br>

>>> doesn't make it easier to implement ThinLTO, but should make it easier<br>

>>> to deploy.<br>

>>><br>

>>> Thanks!<br>

>>> Teresa<br>

>>><br>

>>>><br>

>>>> Alex<br>

>>>><br>

>>>>> On May 28, 2015, at 2:10 PM, Teresa Johnson <<a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a>><br>

>>>>> wrote:<br>

>>>>><br>

>>>>> As promised, here is a new version of the ThinLTO RFC, updated based<br>

>>>>> on some of the comments, questions and feedback from the first RFC.<br>

>>>>> Hopefully we have addressed many of these, and as noted below, will<br>

>>>>> fork some of the detailed discussion on particular aspects into<br>

>>>>> separate design doc threads. Please send any additional feedback and<br>

>>>>> questions on the overall design.<br>

>>>>> Thanks!<br>

>>>>> Teresa<br>

>>>>><br>

>>>>><br>

>>>>> Updated RFC to discuss plans for implementing ThinLTO upstream,<br>

>>>>> reflecting feedback and discussion from initial RFC<br>

>>>>> (<a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html</a>). As<br>

>>>>> discussed in the earlier thread and below, more detailed design<br>

>>>>> documents for several pieces (native object format, linkage type<br>

>>>>> changes and static promotions, etc) are in progress and will be sent<br>

>>>>> separately. This RFC covers the overall design and the breakdown of<br>

>>>>> work at a higher level.<br>

>>>>><br>

>>>>><br>

>>>>> Background on ThinLTO can be found in slides from EuroLLVM 2015:<br>

>>>>><br>

>>>>> <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__drive.google.com_open-3Fid-3D0B036uwnWM6RWWER1ZEl5SUNENjQ-26authuser-3D0&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=0PsInzni6pJ8kT96juwCqS61kiWswqLV5VwBAECwl1Q&s=hdSlg1-d6YUc9987m7wHEHsDl40iTNslm8-tBb4CoCU&e=" target="_blank">https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0</a><br>

>>>>> As described in the talk, we have a prototype implementation, and<br>

>>>>> would like to start staging patches upstream. This RFC describes a<br>

>>>>> breakdown of the major pieces. We would like to commit upstream<br>

>>>>> gradually in several stages, with all functionality off by default.<br>

>>>>> The core ThinLTO importing support and tuning will require frequent<br>

>>>>> change and iteration during testing and tuning, and for that part we<br>

>>>>> would like to commit rapidly (off by default). See the proposed staged<br>

>>>>> implementation described in the Implementation Plan section.<br>

>>>>><br>

>>>>><br>

>>>>> ThinLTO Overview<br>

>>>>> ==================<br>

>>>>><br>

>>>>><br>

>>>>> See the talk slides linked above for more details. The following is a<br>

>>>>> high-level overview of the motivation.<br>

>>>>><br>

>>>>><br>

>>>>> Cross Module Optimization (CMO) is an effective means for improving<br>

>>>>> runtime performance, by extending the scope of optimizations across<br>

>>>>> source module boundaries. Without CMO, the compiler is limited to<br>

>>>>> optimizing within the scope of single source modules. Two solutions<br>

>>>>> for enabling CMO are Link-Time Optimization (LTO), which is currently<br>

>>>>> supported in LLVM and GCC, and Lightweight-Interprocedural<br>

>>>>> Optimization (LIPO). However, each of these solutions has limitations<br>

>>>>> that prevent it from being enabled by default. ThinLTO is a new<br>

>>>>> approach that attempts to address these limitations, with a goal of<br>

>>>>> being enabled more broadly. ThinLTO is designed with many of the same<br>

>>>>> principles as LIPO, and therefore shares its advantages, without any of its<br>

>>>>> inherent weaknesses. Unlike in LIPO where the module group decision is<br>

>>>>> made at profile training runtime, ThinLTO makes the decision at<br>

>>>>> compile time, but in a lazy mode that facilitates large scale<br>

>>>>> parallelism. LTO implementations all contain a serial IPA/IPO step<br>

>>>>> that is both memory intensive and slow, limiting usability on both<br>

>>>>> smaller workstations and huge applications. In contrast, the ThinLTO<br>

>>>>> serial linker plugin phase is designed to be razor thin and blazingly<br>

>>>>> fast. By default this step only does minimal preparation work to<br>

>>>>> enable the parallel lazy importing performed later. ThinLTO aims to be<br>

>>>>> scalable like a regular O2 build, enabling CMO on machines without<br>

>>>>> large memory configurations, while also integrating well with<br>

>>>>> distributed build systems. Results from early prototyping on SPEC<br>

>>>>> cpu2006 C++ benchmarks are in line with expectations that ThinLTO can<br>

>>>>> scale like O2 while enabling much of the CMO performed during a full<br>

>>>>> LTO build.<br>

>>>>><br>

>>>>><br>

>>>>> A ThinLTO build is divided into 3 phases, which are referred to in the<br>

>>>>> following implementation plan:<br>

>>>>> 1. phase-1: IR and Function Summary Generation (-c compile)<br>

>>>>> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)<br>

>>>>> 3. phase-3: Parallel Backend with Demand-Driven Importing<br>

>>>>><br>

>>>>><br>

>>>>> Implementation Plan<br>

>>>>> ====================<br>

>>>>><br>

>>>>><br>

>>>>> This section gives a high-level breakdown of the ThinLTO support that<br>

>>>>> will be added, in roughly the order that the patches would be staged.<br>

>>>>> The patches are divided into three stages. The first stage contains a<br>

>>>>> minimal amount of preparation work that is not ThinLTO-specific. The<br>

>>>>> second stage contains most of the infrastructure for ThinLTO, which<br>

>>>>> will be off by default. The third stage includes<br>

>>>>> enhancements/improvements/tunings that can be performed after the main<br>

>>>>> ThinLTO infrastructure is in.<br>

>>>>><br>

>>>>><br>

>>>>> The second and third implementation stages will initially be very<br>

>>>>> volatile, requiring a lot of iterations and tuning with large apps to<br>

>>>>> get stabilized. Therefore it will be important to do fast commits for<br>

>>>>> these implementation stages.<br>

>>>>><br>

>>>>><br>

>>>>> 1. Stage 1: Preparation<br>

>>>>> ------------------------------------<br>

>>>>><br>

>>>>><br>

>>>>> The first planned sets of patches are enablers for ThinLTO work:<br>

>>>>><br>

>>>>><br>

>>>>> a. LTO directory structure<br>

>>>>><br>

>>>>><br>

>>>>> Restructure the LTO directory to remove a circular dependence once the<br>

>>>>> ThinLTO pass is added. Because ThinLTO is being implemented as an SCC pass<br>

>>>>> within Transforms/IPO, and leverages the LTOModule class for linking<br>

>>>>> in functions from modules, IPO then requires the LTO library. This<br>

>>>>> creates a circular dependence between LTO and IPO. To break that, we<br>

>>>>> need to split the lib/LTO directory/library into lib/LTO/CodeGen and<br>

>>>>> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,<br>

>>>>> respectively. Only LTOCodeGenerator has a dependence on IPO, removing<br>

>>>>> the circular dependence.<br>
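The effect of the split can be checked with a small dependence-graph sketch. The library names below mirror the ones in the paragraph above, but the edge lists are a simplification that records only the dependences the RFC mentions, and `has_cycle` is a hypothetical helper rather than anything in the LLVM build system.

```python
def has_cycle(deps):
    """Return True if the library dependence graph contains a cycle (DFS colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for nxt in deps.get(node, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                return True  # back edge: nxt is still on the current DFS path
            if colour == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in deps)


# Before the split: LTO needs IPO, and the ThinLTO pass in IPO needs LTOModule (in LTO).
before_split = {"LTO": ["IPO"], "IPO": ["LTO"]}

# After the split: only the code generator half depends on IPO.
after_split = {"LTO/CodeGen": ["IPO"], "IPO": ["LTO/Module"], "LTO/Module": []}

cycle_before = has_cycle(before_split)
cycle_after = has_cycle(after_split)
```

Under these assumed edges the first graph has a cycle and the second does not, which is the property the restructuring is meant to establish.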

>>>>><br>

>>>>><br>

>>>>> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator, whereas<br>

>>>>> the gold plugin uses lib/Object/IRObject and lib/Linker directly. The<br>

>>>>> use of LTOModule in the ThinLTO pass is a convenience, but could be<br>

>>>>> avoided by using the IRObject/Linker methods directly if that is<br>

>>>>> preferred.<br>

>>>>><br>

>>>>><br>

>>>>> b. Native object wrapper generation support<br>

>>>>><br>

>>>>><br>

>>>>> Implement native-object wrapped bitcode writer. The main goal is to<br>

>>>>> more easily interact with existing native tools such as $AR, $NM, “$LD<br>

>>>>> -r”, $OBJCOPY, and $RANLIB, without requiring the build system to find<br>

>>>>> and pass the plugin as an option. We plan to emit the phase-1 bitcode<br>

>>>>> wrapped in native object format via the .llvmbc section, along with a<br>

>>>>> symbol table. We will implement ELF first, but subsequently extend<br>

>>>>> support to COFF and Mach-O. Additionally, we also want to avoid doing<br>

>>>>> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the<br>

>>>>> resulting object file should still contain native object-wrapped<br>

>>>>> bitcode to enable ThinLTO at the full link step). I will send a<br>

>>>>> separate design document for these changes, including the format of<br>

>>>>> the symtab and function index/summary section, but the following is a<br>

>>>>> high-level motivation and overview.<br>

>>>>><br>

>>>>><br>

>>>>> Note that support for ThinLTO using bitcode can be added as a<br>

>>>>> follow-on under an option, so that bitcode-aware tools do not need to<br>

>>>>> use the wrapper. Under the bitcode-only option, the symbol table will<br>

>>>>> be replaced by the bitcode form of the function index and summary<br>

>>>>> section, which can be encoded as a new bitcode block type. Changes<br>

>>>>> should be made to the gold plugin to avoid partial link of bitcode<br>

>>>>> files under “$LD -r” (emitting bitcode rather than compiling all the<br>

>>>>> way down to native code, which is how ld64 behaves on Darwin as per<br>

>>>>> dexonsmith).<br>

>>>>><br>

>>>>><br>

>>>>> Advantages of using native object format:<br>

>>>>> * Out of the box interoperability with existing native build tools<br>

>>>>> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently<br>

>>>>> know how to locate/pass the appropriate plugin.<br>

>>>>> * There is precedent for using this format: other compilers also wrap<br>

>>>>> intermediate LTO files (probably related to the above advantage)[1].<br>

>>>>> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and<br>

>>>>> “$LD -r”) can mark the change in the symbol table without needing to<br>

>>>>> parse/change/encode bitcode. The change can be propagated to bitcode<br>

>>>>> by the ThinLTO backend.<br>

>>>>> * Some tools only need to read/write the symtab and can avoid<br>

>>>>> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).<br>

>>>>> * The second phase of ThinLTO does not need to parse the bitcode when<br>

>>>>> creating the combined function index.<br>

>>>>><br>

>>>>><br>

>>>>> Disadvantages of using native object format:<br>

>>>>> * Unnecessary when using plugins with plugin-aware native tools, or<br>

>>>>> LLVM’s custom tools.<br>

>>>>> * Slightly increases disk storage and I/O from the symtab. However, with<br>

>>>>> our design the symtab is leveraged to hold function indexing info<br>

>>>>> required for ThinLTO. The I/O for some build tools and build steps can<br>

>>>>> actually be reduced as there is no need to read the bitcode, as<br>

>>>>> described above.<br>

>>>>><br>

>>>>><br>

>>>>> Support was added to LLVM for reading native object-wrapped bitcode<br>

>>>>> (<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_rL218078&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=0PsInzni6pJ8kT96juwCqS61kiWswqLV5VwBAECwl1Q&s=3hPqXdNQhS0J-nJXyciEj3q-WK2NBFpYP5zKiu4u3s8&e=" target="_blank">http://reviews.llvm.org/rL218078</a>), but there does not yet exist<br>

>>>>> support in LLVM/Clang for emitting bitcode wrapped in native object<br>

>>>>> format. I plan to add support for optionally generating bitcode in a<br>

>>>>> native object file containing a single .llvmbc section holding the<br>

>>>>> bitcode. Specifically, the patch would add new options<br>

>>>>> “emit-llvm-native-object” (object file) and corresponding<br>

>>>>> “emit-llvm-native-assembly” (textual assembly code equivalent).<br>

>>>>> Eventually these would be automatically triggered under “-fthinlto -c”<br>

>>>>> and “-fthinlto -S”, respectively.<br>

>>>>><br>

>>>>><br>

>>>>> Additionally, a symbol table will be generated in the native object<br>

>>>>> file, holding the function symbols within the bitcode. This<br>

>>>>> facilitates handling archives of the native object-wrapped bitcode<br>

>>>>> created with $AR, since the archive will have a symbol table as well.<br>

>>>>> The archive symbol table enables gold to extract and pass to the<br>

>>>>> plugin the constituent native object-wrapped bitcode files. To support<br>

>>>>> the concatenated llvmbc section generated by “$LD -r”, some handling<br>

>>>>> needs to be added to gold and to the backend driver to process each<br>

>>>>> original module’s bitcode.<br>

>>>>><br>

>>>>><br>

>>>>> The function index/summary will later be added as a special native<br>

>>>>> object section alongside the .llvmbc sections. The offset and size of<br>

>>>>> the corresponding function summary can be placed in the associated<br>

>>>>> symtab entry. As noted above, a separate design document will be sent<br>

>>>>> for the native object format changes.<br>

>>>>><br>

>>>>><br>

>>>>> 2. Stage 2: ThinLTO Infrastructure<br>

>>>>> ------------------------------------------------------<br>

>>>>><br>

>>>>><br>

>>>>> The next set of patches adds the base implementation of the ThinLTO<br>

>>>>> infrastructure, specifically those required to make ThinLTO functional<br>

>>>>> and generate correct but not necessarily high-performing binaries.<br>

>>>>><br>

>>>>><br>

>>>>> a. Clang/LLVM/gold linker options<br>

>>>>><br>

>>>>><br>

>>>>> An early set of clang/llvm patches is needed to provide options to<br>

>>>>> enable ThinLTO (off by default), so that the rest of the<br>

>>>>> implementation can be disabled by default as it is added.<br>

>>>>> Specifically, the clang option -fthinlto (used instead of -flto) will<br>

>>>>> cause clang to invoke the phase-1 emission of LLVM bitcode and<br>

>>>>> function summary/index on a compile step, and pass the appropriate<br>

>>>>> option to the gold plugin on a link step. The -thinlto option will be<br>

>>>>> added to the gold plugin and llvm-lto tool to launch the phase-2 thin<br>

>>>>> archive step. The -thinlto-be option will also be added to clang to<br>

>>>>> invoke it as a phase-3 parallel backend instance with a bitcode file<br>

>>>>> as input.<br>

>>>>><br>

>>>>><br>

>>>>> b. Thin-archive linking support in Gold plugin and llvm-lto<br>

>>>>><br>

>>>>><br>

>>>>> Under the new plugin option (see above), the plugin needs to perform<br>

>>>>> the phase-2 (thin archive) link which simply emits a combined function<br>

>>>>> index from the linked modules, without actually performing the normal<br>

>>>>> link. Corresponding support should be added to the standalone llvm-lto<br>

>>>>> tool to enable testing/debugging without involving the linker and<br>

>>>>> plugin.<br>

>>>>><br>

>>>>><br>

>>>>> c. ThinLTO backend support<br>

>>>>><br>

>>>>><br>

>>>>> Support for invoking a phase-3 backend invocation (including<br>

>>>>> importing) on a module should be added to the clang driver under the<br>

>>>>> new option. The main change under the option is to instantiate a<br>

>>>>> Linker object used to manage the process of linking imported functions<br>

>>>>> into the module, efficiently read the combined function index, and<br>

>>>>> enable the ThinLTO import pass.<br>

>>>>><br>

>>>>><br>

>>>>> d. Function index/summary support<br>

>>>>><br>

>>>>><br>

>>>>> This includes infrastructure for writing and reading the function<br>

>>>>> index/summary section. As noted earlier this will be encoded in a<br>

>>>>> special section within the native object file for the module,<br>

>>>>> alongside the .llvmbc section containing the bitcode. The thin archive<br>

>>>>> (combined function index) generated by phase-2 of ThinLTO simply<br>

>>>>> contains all of the function index/summary sections across the linked<br>

>>>>> modules, organized for efficient function lookup. As mentioned earlier<br>

>>>>> when discussing the native object wrapper format, a separate design<br>

>>>>> document will be sent for this format.<br>

>>>>><br>

>>>>><br>

>>>>> Each function available for importing from the module contains an<br>

>>>>> entry in the module’s function index/summary section and in the<br>

>>>>> resulting combined function index. Each function entry contains that<br>

>>>>> function’s offset within the bitcode file, used to efficiently locate<br>

>>>>> and quickly import just that function (see below in 2e for more<br>

>>>>> details on the importing mechanics). The entry also contains summary<br>

>>>>> information (e.g. basic information determined during parsing such as<br>

>>>>> the number of instructions in the function), that will be used to help<br>

>>>>> guide later import decisions. Because the contents of this section<br>

>>>>> will change frequently during ThinLTO tuning, it should also be marked<br>

>>>>> with a version id for backwards compatibility or version checking.<br>
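To make the entry layout described above concrete, here is a minimal sketch. The field names (`bitcode_offset`, `instruction_count`), the `INDEX_VERSION` constant and the `combine` helper are illustrative assumptions; the actual encoding is left to the separate design document mentioned above.

```python
from dataclasses import dataclass

INDEX_VERSION = 1  # hypothetical version id stamped on every per-module index


@dataclass(frozen=True)
class FunctionEntry:
    module: str             # path of the module that defines the function
    bitcode_offset: int     # offset of the function body within that module's bitcode
    instruction_count: int  # example of summary info gathered during parsing


def combine(per_module_indexes):
    """Merge per-module indexes into one combined index keyed by function name.

    Each element is (version, {function_name: FunctionEntry}). An index with an
    unexpected version is rejected rather than misread.
    """
    combined = {}
    for version, entries in per_module_indexes:
        if version != INDEX_VERSION:
            raise ValueError(f"unsupported function index version {version}")
        for name, entry in entries.items():
            # Keep the first definition seen; duplicate handling is out of scope here.
            combined.setdefault(name, entry)
    return combined


index_a = (INDEX_VERSION, {"a": FunctionEntry("A.o", 128, 40)})
index_b = (INDEX_VERSION, {"b1": FunctionEntry("B.o", 64, 12),
                           "b2": FunctionEntry("B.o", 512, 30)})
combined_index = combine([index_a, index_b])
```

A backend that wants to import `b2` would look it up in `combined_index`, open `B.o`, and seek to offset 512 instead of scanning the whole module.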

>>>>><br>

>>>>><br>

>>>>> e. ThinLTO importing support<br>

>>>>><br>

>>>>><br>

>>>>> Support for the mechanics of importing functions from other modules,<br>

>>>>> which can go in gradually as a set of patches since it will be off by<br>

>>>>> default (the ThinLTO pass itself discussed below in 2f).<br>

>>>>><br>

>>>>><br>

>>>>> Note that ThinLTO function importing is iterative, and we may import<br>

>>>>> from a number of modules in an interleaved fashion. For example,<br>

>>>>> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),<br>

>>>>> where functions a(), b1()/b2(), c() and d() are from modules A, B, C<br>

>>>>> and D, respectively. When performing ThinLTO backend compilation of<br>

>>>>> module A, we may decide to import in the following order (based on<br>

>>>>> callsite and function summary info):<br>

>>>>> 1. B::b1()  # exposes call to c()<br>

>>>>> 2. C::c()<br>

>>>>> 3. B::b2()  # exposes call to d()<br>

>>>>> 4. D::d()<br>

>>>>> For this reason, ThinLTO importing is different than regular LTO<br>

>>>>> bitcode reading and linking, which reads and links in a module in its<br>

>>>>> entirety on a single pass through each module (notice in the above<br>

>>>>> example the imports of the two module B functions have an intervening<br>

>>>>> import from module C). As a result, for example, the existing support<br>

>>>>> for lazy metadata parsing that delays it until the first function is<br>

>>>>> materialized can’t be leveraged (metadata handling is discussed more<br>

>>>>> below in 2h). Therefore, the ThinLTO importing pass instantiates a new<br>

>>>>> BitcodeReader and LTOModule object for each function we decide to<br>

>>>>> import, parsing only what is needed and linking in just that function.<br>

>>>>> This is fast and efficient as found in the prototype results shown in<br>

>>>>> the linked EuroLLVM slides.<br>
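The interleaved import order in the example above can be reproduced with a short sketch. It assumes every call edge is hot enough to import, which stands in for the real callsite and summary heuristics; `COMBINED_INDEX` and `import_order` are hypothetical names used only for illustration.

```python
# Hypothetical combined index: function name -> (defining module, hot callees).
COMBINED_INDEX = {
    "a":  ("A", ["b1", "b2"]),
    "b1": ("B", ["c"]),
    "b2": ("B", ["d"]),
    "c":  ("C", []),
    "d":  ("D", []),
}


def import_order(root, index):
    """Return the order in which functions are imported while compiling root's module."""
    home_module = index[root][0]
    imported = []
    seen = {root}

    def visit(function):
        for callee in index[function][1]:
            if callee in seen:
                continue
            seen.add(callee)
            module = index[callee][0]
            if module != home_module:
                imported.append(f"{module}::{callee}()")
            # Importing the callee exposes its own call sites, so follow them next.
            visit(callee)

    visit(root)
    return imported


order = import_order("a", COMBINED_INDEX)
print(order)  # ['B::b1()', 'C::c()', 'B::b2()', 'D::d()']
```

Module B is visited twice with an import from module C in between, which is why a single pass over each module is not enough.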

>>>>><br>

>>>>><br>

>>>>> Separate patches can include:<br>

>>>>><br>

>>>>><br>

>>>>> * BitcodeReader changes to use function index to import/deserialize<br>

>>>>> single function of interest (small changes, leverages existing lazy<br>

>>>>> function streamer support). The declarations and other symbol table<br>

>>>>> info in the bitcode must be reloaded, but the bitcode parsing can stop<br>

>>>>> once the first function body is hit. We simply set up an entry in the<br>

>>>>> lazy streamer’s DeferredFunctionInfo function index map from the<br>

>>>>> bitcode index that was saved in the ThinLTO function summary (and<br>

>>>>> therefore don’t need to build up this function index structure through<br>

>>>>> repeated calls to RememberAndSkipFunctionBody via<br>

>>>>> FindFunctionInStream).<br>

>>>>> * Minor LTOModule changes to pass the ThinLTO function to import and<br>

>>>>> its index into bitcode reader (see 1a for discussion on LTOModule<br>

>>>>> use).<br>

>>>>> * Marking of imported functions. Most handling for ThinLTO imported<br>

>>>>> functions will simply rely on applying the appropriate linkage type.<br>

>>>>> But it is useful to know which functions were imported, both for<br>

>>>>> compiler debugging and verification, and possibly to modify some<br>

>>>>> optimization heuristics along with the summary information. This can<br>

>>>>> be in-memory initially, but IR support may be required in order to<br>

>>>>> support streaming bitcode out and back in again after importing.<br>

>>>>> * ModuleLinker changes to do ThinLTO-specific symbol linking and<br>

>>>>> static promotion when necessary. The linkage type of imported<br>

>>>>> non-local functions and variables changes to<br>

>>>>> AvailableExternallyLinkage, for example. Statics must be promoted in<br>

>>>>> certain cases, and accordingly renamed in consistent ways. Read-write<br>

>>>>> or address-taken static variables must always be promoted. Other<br>

>>>>> discardable functions, i.e. link-once such as comdats, will be force<br>

>>>>> imported on reference by another imported function. We are working on<br>

>>>>> a separate design document describing these changes in more detail<br>

>>>>> with examples, as a more detailed discussion of these changes is<br>

>>>>> beyond the scope of this RFC.<br>

>>>>> * GlobalDCE changes to support removing imported non-local functions<br>

>>>>> that were not inlined and imported non-local variables, which are<br>

>>>>> marked AvailableExternallyLinkage (very small changes to existing pass<br>

>>>>> logic). As discussed in the original RFC threads, currently GlobalDCE<br>

>>>>> does not remove referenced AvailableExternallyLinkage functions.<br>

>>>>> Instead, these are suppressed later during code generation. It isn’t<br>

>>>>> clear that these functions are useful past the first call to<br>

>>>>> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so<br>

>>>>> presumably after interprocedural constant propagation, etc). A patch with<br>

>>>>> these changes is in testing, as discussed in this thread:<br>

>>>>> <a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html</a>.<br>

>>>>><br>

>>>>><br>

>>>>> f. ThinLTO Import Driver SCC pass<br>

>>>>><br>

>>>>><br>

>>>>> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via<br>

>>>>> an SCC pass, enabled only under the -fthinlto-be option. The pass<br>

>>>>> includes utilizing the thin archive[2] (combined global function<br>

>>>>> index/summary), import decision heuristics, invocation of<br>

>>>>> LTOModule/ModuleLinker routines that perform the import, and any<br>

>>>>> necessary callgraph updates and verification.<br>

>>>>><br>

>>>>><br>

>>>>> g. Backend Driver<br>

>>>>><br>

>>>>><br>

>>>>> For a single node build, the gold plugin will initially exec the<br>

>>>>> backend processes directly, with the amount of parallelism controlled<br>

>>>>> via an option and/or env variable. It is also possible to leverage<br>

>>>>> existing single node build system task dispatching mechanisms such as<br>

>>>>> Unix Makefiles, Ninja, etc., where the plugin can simply write a build<br>

>>>>> file and fork the parallel backend instances directly under an<br>

>>>>> appropriate option. We will also initially add support for our<br>

>>>>> distributed build system as described below under 3c.<br>
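As a rough illustration of the build-file approach, the sketch below emits a Makefile with one rule per backend job, so that `make -j N` supplies the parallelism. The `-fthinlto-be` spelling is the option proposed in this RFC, not an existing clang flag; the rest of the command line, the file names and the `write_backend_makefile` helper are assumptions, and passing the combined function index to each job is omitted.

```python
def write_backend_makefile(bitcode_to_object, compiler="clang"):
    """Return Makefile text with one rule per phase-3 backend job."""
    objects = list(bitcode_to_object.values())
    lines = ["all: " + " ".join(objects), ""]
    for bitcode, obj in bitcode_to_object.items():
        lines.append(f"{obj}: {bitcode}")
        # Recipe lines in a Makefile must start with a tab character.
        lines.append(f"\t{compiler} -fthinlto-be -c {bitcode} -o {obj}")
        lines.append("")
    return "\n".join(lines)


makefile_text = write_backend_makefile({"A.bc": "A.o", "B.bc": "B.o"})
print(makefile_text)
```

Writing the jobs out as independent rules keeps the plugin itself serial and thin, and leaves scheduling to whichever build tool consumes the file.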

>>>>><br>

>>>>><br>

>>>>> h. Lazy Debug Metadata Linking<br>

>>>>><br>

>>>>><br>

>>>>> The prototype implementation included lazy importing of module-level<br>

>>>>> metadata during the ThinLTO pass finalization (i.e. after all function<br>

>>>>> importing is complete). This actually applies to all module-level<br>

>>>>> metadata, not just debug, although it is the largest. This can be<br>

>>>>> added as a separate set of patches, and the detailed design will be<br>

>>>>> sent with those. Includes changes to BitcodeReader, ValueMapper, and<br>

>>>>> the ModuleLinker classes. As described in 2e, due to the<br>

>>>>> iterative/interleaved nature of ThinLTO importing, the bitcode parsing<br>

>>>>> is structured differently than LTO where a single pass over each<br>

>>>>> module can be performed to parse and materialize all functions and<br>

>>>>> metadata. Therefore, the lazy metadata parsing support in<br>

>>>>> BitcodeReader, which parses all the metadata once the first function<br>

>>>>> is materialized, is not applicable. We may instantiate a<br>

>>>>> BitcodeReader multiple times for a module, if multiple functions are<br>

>>>>> eventually imported, and we need a way to suture up the metadata to<br>

>>>>> the functions imported by an earlier BitcodeReader instantiation. The<br>

>>>>> high level summary is that during the initial import we leave the<br>

>>>>> temporary metadata on the instructions that were imported, but save<br>

>>>>> the index used by the bitcode reader used to correlate with the<br>

>>>>> metadata when it is ready (i.e. the MDValuePtrs index), and skip the<br>

>>>>> metadata parsing. During the ThinLTO pass finalization we parse just<br>

>>>>> the metadata, and suture it up during metadata value mapping using the<br>

>>>>> saved index. As mentioned earlier, this will be described in more<br>

>>>>> detail when the patches are ready.<br>

>>>>><br>

>>>>><br>

>>>>> 3. Stage 3: ThinLTO Tuning and Enhancements<br>

>>>>> -------------------------------------------------------------------------<br>

>>>>><br>

>>>>><br>

>>>>> This refers to the patches that are not required for ThinLTO to work,<br>

>>>>> but rather to improve compile time, memory, run-time performance and<br>

>>>>> usability.<br>

>>>>><br>

>>>>><br>

>>>>> a. Import Tuning<br>

>>>>><br>

>>>>><br>

>>>>> Tuning the import strategy will be an iterative process that will<br>

>>>>> continue to be refined over time. It involves several different types<br>

>>>>> of changes: adding support for recording additional metrics in the<br>

>>>>> function summary, such as profile data and optional heavier-weight IPA<br>

>>>>> analyses, and tuning the import heuristics based on the summary and<br>

>>>>> callsite context.<br>

>>>>><br>

>>>>><br>

>>>>> b. Combined Function Index Pruning<br>

>>>>><br>

>>>>><br>

>>>>> The combined function index can be pruned of functions that are<br>

>>>>> unlikely to benefit from being imported. For example, during the<br>

>>>>> phase-2 thin archive plugin step we can safely omit large and (with<br>

>>>>> profile data) cold functions, which are unlikely to benefit from being<br>

>>>>> inlined. Additionally, all but one copy of comdat functions can be<br>

>>>>> suppressed.<br>
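<br>
The pruning rules above could be sketched roughly as follows. This is an illustration only: the FunctionSummary fields, the prune_index name and the size threshold of 100 instructions are assumptions made for the example, not part of the proposal.<br>
<pre>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionSummary:
    """Hypothetical summary record; field names are illustrative only."""
    name: str
    module: str
    inst_count: int
    is_cold: bool = False            # only meaningful when profile data exists
    comdat: Optional[str] = None     # comdat group name, if any


def prune_index(summaries, max_insts=100):
    """Drop entries that are large, cold, or duplicate comdat copies."""
    kept = []
    seen_comdats = set()
    for s in summaries:
        if s.inst_count > max_insts or s.is_cold:
            continue                 # unlikely to be inlined after import
        if s.comdat is not None:
            if s.comdat in seen_comdats:
                continue             # keep only the first copy of each comdat
            seen_comdats.add(s.comdat)
        kept.append(s)
    return kept
```
</pre>
Without profile data, is_cold simply stays False for every entry, so only the size and comdat rules apply.<br>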

>>>>><br>

>>>>><br>

>>>>> c. Distributed Build System Integration<br>

>>>>><br>

>>>>><br>

>>>>> For a distributed build system such as Bazel (<a href="http://bazel.io/" target="_blank">http://bazel.io/</a>), the<br>

>>>>> gold plugin should write the parallel backend invocations into a build<br>

>>>>> file, including the mapping from the IR file to the real object file<br>

>>>>> path, and exit. Additional work needs to be done in the distributed<br>

>>>>> build system itself to distribute and dispatch the parallel backend<br>

>>>>> jobs to the build cluster.<br>
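<br>
As a rough sketch of what such a build file might contain, the following renders one backend job per IR file together with the IR-to-object mapping. The JSON layout, the render_backend_jobs name and the opaque backend_cmd prefix are all invented for this example; the proposal does not specify a file format or a command line.<br>
<pre>
```python
import json


def render_backend_jobs(ir_to_object, combined_index, backend_cmd):
    """Return build-file text with one backend job per IR file (made-up format)."""
    jobs = [
        {
            "ir": ir,                # IR file seen by the plugin
            "object": obj,           # real object file path to produce
            # backend_cmd is an opaque placeholder supplied by the caller
            "command": list(backend_cmd) + [ir, combined_index, obj],
        }
        for ir, obj in sorted(ir_to_object.items())
    ]
    return json.dumps({"jobs": jobs}, indent=2)
```
</pre>
In this model the plugin would write the returned text to disk and exit, leaving the build system to dispatch each job to the cluster.<br>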

>>>>><br>

>>>>><br>

>>>>> d. Dependence Tracking and Incremental Compiles<br>

>>>>><br>

>>>>><br>

>>>>> In order to support build systems that stage from local disks or<br>

>>>>> network storage, the plugin will optionally support computation of<br>

>>>>> dependent sets of IR files that each module may import from. This can<br>

>>>>> be computed from profile data, if it exists, or from the symbol table<br>

>>>>> and heuristics if not. These dependence sets also enable support for<br>

>>>>> incremental backend compiles.<br>
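<br>
One way to picture the dependence sets and their use for incremental compiles is sketched below. It assumes call edges are already available (from profile data or the symbol table) and tracks only direct dependences; dependence_sets and needs_recompile are hypothetical helper names.<br>
<pre>
```python
def dependence_sets(call_edges, defining_module):
    """Map each module to the set of other modules it may import from.

    call_edges: iterable of (caller_module, callee_name) pairs.
    defining_module: dict from function name to the module defining it.
    """
    deps = {}
    for caller, callee in call_edges:
        deps.setdefault(caller, set())
        target = defining_module.get(callee)
        if target is not None and target != caller:
            deps[caller].add(target)
    return deps


def needs_recompile(module, deps, changed):
    """A backend compile is stale if the module or anything it may import from changed."""
    return module in changed or bool(deps.get(module, set()).intersection(changed))
```
</pre>
A build system could stage only the files in deps[module] for each backend job, and rerun a job only when needs_recompile returns True.<br>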

>>>>><br>

>>>>><br>

>>>>> ________________<br>

>>>>> [1] The following compilers currently wrap intermediate LTO files in<br>

>>>>> native object format: GCC fat and non-fat objects (with a custom<br>

>>>>> symtab), Intel icc non-fat (IR-only) objects (with a full native<br>

>>>>> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC<br>

>>>>> both fat and non-fat objects (with full native symtab).<br>

>>>>> [2] The “thin archive” here (also referred to as a combined function<br>

>>>>> index) has some similarities to the AR tool thin archive format, but<br>

>>>>> is not exactly the same. Both contain the symtab and not the code, but<br>

>>>>> the ThinLTO combined function index contains the summary sections as<br>

>>>>> well.<br>

>>>>><br>

>>>>> --<br>

>>>>> Teresa Johnson | Software Engineer | <a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a> |<br>

>>>>> <a href="tel:408-460-2413" value="+14084602413" target="_blank">408-460-2413</a><br>

>>>>><br>

>>>>> _______________________________________________<br>

>>>>> LLVM Developers mailing list<br>

>>>>> <a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

>>><br>

>>><br>

>>><br>


>><br>

>><br>

><br>

><br>

><br>


</div></div></blockquote></div></div></div><br></div></div>


<br></blockquote></div><br></div></div>