[LLVMdev] Updated RFC: ThinLTO Implementation Plan

Mon Jun 1 06:34:33 PDT 2015

On Fri, May 29, 2015 at 6:15 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Fri, May 29, 2015 at 8:01 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
>>
>> On Fri, May 29, 2015 at 6:56 AM, Alex Rosenberg <alexr at leftfield.org>
>> wrote:
>> > My earlier statement about wrapping things in a native object file held
>> > in that it is controversial. It appears to be still central to your design.
>> >
>> > It may help to look at the problem from a different viewpoint: LLVM is
>> > not a compiler. It is a framework that can be used to make compiler-like
>> > tools.
>> >
>> > From that view, it no longer makes sense to discuss "the plugin," or
>> > gold, or $AR, because there isn't just one of any of those things. ld64
>> > isn't the only outlier linker to consider. We have our own linker at Sony,
>> > for example. From this perspective, then it makes more sense to consider
>> > replacing the binary utilities with ones that support bitcode, because from
>> > a user-perspective, all of the linkers already transparently support bitcode
>> > directly today, as do ar, nm, etc. This has been necessary for the regular
>> > LTO process.
>>
>> Hi Alex,
>>
>> It's true that the LLVM versions of these tools support bitcode
>> transparently, but not all build systems use LLVM versions of these
>> tools, particularly build systems that support a variety of compilers,
>> or legacy build systems.
>
>
> If a build system can do
> CC=clang
> why wouldn't it be able to do
> AR=llvm-ar
> ?

That assumes that the LLVM tools are all deployed in the build system,
and adds a requirement for using clang in this mode that wasn't there
before when using clang for -O2. We are trying to make the transition
from clang -O2 to clang -O2 + ThinLTO as seamless as possible.

Teresa

>
> -- Sean Silva
>
>>
>> And not all build systems have the plugin or
>> currently pass it to the native tools that can take a plugin for
>> handling bitcode. In those cases the bitcode support is not
>> transparently available, and our aim is to reduce the friction as much
>> as possible. And not all use LTO currently (I know we don't due to the
>> scalability issues we're trying to address with this design), and in
>> those cases the migration to bitcode-aware tools and plugins was not
>> previously required.
>>
>> For Sony's linker, are you using the gold plugin or libLTO interfaces?
>> If the latter, I suppose some ThinLTO handling would have to be added
>> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
>> combined function map and either launch the backend processes in
>> parallel or write out a make or other build file). The current support
>> for reading native object wrapped bitcode is baked into IRObjectFile
>> so presumably the Sony linker can handle these native object wrapped
>> bitcode files if it uses libLTO. We would similarly embed the handling
>> of the function index/summary behind an API that can handle either so
>> it is similarly transparent to the linkers. Let me know if there would
>> be additional issues that make wrapped bitcode more difficult in your
>> case, or how we could make ThinLTO usage simpler for you in general.
>>
>> >
>> > The only tool in the list of tools you mentioned that do not support
>> > bitcode directly is objcopy, and that's because nobody has yet written an
>> > LLVM-project implementation of it. Personally, I'd much rather you focus on
>> > making ThinLTO work by extending bitcode as needed, and we work as a
>> > community toward replacing objcopy with an LLVM-native one. It's a big
>> > missing piece of the LLVM project today and could be so much better if we
>> > could use it to replace Apple's lipo and possibly other extant object file
>> > modification tools. (Has anyone surveyed this area?)
>> >
>> > That older toolchains have tried to slip non-object file data through
>> > the binary utilities isn't really proof that this is a good choice. It might
>> > simply reflect the realities of those engineering teams. I wasn't at Sun for
>> > this, but DTrace needed a linker feature that apparently the Sun linker team
>> > was unwilling or unable to provide, so dtrace(1) gained the ability to
>> > modify ELF files directly as needed. That doesn't prove that DTrace's USDT
>> > feature shouldn't have been implemented in the linker (as ld64 does directly
>> > for Apple), does it?
>>
>> I'd argue that the realities being addressed by using native object
>> format in those cases still exist.
>>
>> >
>> > If in the end using native object-wrapped bitcode is the best solution,
>> > so be it. However, I think it is largely orthogonal to ThinLTO's needs for
>> > transporting symtab data alongside the existing bitcode format.
>>
>> That's certainly true, ThinLTO can be implemented using either format,
>> and bitcode only support can certainly be implemented. It is a matter
>> of prioritizing which format to implement first. I had added some
>> description to the updated RFC on how the function index/summary can
>> be represented, etc in bitcode. Prioritizing the native object format
>> doesn't make it easier to implement ThinLTO, but should make it easier
>> to deploy.
>>
>> Thanks!
>> Teresa
>>
>> >
>> > Alex
>> >
>> >> On May 28, 2015, at 2:10 PM, Teresa Johnson <tejohnson at google.com>
>> >> wrote:
>> >>
>> >> As promised, here is an new version of the ThinLTO RFC, updated based
>> >> on some of the comments, questions and feedback from the first RFC.
>> >> Hopefully we have addressed many of these, and as noted below, will
>> >> fork some of the detailed discussion on particular aspects into
>> >> separate design doc threads. Please send any additional feedback and
>> >> questions on the overall design.
>> >> Thanks!
>> >> Teresa
>> >>
>> >>
>> >> Updated RFC to discuss plans for implementing ThinLTO upstream,
>> >> reflecting feedback and discussion from initial RFC
>> >> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html). As
>> >> discussed in the earlier thread and below, more detailed design
>> >> documents for several pieces (native object format, linkage type
>> >> changes and static promotions, etc) are in progress and will be sent
>> >> separately. This RFC covers the overall design and the breakdown of
>> >> work at a higher level.
>> >>
>> >>
>> >> Background on ThinLTO can be found in slides from EuroLLVM 2015:
>> >>
>> >> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> >> As described in the talk, we have a prototype implementation, and
>> >> would like to start staging patches upstream. This RFC describes a
>> >> breakdown of the major pieces. We would like to commit upstream
>> >> gradually in several stages, with all functionality off by default.
>> >> The core ThinLTO importing support and tuning will require frequent
>> >> change and iteration during testing and tuning, and for that part we
>> >> would like to commit rapidly (off by default). See the proposed staged
>> >> implementation described in the Implementation Plan section.
>> >>
>> >>
>> >> ThinLTO Overview
>> >> ==================
>> >>
>> >>
>> >> See the talk slides linked above for more details. The following is a
>> >> high-level overview of the motivation.
>> >>
>> >>
>> >> Cross Module Optimization (CMO) is an effective means for improving
>> >> runtime performance, by extending the scope of optimizations across
>> >> source module boundaries. Without CMO, the compiler is limited to
>> >> optimizing within the scope of single source modules. Two solutions
>> >> for enabling CMO are Link-Time Optimization (LTO), which is currently
>> >> supported in LLVM and GCC, and Lightweight-Interprocedural
>> >> Optimization (LIPO). However, each of these solutions has limitations
>> >> that prevent it from being enabled by default. ThinLTO is a new
>> >> approach that attempts to address these limitations, with a goal of
>> >> being enabled more broadly. ThinLTO is designed with many of the same
>> >> principals as LIPO, and therefore its advantages, without any of its
>> >> inherent weakness. Unlike in LIPO where the module group decision is
>> >> made at profile training runtime, ThinLTO makes the decision at
>> >> compile time, but in a lazy mode that facilitates large scale
>> >> parallelism. LTO implementations all contain a serial IPA/IPO step
>> >> that is both memory intensive and slow, limiting usability on both
>> >> smaller workstations and huge applications. In contrast, the ThinLTO
>> >> serial linker plugin phase is designed to be razor thin and blazingly
>> >> fast. By default this step only does minimal preparation work to
>> >> enable the parallel lazy importing performed later. ThinLTO aims to be
>> >> scalable like a regular O2 build, enabling CMO on machines without
>> >> large memory configurations, while also integrating well with
>> >> distributed build systems. Results from early prototyping on SPEC
>> >> cpu2006 C++ benchmarks are in line with expectations that ThinLTO can
>> >> scale like O2 while enabling much of the CMO performed during a full
>> >> LTO build.
>> >>
>> >>
>> >> A ThinLTO build is divided into 3 phases, which are referred to in the
>> >> following implementation plan:
>> >> 1. phase-1: IR and Function Summary Generation (-c compile)
>> >> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)
>> >> 3. phase-3: Parallel Backend with Demand-Driven Importing
>> >>
>> >>
>> >> Implementation Plan
>> >> ====================
>> >>
>> >>
>> >> This section gives a high-level breakdown of the ThinLTO support that
>> >> will be added, in roughly the order that the patches would be staged.
>> >> The patches are divided into three stages. The first stage contains a
>> >> minimal amount of preparation work that is not ThinLTO-specific. The
>> >> second stage contains most of the infrastructure for ThinLTO, which
>> >> will be off by default. The third stage includes
>> >> enhancements/improvements/tunings that can be performed after the main
>> >> ThinLTO infrastructure is in.
>> >>
>> >>
>> >> The second and third implementation stages will initially be very
>> >> volatile, requiring a lot of iterations and tuning with large apps to
>> >> get stabilized. Therefore it will be important to do fast commits for
>> >> these implementation stages.
>> >>
>> >>
>> >> 1. Stage 1: Preparation
>> >> ------------------------------------
>> >>
>> >>
>> >> The first planned sets of patches are enablers for ThinLTO work:
>> >>
>> >>
>> >> a. LTO directory structure
>> >>
>> >>
>> >> Restructure the LTO directory to remove circular dependence when
>> >> ThinLTO pass added. Because ThinLTO is being implemented as a SCC pass
>> >> within Transforms/IPO, and leverages the LTOModule class for linking
>> >> in functions from modules, IPO then requires the LTO library. This
>> >> creates a circular dependence between LTO and IPO. To break that, we
>> >> need to split the lib/LTO directory/library into lib/LTO/CodeGen and
>> >> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>> >> respectively. Only LTOCodeGenerator has a dependence on IPO, removing
>> >> the circular dependence.
>> >>
>> >>
>> >> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator, whereas
>> >> the gold plugin uses lib/Object/IRObject and lib/Linker directly. The
>> >> use of LTOModule in the ThinLTO pass is a convenience, but could be
>> >> avoided by using the IRObject/Linker methods directly if that is
>> >> preferred.
>> >>
>> >>
>> >> b. Native object wrapper generation support
>> >>
>> >>
>> >> Implement native-object wrapped bitcode writer. The main goal is to
>> >> more easily interact with existing native tools such as $AR, $NM, “$LD
>> >> -r”, $OBJCOPY, and $RANLIB, without requiring the build system to find
>> >> and pass the plugin as an option. We plan to emit the phase-1 bitcode
>> >> wrapped in native object format via the .llvmbc section, along with a
>> >> symbol table. We will implement ELF first, but subsequently extend
>> >> support to COFF and Mach-O. Additionally, we also want to avoid doing
>> >> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the
>> >> resulting object file should still contain native object-wrapped
>> >> bitcode to enable ThinLTO at the full link step). I will send a
>> >> separate design document for these changes, including the format of
>> >> the symtab and function index/summary section, but the following is a
>> >> high-level motivation and overview.
>> >>
>> >>
>> >> Note that support for ThinLTO using bitcode can be added as a
>> >> follow-on under an option, so that bitcode-aware tools do not need to
>> >> use the wrapper. Under the bitcode-only option, the symbol table will
>> >> be replaced by the bitcode form of the function index and summary
>> >> section, which can be encoded as a new bitcode block type. Changes
>> >> should be made to the gold plugin to avoid partial link of bitcode
>> >> files under “$LD -r” (emitting bitcode rather than compiling all the
>> >> way down to native code, which is how ld64 behaves on Darwin as per
>> >> dexonsmith).
>> >>
>> >>
>> >> Advantages of using native object format:
>> >> * Out of the box interoperability with existing native build tools
>> >> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently
>> >> know how to locate/pass the appropriate plugin.
>> >> * There is precedence in using this format: other compilers also wrap
>> >> intermediate LTO files (probably related to the above advantage)[1].
>> >> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and
>> >> “$LD -r”) can mark the change in the symbol table without needing to
>> >> parse/change/encode bitcode. The change can be propagated to bitcode
>> >> by the ThinLTO backend.
>> >> * Some tools only need to read/write the symtab and can avoid
>> >> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).
>> >> * The second phase of ThinLTO does not need to parse the bitcode when
>> >> creating the combined function index.
>> >>
>> >>
>> >> Disadvantages of using native object format:
>> >> * Unnecessary when using plugins with plugin-aware native tools, or
>> >> LLVM’s custom tools.
>> >> * Slightly increase disk storage and I/O from symtab. However, with
>> >> our design the symtab is leveraged to hold function indexing info
>> >> required for ThinLTO. The I/O for some build tools and build steps can
>> >> actually be reduced as there is no need to read the bitcode, as
>> >> described above.
>> >>
>> >>
>> >> Support was added to LLVM for reading native object-wrapped bitcode
>> >> (http://reviews.llvm.org/rL218078), but there does not yet exist
>> >> support in LLVM/Clang for emitting bitcode wrapped in native object
>> >> format. I plan to add support for optionally generating bitcode in an
>> >> native object file containing a single .llvmbc section holding the
>> >> bitcode. Specifically, the patch would add new options
>> >> “emit-llvm-native-object” (object file) and corresponding
>> >> “emit-llvm-native-assembly” (textual assembly code equivalent).
>> >> Eventually these would be automatically triggered under “-fthinlto -c”
>> >> and “-fthinlto -S”, respectively.
>> >>
>> >>
>> >> Additionally, a symbol table will be generated in the native object
>> >> file, holding the function symbols within the bitcode. This
>> >> facilitates handling archives of the native object-wrapped bitcode
>> >> created with $AR, since the archive will have a symbol table as well.
>> >> The archive symbol table enables gold to extract and pass to the
>> >> plugin the constituent native object-wrapped bitcode files. To support
>> >> the concatenated llvmbc section generated by “$LD -r”, some handling
>> >> needs to be added to gold and to the backend driver to process each
>> >> original module’s bitcode.
>> >>
>> >>
>> >> The function index/summary will later be added as a special native
>> >> object section alongside the .llvmbc sections. The offset and size of
>> >> the corresponding function summary can be placed in the associated
>> >> symtab entry. As noted above, a separate design document will be sent
>> >> for the native object format changes.
>> >>
>> >>
>> >> 2. Stage 2: ThinLTO Infrastructure
>> >> ------------------------------------------------------
>> >>
>> >>
>> >> The next set of patches adds the base implementation of the ThinLTO
>> >> infrastructure, specifically those required to make ThinLTO functional
>> >> and generate correct but not necessarily high-performing binaries.
>> >>
>> >>
>> >> a. Clang/LLVM/gold linker options
>> >>
>> >>
>> >> An early set of clang/llvm patches is needed to provide options to
>> >> enable ThinLTO (off by default), so that the rest of the
>> >> implementation can be disabled by default as it is added.
>> >> Specifically, clang options -fthinlto (used instead of -flto) will
>> >> cause clang to invoke the phase-1 emission of LLVM bitcode and
>> >> function summary/index on a compile step, and pass the appropriate
>> >> option to the gold plugin on a link step. The -thinlto option will be
>> >> added to the gold plugin and llvm-lto tool to launch the phase-2 thin
>> >> archive step. The -thinlto-be option will also be added to clang to
>> >> invoke it as a phase-3 parallel backend instance with a bitcode file
>> >> as input.
>> >>
>> >>
>> >> b. Thin-archive linking support in Gold plugin and llvm-lto
>> >>
>> >>
>> >> Under the new plugin option (see above), the plugin needs to perform
>> >> the phase-2 (thin archive) link which simply emits a combined function
>> >> index from the linked modules, without actually performing the normal
>> >> link. Corresponding support should be added to the standalone llvm-lto
>> >> tool to enable testing/debugging without involving the linker and
>> >> plugin.
>> >>
>> >>
>> >> c. ThinLTO backend support
>> >>
>> >>
>> >> Support for invoking a phase-3 backend invocation (including
>> >> importing) on a module should be added to the clang driver under the
>> >> new option. The main change under the option is to instantiate a
>> >> Linker object used to manage the process of linking imported functions
>> >> into the module, efficient read of the combined function index, and
>> >> enable the ThinLTO import pass.
>> >>
>> >>
>> >> d. Function index/summary support
>> >>
>> >>
>> >> This includes infrastructure for writing and reading the function
>> >> index/summary section. As noted earlier this will be encoded in a
>> >> special section within the native object file for the module,
>> >> alongside the .llvmbc section containing the bitcode. The thin archive
>> >> (combined function index) generated by phase-2 of ThinLTO simply
>> >> contains all of the function index/summary sections across the linked
>> >> modules, organized for efficient function lookup. As mentioned earlier
>> >> when discussing the native object wrapper format, a separate design
>> >> document will be sent for this format.
>> >>
>> >>
>> >> Each function available for importing from the module contains an
>> >> entry in the module’s function index/summary section and in the
>> >> resulting combined function index. Each function entry contains that
>> >> function’s offset within the bitcode file, used to efficiently locate
>> >> and quickly import just that function (see below in 2e for more
>> >> details on the importing mechanics). The entry also contains summary
>> >> information (e.g. basic information determined during parsing such as
>> >> the number of instructions in the function), that will be used to help
>> >> guide later import decisions. Because the contents of this section
>> >> will change frequently during ThinLTO tuning, it should also be marked
>> >> with a version id for backwards compatibility or version checking.
>> >>
>> >>
>> >> e. ThinLTO importing support
>> >>
>> >>
>> >> Support for the mechanics of importing functions from other modules,
>> >> which can go in gradually as a set of patches since it will be off by
>> >> default (the ThinLTO pass itself discussed below in 2f).
>> >>
>> >>
>> >> Note that ThinLTO function importing is iterative, and we may import
>> >> from a number of modules in an interleaved fashion. For example,
>> >> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),
>> >> where functions a(), b1()/b2(), c() and d() are from modules A, B, C
>> >> and D, respectively. When performing ThinLTO backend compilation of
>> >> module A, we may decide to import in the following order (based on
>> >> callsite and function summary info):
>> >> 1. B::b1()  # exposes call to c()
>> >> 2. C::c()
>> >> 3. B::b2()  # exposes call to d()
>> >> 4. D::d()
>> >> For this reason, ThinLTO importing is different than regular LTO
>> >> bitcode reading and linking, which reads and links in a module in its
>> >> entirety on a single pass through each module (notice in the above
>> >> example the imports of the two module B functions have an intervening
>> >> import from module C). As a result, for example, the existing support
>> >> for lazy metadata parsing that delays it until the first function is
>> >> materialized can’t be leveraged (metadata handling is discussed more
>> >> below in 2h). Therefore, the ThinLTO importing pass instantiates a new
>> >> BitcodeReader and LTOModule object for each function we decide to
>> >> import, parsing only what is needed and linking in just that function.
>> >> This is fast and efficient as found in the prototype results shown in
>> >> the linked EuroLLVM slides.
>> >>
>> >>
>> >> Separate patches can include:
>> >>
>> >>
>> >> * BitcodeReader changes to use function index to import/deserialize
>> >> single function of interest (small changes, leverages existing lazy
>> >> function streamer support). The declarations and other symbol table
>> >> info in the bitcode must be reloaded, but the bitcode parsing can stop
>> >> once the first function body is hit. We simply set up an entry in the
>> >> lazy streamer’s DeferredFunctionInfo function index map from the
>> >> bitcode index that was saved in the ThinLTO function summary (and
>> >> therefore don’t need to build up this function index structure through
>> >> repeated calls to RememberAndSkipFunctionBody via
>> >> FindFunctionInStream).
>> >> * Minor LTOModule changes to pass the ThinLTO function to import and
>> >> its index into bitcode reader (see 1a for discussion on LTOModule
>> >> use).
>> >> * Marking of imported functions. Most handling for ThinLTO imported
>> >> functions will simply rely on applying the appropriate linkage type.
>> >> But it is useful to know which functions were imported, both for
>> >> compiler debugging and and verification, and possibly to modify some
>> >> optimization heuristics along with the summary information. This can
>> >> be in-memory initially, but IR support may be required in order to
>> >> support streaming bitcode out and back in again after importing.
>> >> * ModuleLinker changes to do ThinLTO-specific symbol linking and
>> >> static promotion when necessary. The linkage type of imported
>> >> non-local functions and variables changes to
>> >> AvailableExternallyLinkage, for example. Statics must be promoted in
>> >> certain cases, and accordingly renamed in consistent ways. Read-write
>> >> or address-taken static variables must always be promoted. Other
>> >> discardable functions, i.e. link-once such as comdats, will be force
>> >> imported on reference by another imported function. We are working on
>> >> a separate design document describing these changes in more detail
>> >> with examples, as a more detailed discussion of these changes is
>> >> beyond the scope of this RFC.
>> >> * GlobalDCE changes to support removing imported non-local functions
>> >> that were not inlined and imported non-local variables, which are
>> >> marked AvailableExternallyLinkage (very small changes to existing pass
>> >> logic). As discussed in the original RFC threads, currently GlobalDCE
>> >> does not remove referenced AvailableExternallyLinkage functions.
>> >> Instead, these are suppressed later during code generation. It isn’t
>> >> clear that these functions are useful past the first call to
>> >> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so
>> >> presumably after inter procedural constant prop, etc). Patch with
>> >> these changes in testing as discussed in this thread:
>> >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html.
>> >>
>> >>
>> >> f. ThinLTO Import Driver SCC pass
>> >>
>> >>
>> >> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
>> >> an SCC pass, enabled only under the -fthinlto-be option. The pass
>> >> includes utilizing the thin archive[2] (combined global function
>> >> index/summary), import decision heuristics, invocation of
>> >> LTOModule/ModuleLinker routines that perform the import, and any
>> >> necessary callgraph updates and verification.
>> >>
>> >>
>> >> g. Backend Driver
>> >>
>> >>
>> >> For a single node build, the gold plugin will initially exec the
>> >> backend processes directly, with the amount of parallelism controlled
>> >> via an option and/or env variable. It is also possible to leverage
>> >> existing single node build system task dispatching mechanisms such as
>> >> Unix Makefiles, Ninja, etc., where the plugin can simply write a build
>> >> file and fork the parallel backend instances directly under an
>> >> appropriate option. We will also initially add support for our
>> >> distributed build system as described below under 3c.
>> >>
>> >>
>> >> h. Lazy Debug Metadata Linking
>> >>
>> >>
>> >> The prototype implementation included lazy importing of module-level
>> >> metadata during the ThinLTO pass finalization (i.e. after all function
>> >> importing is complete). This actually applies to all module-level
>> >> metadata, not just debug, although it is the largest. This can be
>> >> added as a separate set of patches, and the detailed design will be
>> >> sent with those. Includes changes to BitcodeReader, ValueMapper, and
>> >> the ModuleLinker classes. As described in 2e, due to the
>> >> iterative/interleaved nature of ThinLTO importing, the bitcode parsing
>> >> is structured differently than LTO where a single pass over each
>> >> module can be performed to parse and materialize all functions and
>> >> metadata. Therefore, the lazy metadata parsing support in
>> >> BitcodeReader, which parses all the metadata once the first function
>> >> is materialized, are not applicable. We may instantiate a
>> >> BitcodeReader multiple times for a module, if multiple functions are
>> >> eventually imported, and we need a way to suture up the metadata to
>> >> the functions imported by an earlier BitcodeReader instantiation. The
>> >> high level summary is that during the initial import we leave the
>> >> temporary metadata on the instructions that were imported, but save
>> >> the index used by the bitcode reader used to correlate with the
>> >> metadata when it is ready (i.e. the MDValuePtrs index), and skip the
>> >> metadata parsing. During the ThinLTO pass finalization we parse just
>> >> the metadata, and suture it up during metadata value mapping using the
>> >> saved index. As mentioned earlier, this will be described in more
>> >> detail when the patches are ready.
>> >>
>> >>
>> >> 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>
>> >> -------------------------------------------------------------------------
>> >>
>> >>
>> >> This refers to the patches that are not required for ThinLTO to work,
>> >> but rather to improve compile time, memory, run-time performance and
>> >> usability.
>> >>
>> >>
>> >> a. Import Tuning
>> >>
>> >>
>> >> Tuning the import strategy will be an iterative process that will
>> >> continue to be refined over time. It involves several different types
>> >> of changes: adding support for recording additional metrics in the
>> >> function summary, such as profile data and optional heavier-weight IPA
>> >> analyses, and tuning the import heuristics based on the summary and
>> >> callsite context.
>> >>
>> >>
>> >> b. Combined Function Index Pruning
>> >>
>> >>
>> >> The combined function index can be pruned of functions that are
>> >> unlikely to benefit from being imported. For example, during the
>> >> phase-2 thin archive plug step we can safely omit large and (with
>> >> profile data) cold functions, which are unlikely to benefit from being
>> >> inlined. Additionally, all but one copy of comdat functions can be
>> >> suppressed.
>> >>
>> >>
>> >> c. Distributed Build System Integration
>> >>
>> >>
>> >> For a distributed build system such as Bazel (http://bazel.io/), the
>> >> gold plugin should write the parallel backend invocations into a build
>> >> file, including the mapping from the IR file to the real object file
>> >> path, and exit. Additional work needs to be done in the distributed
>> >> build system itself to distribute and dispatch the parallel backend
>> >> jobs to the build cluster.
>> >>
>> >>
>> >> d. Dependence Tracking and Incremental Compiles
>> >>
>> >>
>> >> In order to support build systems that stage from local disks or
>> >> network storage, the plugin will optionally support computation of
>> >> dependent sets of IR files that each module may import from. This can
>> >> be computed from profile data, if it exists, or from the symbol table
>> >> and heuristics if not. These dependence sets also enable support for
>> >> incremental backend compiles.
>> >>
>> >>
>> >> ________________
>> >> [1] The following compilers currently wrap intermediate LTO files in
>> >> native object format: GCC fat and non-fat objects (with a custom
>> >> symtab), Intel icc non-fat (IR-only) objects (with a full native
>> >> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC
>> >> both fat and non-fat objects (with full native symtab).
>> >> [2] The “thin archive” here (also referred to as a combined function
>> >> index) has some similarities to the AR tool thin archive format, but
>> >> is not exactly the same. Both contain the symtab and not the code, but
>> >> the ThinLTO combined function index contains the summary sections as
>> >> well.
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
>> >> 408-460-2413
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413