[LLVMdev] Updated RFC: ThinLTO Implementation Plan

Wed Jun 3 10:51:40 PDT 2015

>
>
> Last I heard, llvm-ar is much faster than the GNU ar, etc.
>
>
They are not fully feature compatible. For instance, support of thin
archive is missing (not that it is hard to implement it though).

David


> Alex
>
> > Teresa
> >
> >>
> >> -- Sean Silva
> >>
> >>>
> >>> And not all build systems have the plugin or
> >>> currently pass it to the native tools that can take a plugin for
> >>> handling bitcode. In those cases the bitcode support is not
> >>> transparently available, and our aim is to reduce the friction as much
> >>> as possible. And not all use LTO currently (I know we don't due to the
> >>> scalability issues we're trying to address with this design), and in
> >>> those cases the migration to bitcode-aware tools and plugins was not
> >>> previously required.
> >>>
> >>> For Sony's linker, are you using the gold plugin or libLTO interfaces?
> >>> If the latter, I suppose some ThinLTO handling would have to be added
> >>> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
> >>> combined function map and either launch the backend processes in
> >>> parallel or write out a make or other build file). The current support
> >>> for reading native object wrapped bitcode is baked into IRObjectFile
> >>> so presumably the Sony linker can handle these native object wrapped
> >>> bitcode files if it uses libLTO. We would similarly embed the handling
> >>> of the function index/summary behind an API that can handle either so
> >>> it is similarly transparent to the linkers. Let me know if there would
> >>> be additional issues that make wrapped bitcode more difficult in your
> >>> case, or how we could make ThinLTO usage simpler for you in general.
> >>>
> >>>>
> >>>> The only tool in the list of tools you mentioned that do not support
> >>>> bitcode directly is objcopy, and that's because nobody has yet
> written an
> >>>> LLVM-project implementation of it. Personally, I'd much rather you
> focus on
> >>>> making ThinLTO work by extending bitcode as needed, and we work as a
> >>>> community toward replacing objcopy with an LLVM-native one. It's a big
> >>>> missing piece of the LLVM project today and could be so much better
> if we
> >>>> could use it to replace Apple's lipo and possibly other extant object
> file
> >>>> modification tools. (Has anyone surveyed this area?)
> >>>>
> >>>> That older toolchains have tried to slip non-object file data through
> >>>> the binary utilities isn't really proof that this is a good choice.
> It might
> >>>> simply reflect the realities of those engineering teams. I wasn't at
> Sun for
> >>>> this, but DTrace needed a linker feature that apparently the Sun
> linker team
> >>>> was unwilling or unable to provide, so dtrace(1) gained the ability to
> >>>> modify ELF files directly as needed. That doesn't prove that DTrace's
> USDT
> >>>> feature shouldn't have been implemented in the linker (as ld64 does
> directly
> >>>> for Apple), does it?
> >>>
> >>> I'd argue that the realities being addressed by using native object
> >>> format in those cases still exist.
> >>>
> >>>>
> >>>> If in the end using native object-wrapped bitcode is the best
> solution,
> >>>> so be it. However, I think it is largely orthogonal to ThinLTO's
> needs for
> >>>> transporting symtab data alongside the existing bitcode format.
> >>>
> >>> That's certainly true, ThinLTO can be implemented using either format,
> >>> and bitcode only support can certainly be implemented. It is a matter
> >>> of prioritizing which format to implement first. I had added some
> >>> description to the updated RFC on how the function index/summary can
> >>> be represented, etc in bitcode. Prioritizing the native object format
> >>> doesn't make it easier to implement ThinLTO, but should make it easier
> >>> to deploy.
> >>>
> >>> Thanks!
> >>> Teresa
> >>>
> >>>>
> >>>> Alex
> >>>>
> >>>>> On May 28, 2015, at 2:10 PM, Teresa Johnson <tejohnson at google.com>
> >>>>> wrote:
> >>>>>
> >>>>> As promised, here is an new version of the ThinLTO RFC, updated based
> >>>>> on some of the comments, questions and feedback from the first RFC.
> >>>>> Hopefully we have addressed many of these, and as noted below, will
> >>>>> fork some of the detailed discussion on particular aspects into
> >>>>> separate design doc threads. Please send any additional feedback and
> >>>>> questions on the overall design.
> >>>>> Thanks!
> >>>>> Teresa
> >>>>>
> >>>>>
> >>>>> Updated RFC to discuss plans for implementing ThinLTO upstream,
> >>>>> reflecting feedback and discussion from initial RFC
> >>>>> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html).
> As
> >>>>> discussed in the earlier thread and below, more detailed design
> >>>>> documents for several pieces (native object format, linkage type
> >>>>> changes and static promotions, etc) are in progress and will be sent
> >>>>> separately. This RFC covers the overall design and the breakdown of
> >>>>> work at a higher level.
> >>>>>
> >>>>>
> >>>>> Background on ThinLTO can be found in slides from EuroLLVM 2015:
> >>>>>
> >>>>>
> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
> >>>>> As described in the talk, we have a prototype implementation, and
> >>>>> would like to start staging patches upstream. This RFC describes a
> >>>>> breakdown of the major pieces. We would like to commit upstream
> >>>>> gradually in several stages, with all functionality off by default.
> >>>>> The core ThinLTO importing support and tuning will require frequent
> >>>>> change and iteration during testing and tuning, and for that part we
> >>>>> would like to commit rapidly (off by default). See the proposed
> staged
> >>>>> implementation described in the Implementation Plan section.
> >>>>>
> >>>>>
> >>>>> ThinLTO Overview
> >>>>> ==================
> >>>>>
> >>>>>
> >>>>> See the talk slides linked above for more details. The following is a
> >>>>> high-level overview of the motivation.
> >>>>>
> >>>>>
> >>>>> Cross Module Optimization (CMO) is an effective means for improving
> >>>>> runtime performance, by extending the scope of optimizations across
> >>>>> source module boundaries. Without CMO, the compiler is limited to
> >>>>> optimizing within the scope of single source modules. Two solutions
> >>>>> for enabling CMO are Link-Time Optimization (LTO), which is currently
> >>>>> supported in LLVM and GCC, and Lightweight-Interprocedural
> >>>>> Optimization (LIPO). However, each of these solutions has limitations
> >>>>> that prevent it from being enabled by default. ThinLTO is a new
> >>>>> approach that attempts to address these limitations, with a goal of
> >>>>> being enabled more broadly. ThinLTO is designed with many of the same
> >>>>> principals as LIPO, and therefore its advantages, without any of its
> >>>>> inherent weakness. Unlike in LIPO where the module group decision is
> >>>>> made at profile training runtime, ThinLTO makes the decision at
> >>>>> compile time, but in a lazy mode that facilitates large scale
> >>>>> parallelism. LTO implementations all contain a serial IPA/IPO step
> >>>>> that is both memory intensive and slow, limiting usability on both
> >>>>> smaller workstations and huge applications. In contrast, the ThinLTO
> >>>>> serial linker plugin phase is designed to be razor thin and blazingly
> >>>>> fast. By default this step only does minimal preparation work to
> >>>>> enable the parallel lazy importing performed later. ThinLTO aims to
> be
> >>>>> scalable like a regular O2 build, enabling CMO on machines without
> >>>>> large memory configurations, while also integrating well with
> >>>>> distributed build systems. Results from early prototyping on SPEC
> >>>>> cpu2006 C++ benchmarks are in line with expectations that ThinLTO can
> >>>>> scale like O2 while enabling much of the CMO performed during a full
> >>>>> LTO build.
> >>>>>
> >>>>>
> >>>>> A ThinLTO build is divided into 3 phases, which are referred to in
> the
> >>>>> following implementation plan:
> >>>>> 1. phase-1: IR and Function Summary Generation (-c compile)
> >>>>> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)
> >>>>> 3. phase-3: Parallel Backend with Demand-Driven Importing
> >>>>>
> >>>>>
> >>>>> Implementation Plan
> >>>>> ====================
> >>>>>
> >>>>>
> >>>>> This section gives a high-level breakdown of the ThinLTO support that
> >>>>> will be added, in roughly the order that the patches would be staged.
> >>>>> The patches are divided into three stages. The first stage contains a
> >>>>> minimal amount of preparation work that is not ThinLTO-specific. The
> >>>>> second stage contains most of the infrastructure for ThinLTO, which
> >>>>> will be off by default. The third stage includes
> >>>>> enhancements/improvements/tunings that can be performed after the
> main
> >>>>> ThinLTO infrastructure is in.
> >>>>>
> >>>>>
> >>>>> The second and third implementation stages will initially be very
> >>>>> volatile, requiring a lot of iterations and tuning with large apps to
> >>>>> get stabilized. Therefore it will be important to do fast commits for
> >>>>> these implementation stages.
> >>>>>
> >>>>>
> >>>>> 1. Stage 1: Preparation
> >>>>> ------------------------------------
> >>>>>
> >>>>>
> >>>>> The first planned sets of patches are enablers for ThinLTO work:
> >>>>>
> >>>>>
> >>>>> a. LTO directory structure
> >>>>>
> >>>>>
> >>>>> Restructure the LTO directory to remove circular dependence when
> >>>>> ThinLTO pass added. Because ThinLTO is being implemented as a SCC
> pass
> >>>>> within Transforms/IPO, and leverages the LTOModule class for linking
> >>>>> in functions from modules, IPO then requires the LTO library. This
> >>>>> creates a circular dependence between LTO and IPO. To break that, we
> >>>>> need to split the lib/LTO directory/library into lib/LTO/CodeGen and
> >>>>> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
> >>>>> respectively. Only LTOCodeGenerator has a dependence on IPO, removing
> >>>>> the circular dependence.
> >>>>>
> >>>>>
> >>>>> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator, whereas
> >>>>> the gold plugin uses lib/Object/IRObject and lib/Linker directly. The
> >>>>> use of LTOModule in the ThinLTO pass is a convenience, but could be
> >>>>> avoided by using the IRObject/Linker methods directly if that is
> >>>>> preferred.
> >>>>>
> >>>>>
> >>>>> b. Native object wrapper generation support
> >>>>>
> >>>>>
> >>>>> Implement native-object wrapped bitcode writer. The main goal is to
> >>>>> more easily interact with existing native tools such as $AR, $NM,
> “$LD
> >>>>> -r”, $OBJCOPY, and $RANLIB, without requiring the build system to
> find
> >>>>> and pass the plugin as an option. We plan to emit the phase-1 bitcode
> >>>>> wrapped in native object format via the .llvmbc section, along with a
> >>>>> symbol table. We will implement ELF first, but subsequently extend
> >>>>> support to COFF and Mach-O. Additionally, we also want to avoid doing
> >>>>> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the
> >>>>> resulting object file should still contain native object-wrapped
> >>>>> bitcode to enable ThinLTO at the full link step). I will send a
> >>>>> separate design document for these changes, including the format of
> >>>>> the symtab and function index/summary section, but the following is a
> >>>>> high-level motivation and overview.
> >>>>>
> >>>>>
> >>>>> Note that support for ThinLTO using bitcode can be added as a
> >>>>> follow-on under an option, so that bitcode-aware tools do not need to
> >>>>> use the wrapper. Under the bitcode-only option, the symbol table will
> >>>>> be replaced by the bitcode form of the function index and summary
> >>>>> section, which can be encoded as a new bitcode block type. Changes
> >>>>> should be made to the gold plugin to avoid partial link of bitcode
> >>>>> files under “$LD -r” (emitting bitcode rather than compiling all the
> >>>>> way down to native code, which is how ld64 behaves on Darwin as per
> >>>>> dexonsmith).
> >>>>>
> >>>>>
> >>>>> Advantages of using native object format:
> >>>>> * Out of the box interoperability with existing native build tools
> >>>>> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently
> >>>>> know how to locate/pass the appropriate plugin.
> >>>>> * There is precedence in using this format: other compilers also wrap
> >>>>> intermediate LTO files (probably related to the above advantage)[1].
> >>>>> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and
> >>>>> “$LD -r”) can mark the change in the symbol table without needing to
> >>>>> parse/change/encode bitcode. The change can be propagated to bitcode
> >>>>> by the ThinLTO backend.
> >>>>> * Some tools only need to read/write the symtab and can avoid
> >>>>> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).
> >>>>> * The second phase of ThinLTO does not need to parse the bitcode when
> >>>>> creating the combined function index.
> >>>>>
> >>>>>
> >>>>> Disadvantages of using native object format:
> >>>>> * Unnecessary when using plugins with plugin-aware native tools, or
> >>>>> LLVM’s custom tools.
> >>>>> * Slightly increase disk storage and I/O from symtab. However, with
> >>>>> our design the symtab is leveraged to hold function indexing info
> >>>>> required for ThinLTO. The I/O for some build tools and build steps
> can
> >>>>> actually be reduced as there is no need to read the bitcode, as
> >>>>> described above.
> >>>>>
> >>>>>
> >>>>> Support was added to LLVM for reading native object-wrapped bitcode
> >>>>> (http://reviews.llvm.org/rL218078), but there does not yet exist
> >>>>> support in LLVM/Clang for emitting bitcode wrapped in native object
> >>>>> format. I plan to add support for optionally generating bitcode in an
> >>>>> native object file containing a single .llvmbc section holding the
> >>>>> bitcode. Specifically, the patch would add new options
> >>>>> “emit-llvm-native-object” (object file) and corresponding
> >>>>> “emit-llvm-native-assembly” (textual assembly code equivalent).
> >>>>> Eventually these would be automatically triggered under “-fthinlto
> -c”
> >>>>> and “-fthinlto -S”, respectively.
> >>>>>
> >>>>>
> >>>>> Additionally, a symbol table will be generated in the native object
> >>>>> file, holding the function symbols within the bitcode. This
> >>>>> facilitates handling archives of the native object-wrapped bitcode
> >>>>> created with $AR, since the archive will have a symbol table as well.
> >>>>> The archive symbol table enables gold to extract and pass to the
> >>>>> plugin the constituent native object-wrapped bitcode files. To
> support
> >>>>> the concatenated llvmbc section generated by “$LD -r”, some handling
> >>>>> needs to be added to gold and to the backend driver to process each
> >>>>> original module’s bitcode.
> >>>>>
> >>>>>
> >>>>> The function index/summary will later be added as a special native
> >>>>> object section alongside the .llvmbc sections. The offset and size of
> >>>>> the corresponding function summary can be placed in the associated
> >>>>> symtab entry. As noted above, a separate design document will be sent
> >>>>> for the native object format changes.
> >>>>>
> >>>>>
> >>>>> 2. Stage 2: ThinLTO Infrastructure
> >>>>> ------------------------------------------------------
> >>>>>
> >>>>>
> >>>>> The next set of patches adds the base implementation of the ThinLTO
> >>>>> infrastructure, specifically those required to make ThinLTO
> functional
> >>>>> and generate correct but not necessarily high-performing binaries.
> >>>>>
> >>>>>
> >>>>> a. Clang/LLVM/gold linker options
> >>>>>
> >>>>>
> >>>>> An early set of clang/llvm patches is needed to provide options to
> >>>>> enable ThinLTO (off by default), so that the rest of the
> >>>>> implementation can be disabled by default as it is added.
> >>>>> Specifically, clang options -fthinlto (used instead of -flto) will
> >>>>> cause clang to invoke the phase-1 emission of LLVM bitcode and
> >>>>> function summary/index on a compile step, and pass the appropriate
> >>>>> option to the gold plugin on a link step. The -thinlto option will be
> >>>>> added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> >>>>> archive step. The -thinlto-be option will also be added to clang to
> >>>>> invoke it as a phase-3 parallel backend instance with a bitcode file
> >>>>> as input.
> >>>>>
> >>>>>
> >>>>> b. Thin-archive linking support in Gold plugin and llvm-lto
> >>>>>
> >>>>>
> >>>>> Under the new plugin option (see above), the plugin needs to perform
> >>>>> the phase-2 (thin archive) link which simply emits a combined
> function
> >>>>> index from the linked modules, without actually performing the normal
> >>>>> link. Corresponding support should be added to the standalone
> llvm-lto
> >>>>> tool to enable testing/debugging without involving the linker and
> >>>>> plugin.
> >>>>>
> >>>>>
> >>>>> c. ThinLTO backend support
> >>>>>
> >>>>>
> >>>>> Support for invoking a phase-3 backend invocation (including
> >>>>> importing) on a module should be added to the clang driver under the
> >>>>> new option. The main change under the option is to instantiate a
> >>>>> Linker object used to manage the process of linking imported
> functions
> >>>>> into the module, efficient read of the combined function index, and
> >>>>> enable the ThinLTO import pass.
> >>>>>
> >>>>>
> >>>>> d. Function index/summary support
> >>>>>
> >>>>>
> >>>>> This includes infrastructure for writing and reading the function
> >>>>> index/summary section. As noted earlier this will be encoded in a
> >>>>> special section within the native object file for the module,
> >>>>> alongside the .llvmbc section containing the bitcode. The thin
> archive
> >>>>> (combined function index) generated by phase-2 of ThinLTO simply
> >>>>> contains all of the function index/summary sections across the linked
> >>>>> modules, organized for efficient function lookup. As mentioned
> earlier
> >>>>> when discussing the native object wrapper format, a separate design
> >>>>> document will be sent for this format.
> >>>>>
> >>>>>
> >>>>> Each function available for importing from the module contains an
> >>>>> entry in the module’s function index/summary section and in the
> >>>>> resulting combined function index. Each function entry contains that
> >>>>> function’s offset within the bitcode file, used to efficiently locate
> >>>>> and quickly import just that function (see below in 2e for more
> >>>>> details on the importing mechanics). The entry also contains summary
> >>>>> information (e.g. basic information determined during parsing such as
> >>>>> the number of instructions in the function), that will be used to
> help
> >>>>> guide later import decisions. Because the contents of this section
> >>>>> will change frequently during ThinLTO tuning, it should also be
> marked
> >>>>> with a version id for backwards compatibility or version checking.
> >>>>>
> >>>>>
> >>>>> e. ThinLTO importing support
> >>>>>
> >>>>>
> >>>>> Support for the mechanics of importing functions from other modules,
> >>>>> which can go in gradually as a set of patches since it will be off by
> >>>>> default (the ThinLTO pass itself discussed below in 2f).
> >>>>>
> >>>>>
> >>>>> Note that ThinLTO function importing is iterative, and we may import
> >>>>> from a number of modules in an interleaved fashion. For example,
> >>>>> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),
> >>>>> where functions a(), b1()/b2(), c() and d() are from modules A, B, C
> >>>>> and D, respectively. When performing ThinLTO backend compilation of
> >>>>> module A, we may decide to import in the following order (based on
> >>>>> callsite and function summary info):
> >>>>> 1. B::b1()  # exposes call to c()
> >>>>> 2. C::c()
> >>>>> 3. B::b2()  # exposes call to d()
> >>>>> 4. D::d()
> >>>>> For this reason, ThinLTO importing is different than regular LTO
> >>>>> bitcode reading and linking, which reads and links in a module in its
> >>>>> entirety on a single pass through each module (notice in the above
> >>>>> example the imports of the two module B functions have an intervening
> >>>>> import from module C). As a result, for example, the existing support
> >>>>> for lazy metadata parsing that delays it until the first function is
> >>>>> materialized can’t be leveraged (metadata handling is discussed more
> >>>>> below in 2h). Therefore, the ThinLTO importing pass instantiates a
> new
> >>>>> BitcodeReader and LTOModule object for each function we decide to
> >>>>> import, parsing only what is needed and linking in just that
> function.
> >>>>> This is fast and efficient as found in the prototype results shown in
> >>>>> the linked EuroLLVM slides.
> >>>>>
> >>>>>
> >>>>> Separate patches can include:
> >>>>>
> >>>>>
> >>>>> * BitcodeReader changes to use function index to import/deserialize
> >>>>> single function of interest (small changes, leverages existing lazy
> >>>>> function streamer support). The declarations and other symbol table
> >>>>> info in the bitcode must be reloaded, but the bitcode parsing can
> stop
> >>>>> once the first function body is hit. We simply set up an entry in the
> >>>>> lazy streamer’s DeferredFunctionInfo function index map from the
> >>>>> bitcode index that was saved in the ThinLTO function summary (and
> >>>>> therefore don’t need to build up this function index structure
> through
> >>>>> repeated calls to RememberAndSkipFunctionBody via
> >>>>> FindFunctionInStream).
> >>>>> * Minor LTOModule changes to pass the ThinLTO function to import and
> >>>>> its index into bitcode reader (see 1a for discussion on LTOModule
> >>>>> use).
> >>>>> * Marking of imported functions. Most handling for ThinLTO imported
> >>>>> functions will simply rely on applying the appropriate linkage type.
> >>>>> But it is useful to know which functions were imported, both for
> >>>>> compiler debugging and and verification, and possibly to modify some
> >>>>> optimization heuristics along with the summary information. This can
> >>>>> be in-memory initially, but IR support may be required in order to
> >>>>> support streaming bitcode out and back in again after importing.
> >>>>> * ModuleLinker changes to do ThinLTO-specific symbol linking and
> >>>>> static promotion when necessary. The linkage type of imported
> >>>>> non-local functions and variables changes to
> >>>>> AvailableExternallyLinkage, for example. Statics must be promoted in
> >>>>> certain cases, and accordingly renamed in consistent ways. Read-write
> >>>>> or address-taken static variables must always be promoted. Other
> >>>>> discardable functions, i.e. link-once such as comdats, will be force
> >>>>> imported on reference by another imported function. We are working on
> >>>>> a separate design document describing these changes in more detail
> >>>>> with examples, as a more detailed discussion of these changes is
> >>>>> beyond the scope of this RFC.
> >>>>> * GlobalDCE changes to support removing imported non-local functions
> >>>>> that were not inlined and imported non-local variables, which are
> >>>>> marked AvailableExternallyLinkage (very small changes to existing
> pass
> >>>>> logic). As discussed in the original RFC threads, currently GlobalDCE
> >>>>> does not remove referenced AvailableExternallyLinkage functions.
> >>>>> Instead, these are suppressed later during code generation. It isn’t
> >>>>> clear that these functions are useful past the first call to
> >>>>> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so
> >>>>> presumably after inter procedural constant prop, etc). Patch with
> >>>>> these changes in testing as discussed in this thread:
> >>>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html.
> >>>>>
> >>>>>
> >>>>> f. ThinLTO Import Driver SCC pass
> >>>>>
> >>>>>
> >>>>> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> >>>>> an SCC pass, enabled only under the -fthinlto-be option. The pass
> >>>>> includes utilizing the thin archive[2] (combined global function
> >>>>> index/summary), import decision heuristics, invocation of
> >>>>> LTOModule/ModuleLinker routines that perform the import, and any
> >>>>> necessary callgraph updates and verification.
> >>>>>
> >>>>>
> >>>>> g. Backend Driver
> >>>>>
> >>>>>
> >>>>> For a single node build, the gold plugin will initially exec the
> >>>>> backend processes directly, with the amount of parallelism controlled
> >>>>> via an option and/or env variable. It is also possible to leverage
> >>>>> existing single node build system task dispatching mechanisms such as
> >>>>> Unix Makefiles, Ninja, etc., where the plugin can simply write a
> build
> >>>>> file and fork the parallel backend instances directly under an
> >>>>> appropriate option. We will also initially add support for our
> >>>>> distributed build system as described below under 3c.
> >>>>>
> >>>>>
> >>>>> h. Lazy Debug Metadata Linking
> >>>>>
> >>>>>
> >>>>> The prototype implementation included lazy importing of module-level
> >>>>> metadata during the ThinLTO pass finalization (i.e. after all
> function
> >>>>> importing is complete). This actually applies to all module-level
> >>>>> metadata, not just debug, although it is the largest. This can be
> >>>>> added as a separate set of patches, and the detailed design will be
> >>>>> sent with those. Includes changes to BitcodeReader, ValueMapper, and
> >>>>> the ModuleLinker classes. As described in 2e, due to the
> >>>>> iterative/interleaved nature of ThinLTO importing, the bitcode
> parsing
> >>>>> is structured differently than LTO where a single pass over each
> >>>>> module can be performed to parse and materialize all functions and
> >>>>> metadata. Therefore, the lazy metadata parsing support in
> >>>>> BitcodeReader, which parses all the metadata once the first function
> >>>>> is materialized, are not applicable. We may instantiate a
> >>>>> BitcodeReader multiple times for a module, if multiple functions are
> >>>>> eventually imported, and we need a way to suture up the metadata to
> >>>>> the functions imported by an earlier BitcodeReader instantiation. The
> >>>>> high level summary is that during the initial import we leave the
> >>>>> temporary metadata on the instructions that were imported, but save
> >>>>> the index used by the bitcode reader used to correlate with the
> >>>>> metadata when it is ready (i.e. the MDValuePtrs index), and skip the
> >>>>> metadata parsing. During the ThinLTO pass finalization we parse just
> >>>>> the metadata, and suture it up during metadata value mapping using
> the
> >>>>> saved index. As mentioned earlier, this will be described in more
> >>>>> detail when the patches are ready.
> >>>>>
> >>>>>
> >>>>> 3. Stage 3: ThinLTO Tuning and Enhancements
> >>>>>
> >>>>>
> -------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>> This refers to the patches that are not required for ThinLTO to work,
> >>>>> but rather to improve compile time, memory, run-time performance and
> >>>>> usability.
> >>>>>
> >>>>>
> >>>>> a. Import Tuning
> >>>>>
> >>>>>
> >>>>> Tuning the import strategy will be an iterative process that will
> >>>>> continue to be refined over time. It involves several different types
> >>>>> of changes: adding support for recording additional metrics in the
> >>>>> function summary, such as profile data and optional heavier-weight
> IPA
> >>>>> analyses, and tuning the import heuristics based on the summary and
> >>>>> callsite context.
> >>>>>
> >>>>>
> >>>>> b. Combined Function Index Pruning
> >>>>>
> >>>>>
> >>>>> The combined function index can be pruned of functions that are
> >>>>> unlikely to benefit from being imported. For example, during the
> >>>>> phase-2 thin archive plug step we can safely omit large and (with
> >>>>> profile data) cold functions, which are unlikely to benefit from
> being
> >>>>> inlined. Additionally, all but one copy of comdat functions can be
> >>>>> suppressed.
> >>>>>
> >>>>>
> >>>>> c. Distributed Build System Integration
> >>>>>
> >>>>>
> >>>>> For a distributed build system such as Bazel (http://bazel.io/), the
> >>>>> gold plugin should write the parallel backend invocations into a
> build
> >>>>> file, including the mapping from the IR file to the real object file
> >>>>> path, and exit. Additional work needs to be done in the distributed
> >>>>> build system itself to distribute and dispatch the parallel backend
> >>>>> jobs to the build cluster.
> >>>>>
> >>>>>
> >>>>> d. Dependence Tracking and Incremental Compiles
> >>>>>
> >>>>>
> >>>>> In order to support build systems that stage from local disks or
> >>>>> network storage, the plugin will optionally support computation of
> >>>>> dependent sets of IR files that each module may import from. This can
> >>>>> be computed from profile data, if it exists, or from the symbol table
> >>>>> and heuristics if not. These dependence sets also enable support for
> >>>>> incremental backend compiles.
> >>>>>
> >>>>>
> >>>>> ________________
> >>>>> [1] The following compilers currently wrap intermediate LTO files in
> >>>>> native object format: GCC fat and non-fat objects (with a custom
> >>>>> symtab), Intel icc non-fat (IR-only) objects (with a full native
> >>>>> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC
> >>>>> both fat and non-fat objects (with full native symtab).
> >>>>> [2] The “thin archive” here (also referred to as a combined function
> >>>>> index) has some similarities to the AR tool thin archive format, but
> >>>>> is not exactly the same. Both contain the symtab and not the code,
> but
> >>>>> the ThinLTO combined function index contains the summary sections as
> >>>>> well.
> >>>>>
> >>>>> --
> >>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
> >>>>> 408-460-2413
> >>>>>
> >>>>> _______________________________________________
> >>>>> LLVM Developers mailing list
> >>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>>
> >>>
> >>> --
> >>> Teresa Johnson | Software Engineer | tejohnson at google.com |
> 408-460-2413
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>
> >>
> >
> >
> >
> > --
> > Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150603/78c729a0/attachment.html>