[LLVMdev] Updated RFC: ThinLTO Implementation Plan

Wed Jun 3 13:35:12 PDT 2015

On Wed, Jun 3, 2015 at 10:51 AM, Xinliang David Li <xinliangli at gmail.com>
wrote:

>
>> Last I heard, llvm-ar is much faster than the GNU ar, etc.
>>
>>
> They are not fully feature compatible. For instance, support of thin
> archive is missing (not that it is hard to implement it though).
>

It's on the LLVM roadmap to make these a drop-in replacement, so any work
to improve them is not wasted and would be greatly appreciated. Don't let
their current state hold you back from considering them for ThinLTO
deployment; as you've said, the features are generally pretty
straightforward to implement (and ones that aren't are usually looking
deeply-enough into an object file that running them on
bitcode-in-object-file doesn't make sense).

-- Sean Silva

>
> David
>
>
>> Alex
>>
>> > Teresa
>> >
>> >>
>> >> -- Sean Silva
>> >>
>> >>>
>> >>> And not all build systems have the plugin or
>> >>> currently pass it to the native tools that can take a plugin for
>> >>> handling bitcode. In those cases the bitcode support is not
>> >>> transparently available, and our aim is to reduce the friction as much
>> >>> as possible. And not all use LTO currently (I know we don't due to the
>> >>> scalability issues we're trying to address with this design), and in
>> >>> those cases the migration to bitcode-aware tools and plugins was not
>> >>> previously required.
>> >>>
>> >>> For Sony's linker, are you using the gold plugin or libLTO interfaces?
>> >>> If the latter, I suppose some ThinLTO handling would have to be added
>> >>> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
>> >>> combined function map and either launch the backend processes in
>> >>> parallel or write out a make or other build file). The current support
>> >>> for reading native object wrapped bitcode is baked into IRObjectFile
>> >>> so presumably the Sony linker can handle these native object wrapped
>> >>> bitcode files if it uses libLTO. We would similarly embed the handling
>> >>> of the function index/summary behind an API that can handle either so
>> >>> it is similarly transparent to the linkers. Let me know if there would
>> >>> be additional issues that make wrapped bitcode more difficult in your
>> >>> case, or how we could make ThinLTO usage simpler for you in general.
>> >>>
>> >>>>
>> >>>> The only tool in the list of tools you mentioned that do not support
>> >>>> bitcode directly is objcopy, and that's because nobody has yet
>> written an
>> >>>> LLVM-project implementation of it. Personally, I'd much rather you
>> focus on
>> >>>> making ThinLTO work by extending bitcode as needed, and we work as a
>> >>>> community toward replacing objcopy with an LLVM-native one. It's a
>> big
>> >>>> missing piece of the LLVM project today and could be so much better
>> if we
>> >>>> could use it to replace Apple's lipo and possibly other extant
>> object file
>> >>>> modification tools. (Has anyone surveyed this area?)
>> >>>>
>> >>>> That older toolchains have tried to slip non-object file data through
>> >>>> the binary utilities isn't really proof that this is a good choice.
>> It might
>> >>>> simply reflect the realities of those engineering teams. I wasn't at
>> Sun for
>> >>>> this, but DTrace needed a linker feature that apparently the Sun
>> linker team
>> >>>> was unwilling or unable to provide, so dtrace(1) gained the ability
>> to
>> >>>> modify ELF files directly as needed. That doesn't prove that
>> DTrace's USDT
>> >>>> feature shouldn't have been implemented in the linker (as ld64 does
>> directly
>> >>>> for Apple), does it?
>> >>>
>> >>> I'd argue that the realities being addressed by using native object
>> >>> format in those cases still exist.
>> >>>
>> >>>>
>> >>>> If in the end using native object-wrapped bitcode is the best
>> solution,
>> >>>> so be it. However, I think it is largely orthogonal to ThinLTO's
>> needs for
>> >>>> transporting symtab data alongside the existing bitcode format.
>> >>>
>> >>> That's certainly true, ThinLTO can be implemented using either format,
>> >>> and bitcode only support can certainly be implemented. It is a matter
>> >>> of prioritizing which format to implement first. I had added some
>> >>> description to the updated RFC on how the function index/summary can
>> >>> be represented, etc in bitcode. Prioritizing the native object format
>> >>> doesn't make it easier to implement ThinLTO, but should make it easier
>> >>> to deploy.
>> >>>
>> >>> Thanks!
>> >>> Teresa
>> >>>
>> >>>>
>> >>>> Alex
>> >>>>
>> >>>>> On May 28, 2015, at 2:10 PM, Teresa Johnson <tejohnson at google.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> As promised, here is an new version of the ThinLTO RFC, updated
>> based
>> >>>>> on some of the comments, questions and feedback from the first RFC.
>> >>>>> Hopefully we have addressed many of these, and as noted below, will
>> >>>>> fork some of the detailed discussion on particular aspects into
>> >>>>> separate design doc threads. Please send any additional feedback and
>> >>>>> questions on the overall design.
>> >>>>> Thanks!
>> >>>>> Teresa
>> >>>>>
>> >>>>>
>> >>>>> Updated RFC to discuss plans for implementing ThinLTO upstream,
>> >>>>> reflecting feedback and discussion from initial RFC
>> >>>>> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html).
>> As
>> >>>>> discussed in the earlier thread and below, more detailed design
>> >>>>> documents for several pieces (native object format, linkage type
>> >>>>> changes and static promotions, etc) are in progress and will be sent
>> >>>>> separately. This RFC covers the overall design and the breakdown of
>> >>>>> work at a higher level.
>> >>>>>
>> >>>>>
>> >>>>> Background on ThinLTO can be found in slides from EuroLLVM 2015:
>> >>>>>
>> >>>>>
>> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__drive.google.com_open-3Fid-3D0B036uwnWM6RWWER1ZEl5SUNENjQ-26authuser-3D0&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=0PsInzni6pJ8kT96juwCqS61kiWswqLV5VwBAECwl1Q&s=hdSlg1-d6YUc9987m7wHEHsDl40iTNslm8-tBb4CoCU&e=>
>> >>>>> As described in the talk, we have a prototype implementation, and
>> >>>>> would like to start staging patches upstream. This RFC describes a
>> >>>>> breakdown of the major pieces. We would like to commit upstream
>> >>>>> gradually in several stages, with all functionality off by default.
>> >>>>> The core ThinLTO importing support and tuning will require frequent
>> >>>>> change and iteration during testing and tuning, and for that part we
>> >>>>> would like to commit rapidly (off by default). See the proposed
>> staged
>> >>>>> implementation described in the Implementation Plan section.
>> >>>>>
>> >>>>>
>> >>>>> ThinLTO Overview
>> >>>>> ==================
>> >>>>>
>> >>>>>
>> >>>>> See the talk slides linked above for more details. The following is
>> a
>> >>>>> high-level overview of the motivation.
>> >>>>>
>> >>>>>
>> >>>>> Cross Module Optimization (CMO) is an effective means for improving
>> >>>>> runtime performance, by extending the scope of optimizations across
>> >>>>> source module boundaries. Without CMO, the compiler is limited to
>> >>>>> optimizing within the scope of single source modules. Two solutions
>> >>>>> for enabling CMO are Link-Time Optimization (LTO), which is
>> currently
>> >>>>> supported in LLVM and GCC, and Lightweight-Interprocedural
>> >>>>> Optimization (LIPO). However, each of these solutions has
>> limitations
>> >>>>> that prevent it from being enabled by default. ThinLTO is a new
>> >>>>> approach that attempts to address these limitations, with a goal of
>> >>>>> being enabled more broadly. ThinLTO is designed with many of the
>> same
>> >>>>> principals as LIPO, and therefore its advantages, without any of its
>> >>>>> inherent weakness. Unlike in LIPO where the module group decision is
>> >>>>> made at profile training runtime, ThinLTO makes the decision at
>> >>>>> compile time, but in a lazy mode that facilitates large scale
>> >>>>> parallelism. LTO implementations all contain a serial IPA/IPO step
>> >>>>> that is both memory intensive and slow, limiting usability on both
>> >>>>> smaller workstations and huge applications. In contrast, the ThinLTO
>> >>>>> serial linker plugin phase is designed to be razor thin and
>> blazingly
>> >>>>> fast. By default this step only does minimal preparation work to
>> >>>>> enable the parallel lazy importing performed later. ThinLTO aims to
>> be
>> >>>>> scalable like a regular O2 build, enabling CMO on machines without
>> >>>>> large memory configurations, while also integrating well with
>> >>>>> distributed build systems. Results from early prototyping on SPEC
>> >>>>> cpu2006 C++ benchmarks are in line with expectations that ThinLTO
>> can
>> >>>>> scale like O2 while enabling much of the CMO performed during a full
>> >>>>> LTO build.
>> >>>>>
>> >>>>>
>> >>>>> A ThinLTO build is divided into 3 phases, which are referred to in
>> the
>> >>>>> following implementation plan:
>> >>>>> 1. phase-1: IR and Function Summary Generation (-c compile)
>> >>>>> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)
>> >>>>> 3. phase-3: Parallel Backend with Demand-Driven Importing
>> >>>>>
>> >>>>>
>> >>>>> Implementation Plan
>> >>>>> ====================
>> >>>>>
>> >>>>>
>> >>>>> This section gives a high-level breakdown of the ThinLTO support
>> that
>> >>>>> will be added, in roughly the order that the patches would be
>> staged.
>> >>>>> The patches are divided into three stages. The first stage contains
>> a
>> >>>>> minimal amount of preparation work that is not ThinLTO-specific. The
>> >>>>> second stage contains most of the infrastructure for ThinLTO, which
>> >>>>> will be off by default. The third stage includes
>> >>>>> enhancements/improvements/tunings that can be performed after the
>> main
>> >>>>> ThinLTO infrastructure is in.
>> >>>>>
>> >>>>>
>> >>>>> The second and third implementation stages will initially be very
>> >>>>> volatile, requiring a lot of iterations and tuning with large apps
>> to
>> >>>>> get stabilized. Therefore it will be important to do fast commits
>> for
>> >>>>> these implementation stages.
>> >>>>>
>> >>>>>
>> >>>>> 1. Stage 1: Preparation
>> >>>>> ------------------------------------
>> >>>>>
>> >>>>>
>> >>>>> The first planned sets of patches are enablers for ThinLTO work:
>> >>>>>
>> >>>>>
>> >>>>> a. LTO directory structure
>> >>>>>
>> >>>>>
>> >>>>> Restructure the LTO directory to remove circular dependence when
>> >>>>> ThinLTO pass added. Because ThinLTO is being implemented as a SCC
>> pass
>> >>>>> within Transforms/IPO, and leverages the LTOModule class for linking
>> >>>>> in functions from modules, IPO then requires the LTO library. This
>> >>>>> creates a circular dependence between LTO and IPO. To break that, we
>> >>>>> need to split the lib/LTO directory/library into lib/LTO/CodeGen and
>> >>>>> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>> >>>>> respectively. Only LTOCodeGenerator has a dependence on IPO,
>> removing
>> >>>>> the circular dependence.
>> >>>>>
>> >>>>>
>> >>>>> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator,
>> whereas
>> >>>>> the gold plugin uses lib/Object/IRObject and lib/Linker directly.
>> The
>> >>>>> use of LTOModule in the ThinLTO pass is a convenience, but could be
>> >>>>> avoided by using the IRObject/Linker methods directly if that is
>> >>>>> preferred.
>> >>>>>
>> >>>>>
>> >>>>> b. Native object wrapper generation support
>> >>>>>
>> >>>>>
>> >>>>> Implement native-object wrapped bitcode writer. The main goal is to
>> >>>>> more easily interact with existing native tools such as $AR, $NM,
>> “$LD
>> >>>>> -r”, $OBJCOPY, and $RANLIB, without requiring the build system to
>> find
>> >>>>> and pass the plugin as an option. We plan to emit the phase-1
>> bitcode
>> >>>>> wrapped in native object format via the .llvmbc section, along with
>> a
>> >>>>> symbol table. We will implement ELF first, but subsequently extend
>> >>>>> support to COFF and Mach-O. Additionally, we also want to avoid
>> doing
>> >>>>> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the
>> >>>>> resulting object file should still contain native object-wrapped
>> >>>>> bitcode to enable ThinLTO at the full link step). I will send a
>> >>>>> separate design document for these changes, including the format of
>> >>>>> the symtab and function index/summary section, but the following is
>> a
>> >>>>> high-level motivation and overview.
>> >>>>>
>> >>>>>
>> >>>>> Note that support for ThinLTO using bitcode can be added as a
>> >>>>> follow-on under an option, so that bitcode-aware tools do not need
>> to
>> >>>>> use the wrapper. Under the bitcode-only option, the symbol table
>> will
>> >>>>> be replaced by the bitcode form of the function index and summary
>> >>>>> section, which can be encoded as a new bitcode block type. Changes
>> >>>>> should be made to the gold plugin to avoid partial link of bitcode
>> >>>>> files under “$LD -r” (emitting bitcode rather than compiling all the
>> >>>>> way down to native code, which is how ld64 behaves on Darwin as per
>> >>>>> dexonsmith).
>> >>>>>
>> >>>>>
>> >>>>> Advantages of using native object format:
>> >>>>> * Out of the box interoperability with existing native build tools
>> >>>>> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently
>> >>>>> know how to locate/pass the appropriate plugin.
>> >>>>> * There is precedence in using this format: other compilers also
>> wrap
>> >>>>> intermediate LTO files (probably related to the above advantage)[1].
>> >>>>> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and
>> >>>>> “$LD -r”) can mark the change in the symbol table without needing to
>> >>>>> parse/change/encode bitcode. The change can be propagated to bitcode
>> >>>>> by the ThinLTO backend.
>> >>>>> * Some tools only need to read/write the symtab and can avoid
>> >>>>> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).
>> >>>>> * The second phase of ThinLTO does not need to parse the bitcode
>> when
>> >>>>> creating the combined function index.
>> >>>>>
>> >>>>>
>> >>>>> Disadvantages of using native object format:
>> >>>>> * Unnecessary when using plugins with plugin-aware native tools, or
>> >>>>> LLVM’s custom tools.
>> >>>>> * Slightly increase disk storage and I/O from symtab. However, with
>> >>>>> our design the symtab is leveraged to hold function indexing info
>> >>>>> required for ThinLTO. The I/O for some build tools and build steps
>> can
>> >>>>> actually be reduced as there is no need to read the bitcode, as
>> >>>>> described above.
>> >>>>>
>> >>>>>
>> >>>>> Support was added to LLVM for reading native object-wrapped bitcode
>> >>>>> (http://reviews.llvm.org/rL218078
>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_rL218078&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=0PsInzni6pJ8kT96juwCqS61kiWswqLV5VwBAECwl1Q&s=3hPqXdNQhS0J-nJXyciEj3q-WK2NBFpYP5zKiu4u3s8&e=>),
>> but there does not yet exist
>> >>>>> support in LLVM/Clang for emitting bitcode wrapped in native object
>> >>>>> format. I plan to add support for optionally generating bitcode in
>> an
>> >>>>> native object file containing a single .llvmbc section holding the
>> >>>>> bitcode. Specifically, the patch would add new options
>> >>>>> “emit-llvm-native-object” (object file) and corresponding
>> >>>>> “emit-llvm-native-assembly” (textual assembly code equivalent).
>> >>>>> Eventually these would be automatically triggered under “-fthinlto
>> -c”
>> >>>>> and “-fthinlto -S”, respectively.
>> >>>>>
>> >>>>>
>> >>>>> Additionally, a symbol table will be generated in the native object
>> >>>>> file, holding the function symbols within the bitcode. This
>> >>>>> facilitates handling archives of the native object-wrapped bitcode
>> >>>>> created with $AR, since the archive will have a symbol table as
>> well.
>> >>>>> The archive symbol table enables gold to extract and pass to the
>> >>>>> plugin the constituent native object-wrapped bitcode files. To
>> support
>> >>>>> the concatenated llvmbc section generated by “$LD -r”, some handling
>> >>>>> needs to be added to gold and to the backend driver to process each
>> >>>>> original module’s bitcode.
>> >>>>>
>> >>>>>
>> >>>>> The function index/summary will later be added as a special native
>> >>>>> object section alongside the .llvmbc sections. The offset and size
>> of
>> >>>>> the corresponding function summary can be placed in the associated
>> >>>>> symtab entry. As noted above, a separate design document will be
>> sent
>> >>>>> for the native object format changes.
>> >>>>>
>> >>>>>
>> >>>>> 2. Stage 2: ThinLTO Infrastructure
>> >>>>> ------------------------------------------------------
>> >>>>>
>> >>>>>
>> >>>>> The next set of patches adds the base implementation of the ThinLTO
>> >>>>> infrastructure, specifically those required to make ThinLTO
>> functional
>> >>>>> and generate correct but not necessarily high-performing binaries.
>> >>>>>
>> >>>>>
>> >>>>> a. Clang/LLVM/gold linker options
>> >>>>>
>> >>>>>
>> >>>>> An early set of clang/llvm patches is needed to provide options to
>> >>>>> enable ThinLTO (off by default), so that the rest of the
>> >>>>> implementation can be disabled by default as it is added.
>> >>>>> Specifically, clang options -fthinlto (used instead of -flto) will
>> >>>>> cause clang to invoke the phase-1 emission of LLVM bitcode and
>> >>>>> function summary/index on a compile step, and pass the appropriate
>> >>>>> option to the gold plugin on a link step. The -thinlto option will
>> be
>> >>>>> added to the gold plugin and llvm-lto tool to launch the phase-2
>> thin
>> >>>>> archive step. The -thinlto-be option will also be added to clang to
>> >>>>> invoke it as a phase-3 parallel backend instance with a bitcode file
>> >>>>> as input.
>> >>>>>
>> >>>>>
>> >>>>> b. Thin-archive linking support in Gold plugin and llvm-lto
>> >>>>>
>> >>>>>
>> >>>>> Under the new plugin option (see above), the plugin needs to perform
>> >>>>> the phase-2 (thin archive) link which simply emits a combined
>> function
>> >>>>> index from the linked modules, without actually performing the
>> normal
>> >>>>> link. Corresponding support should be added to the standalone
>> llvm-lto
>> >>>>> tool to enable testing/debugging without involving the linker and
>> >>>>> plugin.
>> >>>>>
>> >>>>>
>> >>>>> c. ThinLTO backend support
>> >>>>>
>> >>>>>
>> >>>>> Support for invoking a phase-3 backend invocation (including
>> >>>>> importing) on a module should be added to the clang driver under the
>> >>>>> new option. The main change under the option is to instantiate a
>> >>>>> Linker object used to manage the process of linking imported
>> functions
>> >>>>> into the module, efficient read of the combined function index, and
>> >>>>> enable the ThinLTO import pass.
>> >>>>>
>> >>>>>
>> >>>>> d. Function index/summary support
>> >>>>>
>> >>>>>
>> >>>>> This includes infrastructure for writing and reading the function
>> >>>>> index/summary section. As noted earlier this will be encoded in a
>> >>>>> special section within the native object file for the module,
>> >>>>> alongside the .llvmbc section containing the bitcode. The thin
>> archive
>> >>>>> (combined function index) generated by phase-2 of ThinLTO simply
>> >>>>> contains all of the function index/summary sections across the
>> linked
>> >>>>> modules, organized for efficient function lookup. As mentioned
>> earlier
>> >>>>> when discussing the native object wrapper format, a separate design
>> >>>>> document will be sent for this format.
>> >>>>>
>> >>>>>
>> >>>>> Each function available for importing from the module contains an
>> >>>>> entry in the module’s function index/summary section and in the
>> >>>>> resulting combined function index. Each function entry contains that
>> >>>>> function’s offset within the bitcode file, used to efficiently
>> locate
>> >>>>> and quickly import just that function (see below in 2e for more
>> >>>>> details on the importing mechanics). The entry also contains summary
>> >>>>> information (e.g. basic information determined during parsing such
>> as
>> >>>>> the number of instructions in the function), that will be used to
>> help
>> >>>>> guide later import decisions. Because the contents of this section
>> >>>>> will change frequently during ThinLTO tuning, it should also be
>> marked
>> >>>>> with a version id for backwards compatibility or version checking.
>> >>>>>
>> >>>>>
>> >>>>> e. ThinLTO importing support
>> >>>>>
>> >>>>>
>> >>>>> Support for the mechanics of importing functions from other modules,
>> >>>>> which can go in gradually as a set of patches since it will be off
>> by
>> >>>>> default (the ThinLTO pass itself discussed below in 2f).
>> >>>>>
>> >>>>>
>> >>>>> Note that ThinLTO function importing is iterative, and we may import
>> >>>>> from a number of modules in an interleaved fashion. For example,
>> >>>>> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),
>> >>>>> where functions a(), b1()/b2(), c() and d() are from modules A, B, C
>> >>>>> and D, respectively. When performing ThinLTO backend compilation of
>> >>>>> module A, we may decide to import in the following order (based on
>> >>>>> callsite and function summary info):
>> >>>>> 1. B::b1()  # exposes call to c()
>> >>>>> 2. C::c()
>> >>>>> 3. B::b2()  # exposes call to d()
>> >>>>> 4. D::d()
>> >>>>> For this reason, ThinLTO importing is different than regular LTO
>> >>>>> bitcode reading and linking, which reads and links in a module in
>> its
>> >>>>> entirety on a single pass through each module (notice in the above
>> >>>>> example the imports of the two module B functions have an
>> intervening
>> >>>>> import from module C). As a result, for example, the existing
>> support
>> >>>>> for lazy metadata parsing that delays it until the first function is
>> >>>>> materialized can’t be leveraged (metadata handling is discussed more
>> >>>>> below in 2h). Therefore, the ThinLTO importing pass instantiates a
>> new
>> >>>>> BitcodeReader and LTOModule object for each function we decide to
>> >>>>> import, parsing only what is needed and linking in just that
>> function.
>> >>>>> This is fast and efficient as found in the prototype results shown
>> in
>> >>>>> the linked EuroLLVM slides.
>> >>>>>
>> >>>>>
>> >>>>> Separate patches can include:
>> >>>>>
>> >>>>>
>> >>>>> * BitcodeReader changes to use function index to import/deserialize
>> >>>>> single function of interest (small changes, leverages existing lazy
>> >>>>> function streamer support). The declarations and other symbol table
>> >>>>> info in the bitcode must be reloaded, but the bitcode parsing can
>> stop
>> >>>>> once the first function body is hit. We simply set up an entry in
>> the
>> >>>>> lazy streamer’s DeferredFunctionInfo function index map from the
>> >>>>> bitcode index that was saved in the ThinLTO function summary (and
>> >>>>> therefore don’t need to build up this function index structure
>> through
>> >>>>> repeated calls to RememberAndSkipFunctionBody via
>> >>>>> FindFunctionInStream).
>> >>>>> * Minor LTOModule changes to pass the ThinLTO function to import and
>> >>>>> its index into bitcode reader (see 1a for discussion on LTOModule
>> >>>>> use).
>> >>>>> * Marking of imported functions. Most handling for ThinLTO imported
>> >>>>> functions will simply rely on applying the appropriate linkage type.
>> >>>>> But it is useful to know which functions were imported, both for
>> >>>>> compiler debugging and and verification, and possibly to modify some
>> >>>>> optimization heuristics along with the summary information. This can
>> >>>>> be in-memory initially, but IR support may be required in order to
>> >>>>> support streaming bitcode out and back in again after importing.
>> >>>>> * ModuleLinker changes to do ThinLTO-specific symbol linking and
>> >>>>> static promotion when necessary. The linkage type of imported
>> >>>>> non-local functions and variables changes to
>> >>>>> AvailableExternallyLinkage, for example. Statics must be promoted in
>> >>>>> certain cases, and accordingly renamed in consistent ways.
>> Read-write
>> >>>>> or address-taken static variables must always be promoted. Other
>> >>>>> discardable functions, i.e. link-once such as comdats, will be force
>> >>>>> imported on reference by another imported function. We are working
>> on
>> >>>>> a separate design document describing these changes in more detail
>> >>>>> with examples, as a more detailed discussion of these changes is
>> >>>>> beyond the scope of this RFC.
>> >>>>> * GlobalDCE changes to support removing imported non-local functions
>> >>>>> that were not inlined and imported non-local variables, which are
>> >>>>> marked AvailableExternallyLinkage (very small changes to existing
>> pass
>> >>>>> logic). As discussed in the original RFC threads, currently
>> GlobalDCE
>> >>>>> does not remove referenced AvailableExternallyLinkage functions.
>> >>>>> Instead, these are suppressed later during code generation. It isn’t
>> >>>>> clear that these functions are useful past the first call to
>> >>>>> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so
>> >>>>> presumably after inter procedural constant prop, etc). Patch with
>> >>>>> these changes in testing as discussed in this thread:
>> >>>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html.
>> >>>>>
>> >>>>>
>> >>>>> f. ThinLTO Import Driver SCC pass
>> >>>>>
>> >>>>>
>> >>>>> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
>> >>>>> an SCC pass, enabled only under the -fthinlto-be option. The pass
>> >>>>> includes utilizing the thin archive[2] (combined global function
>> >>>>> index/summary), import decision heuristics, invocation of
>> >>>>> LTOModule/ModuleLinker routines that perform the import, and any
>> >>>>> necessary callgraph updates and verification.
>> >>>>>
>> >>>>>
>> >>>>> g. Backend Driver
>> >>>>>
>> >>>>>
>> >>>>> For a single node build, the gold plugin will initially exec the
>> >>>>> backend processes directly, with the amount of parallelism
>> controlled
>> >>>>> via an option and/or env variable. It is also possible to leverage
>> >>>>> existing single node build system task dispatching mechanisms such
>> as
>> >>>>> Unix Makefiles, Ninja, etc., where the plugin can simply write a
>> build
>> >>>>> file and fork the parallel backend instances directly under an
>> >>>>> appropriate option. We will also initially add support for our
>> >>>>> distributed build system as described below under 3c.
>> >>>>>
>> >>>>>
>> >>>>> h. Lazy Debug Metadata Linking
>> >>>>>
>> >>>>>
>> >>>>> The prototype implementation included lazy importing of module-level
>> >>>>> metadata during the ThinLTO pass finalization (i.e. after all
>> function
>> >>>>> importing is complete). This actually applies to all module-level
>> >>>>> metadata, not just debug, although it is the largest. This can be
>> >>>>> added as a separate set of patches, and the detailed design will be
>> >>>>> sent with those. Includes changes to BitcodeReader, ValueMapper, and
>> >>>>> the ModuleLinker classes. As described in 2e, due to the
>> >>>>> iterative/interleaved nature of ThinLTO importing, the bitcode
>> parsing
>> >>>>> is structured differently than LTO where a single pass over each
>> >>>>> module can be performed to parse and materialize all functions and
>> >>>>> metadata. Therefore, the lazy metadata parsing support in
>> >>>>> BitcodeReader, which parses all the metadata once the first function
>> >>>>> is materialized, are not applicable. We may instantiate a
>> >>>>> BitcodeReader multiple times for a module, if multiple functions are
>> >>>>> eventually imported, and we need a way to suture up the metadata to
>> >>>>> the functions imported by an earlier BitcodeReader instantiation.
>> The
>> >>>>> high level summary is that during the initial import we leave the
>> >>>>> temporary metadata on the instructions that were imported, but save
>> >>>>> the index used by the bitcode reader used to correlate with the
>> >>>>> metadata when it is ready (i.e. the MDValuePtrs index), and skip the
>> >>>>> metadata parsing. During the ThinLTO pass finalization we parse just
>> >>>>> the metadata, and suture it up during metadata value mapping using
>> the
>> >>>>> saved index. As mentioned earlier, this will be described in more
>> >>>>> detail when the patches are ready.
>> >>>>>
>> >>>>>
>> >>>>> 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>>>>
>> >>>>>
>> -------------------------------------------------------------------------
>> >>>>>
>> >>>>>
>> >>>>> This refers to the patches that are not required for ThinLTO to
>> work,
>> >>>>> but rather to improve compile time, memory, run-time performance and
>> >>>>> usability.
>> >>>>>
>> >>>>>
>> >>>>> a. Import Tuning
>> >>>>>
>> >>>>>
>> >>>>> Tuning the import strategy will be an iterative process that will
>> >>>>> continue to be refined over time. It involves several different
>> types
>> >>>>> of changes: adding support for recording additional metrics in the
>> >>>>> function summary, such as profile data and optional heavier-weight
>> IPA
>> >>>>> analyses, and tuning the import heuristics based on the summary and
>> >>>>> callsite context.
>> >>>>>
>> >>>>>
>> >>>>> b. Combined Function Index Pruning
>> >>>>>
>> >>>>>
>> >>>>> The combined function index can be pruned of functions that are
>> >>>>> unlikely to benefit from being imported. For example, during the
>> >>>>> phase-2 thin archive plug step we can safely omit large and (with
>> >>>>> profile data) cold functions, which are unlikely to benefit from
>> being
>> >>>>> inlined. Additionally, all but one copy of comdat functions can be
>> >>>>> suppressed.
>> >>>>>
>> >>>>>
>> >>>>> c. Distributed Build System Integration
>> >>>>>
>> >>>>>
>> >>>>> For a distributed build system such as Bazel (http://bazel.io/
>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__bazel.io_&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=0PsInzni6pJ8kT96juwCqS61kiWswqLV5VwBAECwl1Q&s=65whxawpYDsRwKfFOkJtwKog-n2UbU_REkAKKQAzzHU&e=>),
>> the
>> >>>>> gold plugin should write the parallel backend invocations into a
>> build
>> >>>>> file, including the mapping from the IR file to the real object file
>> >>>>> path, and exit. Additional work needs to be done in the distributed
>> >>>>> build system itself to distribute and dispatch the parallel backend
>> >>>>> jobs to the build cluster.
>> >>>>>
>> >>>>>
>> >>>>> d. Dependence Tracking and Incremental Compiles
>> >>>>>
>> >>>>>
>> >>>>> In order to support build systems that stage from local disks or
>> >>>>> network storage, the plugin will optionally support computation of
>> >>>>> dependent sets of IR files that each module may import from. This
>> can
>> >>>>> be computed from profile data, if it exists, or from the symbol
>> table
>> >>>>> and heuristics if not. These dependence sets also enable support for
>> >>>>> incremental backend compiles.
>> >>>>>
>> >>>>>
>> >>>>> ________________
>> >>>>> [1] The following compilers currently wrap intermediate LTO files in
>> >>>>> native object format: GCC fat and non-fat objects (with a custom
>> >>>>> symtab), Intel icc non-fat (IR-only) objects (with a full native
>> >>>>> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC
>> >>>>> both fat and non-fat objects (with full native symtab).
>> >>>>> [2] The “thin archive” here (also referred to as a combined function
>> >>>>> index) has some similarities to the AR tool thin archive format, but
>> >>>>> is not exactly the same. Both contain the symtab and not the code,
>> but
>> >>>>> the ThinLTO combined function index contains the summary sections as
>> >>>>> well.
>> >>>>>
>> >>>>> --
>> >>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>> >>>>> 408-460-2413
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> LLVM Developers mailing list
>> >>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>> 408-460-2413
>> >>>
>> >>> _______________________________________________
>> >>> LLVM Developers mailing list
>> >>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>> 408-460-2413
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150603/547ea1f4/attachment.html>