[LLVMdev] Updated RFC: ThinLTO Implementation Plan

Wed Jun 3 09:52:31 PDT 2015


> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> Behalf Of Teresa Johnson
> Sent: Wednesday, June 03, 2015 7:02 AM
> To: Dave Bozier
> Cc: <llvmdev at cs.uiuc.edu> List
> Subject: Re: [LLVMdev] Updated RFC: ThinLTO Implementation Plan
> 
> On Wed, Jun 3, 2015 at 4:19 AM, Dave Bozier <seifsta at gmail.com> wrote:
> > Hi Teresa,
> >
> > Thanks for providing this updated RFC.
> >
> >> For Sony's linker, are you using the gold plugin or libLTO interfaces?
> >> If the latter, I suppose some ThinLTO handling would have to be added
> >> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
> >> combined function map and either launch the backend processes in
> >> parallel or write out a make or other build file). The current support
> >> for reading native object wrapped bitcode is baked into IRObjectFile
> >> so presumably the Sony linker can handle these native object wrapped
> >> bitcode files if it uses libLTO. We would similarly embed the handling
> >> of the function index/summary behind an API that can handle either so
> >> it is similarly transparent to the linkers. Let me know if there would
> >> be additional issues that make wrapped bitcode more difficult in your
> >> case, or how we could make ThinLTO usage simpler for you in general.
> > We use the libLTO interfaces.
> 
> Hi Dave,
> 
> Thanks for the info.
> 
> >
> > We use the libLTO interfaces, more specifically we use the C API
> > located in llvm-c\lto.h.
> >
> > Our linker won't support native object wrapped bitcode files as our
> > LTO currently stands. Right now, such a file will be recognized as an
> > object file and won't get anywhere near the libLTO libraries. We'd
> > need to teach our linker to recognize and differentiate native object
> > wrapped bitcode files and regular native object files. This isn't
> > straightforward, as we cannot distinguish them by looking at the
> > file header alone; we would need to parse the sections and look for a
> > .llvmbc section. We would then need to add special handling for these
> > native object wrappers.
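
To make the "parse the sections" step concrete, here is a minimal sketch
of the check a linker would need, written in Python for brevity. It
assumes a 64-bit little-endian ELF input and only looks for a section
named .llvmbc; the function names are hypothetical and this is not how
libLTO or any particular linker implements the test.

```python
import struct

# ELF64 little-endian layouts (System V ABI): file header and section header.
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")


def section_names(data):
    """Return the section names of an ELF64 little-endian file, or [] otherwise."""
    if len(data) < ELF_HEADER.size or data[:4] != b"\x7fELF":
        return []
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        return []
    fields = ELF_HEADER.unpack_from(data)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
               for i in range(shnum)]
    if shstrndx >= len(headers):
        return []
    # Fields 4 and 5 of a section header are sh_offset and sh_size.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for header in headers:
        start = header[0]  # sh_name: offset of the name in the string table
        end = strtab.find(b"\0", start)
        if end == -1:
            end = len(strtab)
        names.append(strtab[start:end].decode("ascii", "replace"))
    return names


def is_native_wrapped_bitcode(data):
    """True if the ELF file carries a .llvmbc section."""
    return ".llvmbc" in section_names(data)
```

The file header alone (the first 64 bytes) is identical for a wrapper
and a regular object, which is why the section header table has to be
walked. A truncated file would raise struct.error in this sketch; a real
linker would report that as a malformed input.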
> 
> Ok, I see. Does it help that there are LTOModule (lto_module_* in the
> C API) interfaces for checking if a file contains bitcode (regardless
> of whether it is straight-up or native-wrapped)? I don't know how hard
> in your linker it is to query these when deciding whether to treat the
> object file as bitcode or not, or how hard it is to pass the resulting
> object file along to the libLTO routines for handling (they
> automatically handle the native-wrapped object files so the linker
> shouldn't have to do anything special to read them).

One twist is that we use the Darwin-style wrapper around our bitcode files
so that we have a place to hang a bitcode version number, which we also
want to check.  Without reopening the debate about why we do that, we do
that, and I fully expect the libLTO API to silently ignore the wrapper that
we are depending on.  I suppose we could add a new libLTO API that verifies
the bitcode wrapper but it would be yet another private change to maintain,
rather than just having the linker check it directly.
--paulr
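
For reference, the direct check is small. The sketch below is in Python
for brevity and assumes the standard LLVM bitcode wrapper header (five
little-endian 32-bit fields: magic 0x0B17C0DE, version, offset, size,
cputype) and that the version to verify is the header's version field.
The function names and the expected-version policy are hypothetical,
not our linker's actual code.

```python
import struct

WRAPPER_MAGIC = 0x0B17C0DE             # magic of the bitcode wrapper header
RAW_BITCODE_MAGIC = b"BC\xc0\xde"      # first four bytes of raw LLVM bitcode
WRAPPER_HEADER = struct.Struct("<5I")  # magic, version, offset, size, cputype


def read_wrapper(data):
    """Return (version, bitcode bytes), or None if the wrapper is absent or malformed."""
    if len(data) < WRAPPER_HEADER.size:
        return None
    magic, version, offset, size, _cputype = WRAPPER_HEADER.unpack_from(data)
    if magic != WRAPPER_MAGIC:
        return None
    body = data[offset:offset + size]
    # The payload must be fully present and must itself start with bitcode magic.
    if len(body) != size or not body.startswith(RAW_BITCODE_MAGIC):
        return None
    return version, body


def accept(data, expected_version):
    """Accept only wrapped bitcode whose wrapper version matches exactly."""
    parsed = read_wrapper(data)
    return parsed is not None and parsed[0] == expected_version
```

A linker doing this check itself can reject a mismatched file before
handing anything to libLTO, which is the behavior we rely on today.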

> 
> Specifically, in the C API these are the lto_module_is_object_file*
> variants, which will return true for either straight-up or
> native-wrapped bitcode. All of the mechanics of handling bitcode vs
> native object-wrapped bitcode are down in the IRObjectFile handling.
> So the LTOModule::isBitcode*/lto_module_is_object_file* will correctly
> identify native object-wrapped bitcode as bitcode. And the
> LTOModule::createFrom*/lto_module_create* routines correctly parse the
> native object-wrapped bitcode and return an LTOModule.
> 
> As a result, the llvm-lto tool that also uses libLTO interfaces didn't
> require any changes when the native-wrapped reading support went in
> (r218078), and is able to handle native-wrapped bitcode out of the
> box.
> 
> >
> > Handling the function index/summary behind an API sounds like a good idea.
> 
> I am going to work on fleshing out this part next so that the actual
> format of the files is hidden from clients.
> 
> Thanks,
> Teresa
> 
> >
> > On Fri, May 29, 2015 at 4:01 PM, Teresa Johnson <tejohnson at google.com> wrote:
> >> On Fri, May 29, 2015 at 6:56 AM, Alex Rosenberg <alexr at leftfield.org> wrote:
> >>> My earlier statement about wrapping things in a native object file
> >>> held in that it is controversial. It appears to be still central to
> >>> your design.
> >>>
> >>> It may help to look at the problem from a different viewpoint: LLVM
> >>> is not a compiler. It is a framework that can be used to make
> >>> compiler-like tools.
> >>>
> >>> From that view, it no longer makes sense to discuss "the plugin," or
> >>> gold, or $AR, because there isn't just one of any of those things.
> >>> ld64 isn't the only outlier linker to consider. We have our own
> >>> linker at Sony, for example. From this perspective, then it makes more
> >>> sense to consider replacing the binary utilities with ones that
> >>> support bitcode, because from a user-perspective, all of the linkers
> >>> already transparently support bitcode directly today, as do ar, nm,
> >>> etc. This has been necessary for the regular LTO process.
> >>
> >> Hi Alex,
> >>
> >> It's true that the LLVM versions of these tools support bitcode
> >> transparently, but not all build systems use LLVM versions of these
> >> tools, particularly build systems that support a variety of compilers,
> >> or legacy build systems. And not all build systems have the plugin or
> >> currently pass it to the native tools that can take a plugin for
> >> handling bitcode. In those cases the bitcode support is not
> >> transparently available, and our aim is to reduce the friction as much
> >> as possible. And not all use LTO currently (I know we don't due to the
> >> scalability issues we're trying to address with this design), and in
> >> those cases the migration to bitcode-aware tools and plugins was not
> >> previously required.
> >>
> >> For Sony's linker, are you using the gold plugin or libLTO interfaces?
> >> If the latter, I suppose some ThinLTO handling would have to be added
> >> to your linker (e.g. to invoke the LLVM hooks to write the stage-2
> >> combined function map and either launch the backend processes in
> >> parallel or write out a make or other build file). The current support
> >> for reading native object wrapped bitcode is baked into IRObjectFile
> >> so presumably the Sony linker can handle these native object wrapped
> >> bitcode files if it uses libLTO. We would similarly embed the handling
> >> of the function index/summary behind an API that can handle either so
> >> it is similarly transparent to the linkers. Let me know if there would
> >> be additional issues that make wrapped bitcode more difficult in your
> >> case, or how we could make ThinLTO usage simpler for you in general.
> >>
> >>>
> >>> The only tool in the list of tools you mentioned that does not
> >>> support bitcode directly is objcopy, and that's because nobody has yet
> >>> written an LLVM-project implementation of it. Personally, I'd much
> >>> rather you focus on making ThinLTO work by extending bitcode as
> >>> needed, and we work as a community toward replacing objcopy with an
> >>> LLVM-native one. It's a big missing piece of the LLVM project today
> >>> and could be so much better if we could use it to replace Apple's lipo
> >>> and possibly other extant object file modification tools. (Has anyone
> >>> surveyed this area?)
> >>>
> >>> That older toolchains have tried to slip non-object file data through
> >>> the binary utilities isn't really proof that this is a good choice. It
> >>> might simply reflect the realities of those engineering teams. I
> >>> wasn't at Sun for this, but DTrace needed a linker feature that
> >>> apparently the Sun linker team was unwilling or unable to provide, so
> >>> dtrace(1) gained the ability to modify ELF files directly as needed.
> >>> That doesn't prove that DTrace's USDT feature shouldn't have been
> >>> implemented in the linker (as ld64 does directly for Apple), does it?
> >>
> >> I'd argue that the realities being addressed by using native object
> >> format in those cases still exist.
> >>
> >>>
> >>> If in the end using native object-wrapped bitcode is the best
> >>> solution, so be it. However, I think it is largely orthogonal to
> >>> ThinLTO's needs for transporting symtab data alongside the existing
> >>> bitcode format.
> >>
> >> That's certainly true: ThinLTO can be implemented using either format,
> >> and bitcode-only support can certainly be implemented. It is a matter
> >> of prioritizing which format to implement first. I had added some
> >> description to the updated RFC on how the function index/summary can
> >> be represented, etc in bitcode. Prioritizing the native object format
> >> doesn't make it easier to implement ThinLTO, but should make it easier
> >> to deploy.
> >>
> >> Thanks!
> >> Teresa
> >>
> >>>
> >>> Alex
> >>>
> >>>> On May 28, 2015, at 2:10 PM, Teresa Johnson <tejohnson at google.com> wrote:
> >>>>
> >>>> As promised, here is a new version of the ThinLTO RFC, updated based
> >>>> on some of the comments, questions and feedback from the first RFC.
> >>>> Hopefully we have addressed many of these, and as noted below, will
> >>>> fork some of the detailed discussion on particular aspects into
> >>>> separate design doc threads. Please send any additional feedback and
> >>>> questions on the overall design.
> >>>> Thanks!
> >>>> Teresa
> >>>>
> >>>>
> >>>> Updated RFC to discuss plans for implementing ThinLTO upstream,
> >>>> reflecting feedback and discussion from initial RFC
> >>>> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085557.html). As
> >>>> discussed in the earlier thread and below, more detailed design
> >>>> documents for several pieces (native object format, linkage type
> >>>> changes and static promotions, etc) are in progress and will be sent
> >>>> separately. This RFC covers the overall design and the breakdown of
> >>>> work at a higher level.
> >>>>
> >>>>
> >>>> Background on ThinLTO can be found in slides from EuroLLVM 2015:
> >>>>
> >>>> https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
> >>>> As described in the talk, we have a prototype implementation, and
> >>>> would like to start staging patches upstream. This RFC describes a
> >>>> breakdown of the major pieces. We would like to commit upstream
> >>>> gradually in several stages, with all functionality off by default.
> >>>> The core ThinLTO importing support and tuning will require frequent
> >>>> change and iteration during testing and tuning, and for that part we
> >>>> would like to commit rapidly (off by default). See the proposed staged
> >>>> implementation described in the Implementation Plan section.
> >>>>
> >>>>
> >>>> ThinLTO Overview
> >>>> ==================
> >>>>
> >>>>
> >>>> See the talk slides linked above for more details. The following is a
> >>>> high-level overview of the motivation.
> >>>>
> >>>>
> >>>> Cross Module Optimization (CMO) is an effective means for improving
> >>>> runtime performance, by extending the scope of optimizations across
> >>>> source module boundaries. Without CMO, the compiler is limited to
> >>>> optimizing within the scope of single source modules. Two solutions
> >>>> for enabling CMO are Link-Time Optimization (LTO), which is currently
> >>>> supported in LLVM and GCC, and Lightweight-Interprocedural
> >>>> Optimization (LIPO). However, each of these solutions has limitations
> >>>> that prevent it from being enabled by default. ThinLTO is a new
> >>>> approach that attempts to address these limitations, with a goal of
> >>>> being enabled more broadly. ThinLTO is designed with many of the same
> >>>> principles as LIPO, and therefore shares its advantages, without any of
> >>>> its inherent weaknesses. Unlike in LIPO where the module group decision is
> >>>> made at profile training runtime, ThinLTO makes the decision at
> >>>> compile time, but in a lazy mode that facilitates large scale
> >>>> parallelism. LTO implementations all contain a serial IPA/IPO step
> >>>> that is both memory intensive and slow, limiting usability on both
> >>>> smaller workstations and huge applications. In contrast, the ThinLTO
> >>>> serial linker plugin phase is designed to be razor thin and blazingly
> >>>> fast. By default this step only does minimal preparation work to
> >>>> enable the parallel lazy importing performed later. ThinLTO aims to be
> >>>> scalable like a regular O2 build, enabling CMO on machines without
> >>>> large memory configurations, while also integrating well with
> >>>> distributed build systems. Results from early prototyping on SPEC
> >>>> cpu2006 C++ benchmarks are in line with expectations that ThinLTO can
> >>>> scale like O2 while enabling much of the CMO performed during a full
> >>>> LTO build.
> >>>>
> >>>>
> >>>> A ThinLTO build is divided into 3 phases, which are referred to in the
> >>>> following implementation plan:
> >>>> 1. phase-1: IR and Function Summary Generation (-c compile)
> >>>> 2. phase-2: Thin Linker Plugin Layer (thin archive linker step)
> >>>> 3. phase-3: Parallel Backend with Demand-Driven Importing
> >>>>
> >>>>
> >>>> Implementation Plan
> >>>> ====================
> >>>>
> >>>>
> >>>> This section gives a high-level breakdown of the ThinLTO support that
> >>>> will be added, in roughly the order that the patches would be staged.
> >>>> The patches are divided into three stages. The first stage contains a
> >>>> minimal amount of preparation work that is not ThinLTO-specific. The
> >>>> second stage contains most of the infrastructure for ThinLTO, which
> >>>> will be off by default. The third stage includes
> >>>> enhancements/improvements/tunings that can be performed after the main
> >>>> ThinLTO infrastructure is in.
> >>>>
> >>>>
> >>>> The second and third implementation stages will initially be very
> >>>> volatile, requiring a lot of iterations and tuning with large apps to
> >>>> get stabilized. Therefore it will be important to do fast commits for
> >>>> these implementation stages.
> >>>>
> >>>>
> >>>> 1. Stage 1: Preparation
> >>>> ------------------------------------
> >>>>
> >>>>
> >>>> The first planned sets of patches are enablers for ThinLTO work:
> >>>>
> >>>>
> >>>> a. LTO directory structure
> >>>>
> >>>>
> >>>> Restructure the LTO directory to remove the circular dependence that
> >>>> arises when the ThinLTO pass is added. Because ThinLTO is being
> >>>> implemented as an SCC pass within Transforms/IPO, and leverages the
> >>>> LTOModule class for linking in functions from modules, IPO then
> >>>> requires the LTO library. This
> >>>> creates a circular dependence between LTO and IPO. To break that, we
> >>>> need to split the lib/LTO directory/library into lib/LTO/CodeGen and
> >>>> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
> >>>> respectively. Only LTOCodeGenerator has a dependence on IPO, removing
> >>>> the circular dependence.
> >>>>
> >>>>
> >>>> Note that libLTO and llvm-lto use LTOModule/LTOCodeGenerator, whereas
> >>>> the gold plugin uses lib/Object/IRObject and lib/Linker directly. The
> >>>> use of LTOModule in the ThinLTO pass is a convenience, but could be
> >>>> avoided by using the IRObject/Linker methods directly if that is
> >>>> preferred.
> >>>>
> >>>>
> >>>> b. Native object wrapper generation support
> >>>>
> >>>>
> >>>> Implement native-object wrapped bitcode writer. The main goal is to
> >>>> more easily interact with existing native tools such as $AR, $NM,
> >>>> “$LD -r”, $OBJCOPY, and $RANLIB, without requiring the build system
> >>>> to find and pass the plugin as an option. We plan to emit the phase-1
> >>>> bitcode wrapped in native object format via the .llvmbc section, along
> >>>> with a
> >>>> symbol table. We will implement ELF first, but subsequently extend
> >>>> support to COFF and Mach-O. Additionally, we also want to avoid doing
> >>>> partial LTO/ThinLTO across files linked with “$LD -r” (i.e. the
> >>>> resulting object file should still contain native object-wrapped
> >>>> bitcode to enable ThinLTO at the full link step). I will send a
> >>>> separate design document for these changes, including the format of
> >>>> the symtab and function index/summary section, but the following is a
> >>>> high-level motivation and overview.
> >>>>
> >>>>
> >>>> Note that support for ThinLTO using bitcode can be added as a
> >>>> follow-on under an option, so that bitcode-aware tools do not need to
> >>>> use the wrapper. Under the bitcode-only option, the symbol table will
> >>>> be replaced by the bitcode form of the function index and summary
> >>>> section, which can be encoded as a new bitcode block type. Changes
> >>>> should be made to the gold plugin to avoid partial link of bitcode
> >>>> files under “$LD -r” (emitting bitcode rather than compiling all the
> >>>> way down to native code, which is how ld64 behaves on Darwin as per
> >>>> dexonsmith).
> >>>>
> >>>>
> >>>> Advantages of using native object format:
> >>>> * Out of the box interoperability with existing native build tools
> >>>> ($AR, $NM, “$LD -r”, $OBJCOPY, and $RANLIB) which may not currently
> >>>> know how to locate/pass the appropriate plugin.
> >>>> * There is precedent for using this format: other compilers also wrap
> >>>> intermediate LTO files (probably related to the above advantage)[1].
> >>>> * Tools that modify symbol linkage and visibility (e.g. $OBJCOPY and
> >>>> “$LD -r”) can mark the change in the symbol table without needing to
> >>>> parse/change/encode bitcode. The change can be propagated to bitcode
> >>>> by the ThinLTO backend.
> >>>> * Some tools only need to read/write the symtab and can avoid
> >>>> parsing/encoding bitcode (e.g. $NM, $OBJCOPY).
> >>>> * The second phase of ThinLTO does not need to parse the bitcode when
> >>>> creating the combined function index.
> >>>>
> >>>>
> >>>> Disadvantages of using native object format:
> >>>> * Unnecessary when using plugins with plugin-aware native tools, or
> >>>> LLVM’s custom tools.
> >>>> * Slightly increases disk storage and I/O due to the symtab. However,
> >>>> with our design the symtab is leveraged to hold function indexing info
> >>>> required for ThinLTO. The I/O for some build tools and build steps can
> >>>> actually be reduced as there is no need to read the bitcode, as
> >>>> described above.
> >>>>
> >>>>
> >>>> Support was added to LLVM for reading native object-wrapped bitcode
> >>>> (http://reviews.llvm.org/rL218078), but there does not yet exist
> >>>> support in LLVM/Clang for emitting bitcode wrapped in native object
> >>>> format. I plan to add support for optionally generating bitcode in a
> >>>> native object file containing a single .llvmbc section holding the
> >>>> bitcode. Specifically, the patch would add new options
> >>>> “emit-llvm-native-object” (object file) and corresponding
> >>>> “emit-llvm-native-assembly” (textual assembly code equivalent).
> >>>> Eventually these would be automatically triggered under “-fthinlto -c”
> >>>> and “-fthinlto -S”, respectively.
> >>>>
> >>>>
> >>>> Additionally, a symbol table will be generated in the native object
> >>>> file, holding the function symbols within the bitcode. This
> >>>> facilitates handling archives of the native object-wrapped bitcode
> >>>> created with $AR, since the archive will have a symbol table as well.
> >>>> The archive symbol table enables gold to extract and pass to the
> >>>> plugin the constituent native object-wrapped bitcode files. To support
> >>>> the concatenated llvmbc section generated by “$LD -r”, some handling
> >>>> needs to be added to gold and to the backend driver to process each
> >>>> original module’s bitcode.
> >>>>
> >>>>
> >>>> The function index/summary will later be added as a special native
> >>>> object section alongside the .llvmbc sections. The offset and size of
> >>>> the corresponding function summary can be placed in the associated
> >>>> symtab entry. As noted above, a separate design document will be sent
> >>>> for the native object format changes.
> >>>>
> >>>>
> >>>> 2. Stage 2: ThinLTO Infrastructure
> >>>> ------------------------------------------------------
> >>>>
> >>>>
> >>>> The next set of patches adds the base implementation of the ThinLTO
> >>>> infrastructure, specifically those required to make ThinLTO functional
> >>>> and generate correct but not necessarily high-performing binaries.
> >>>>
> >>>>
> >>>> a. Clang/LLVM/gold linker options
> >>>>
> >>>>
> >>>> An early set of clang/llvm patches is needed to provide options to
> >>>> enable ThinLTO (off by default), so that the rest of the
> >>>> implementation can be disabled by default as it is added.
> >>>> Specifically, the clang option -fthinlto (used instead of -flto) will
> >>>> cause clang to invoke the phase-1 emission of LLVM bitcode and
> >>>> function summary/index on a compile step, and pass the appropriate
> >>>> option to the gold plugin on a link step. The -thinlto option will be
> >>>> added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> >>>> archive step. The -thinlto-be option will also be added to clang to
> >>>> invoke it as a phase-3 parallel backend instance with a bitcode file
> >>>> as input.
> >>>>
> >>>>
> >>>> b. Thin-archive linking support in Gold plugin and llvm-lto
> >>>>
> >>>>
> >>>> Under the new plugin option (see above), the plugin needs to perform
> >>>> the phase-2 (thin archive) link which simply emits a combined function
> >>>> index from the linked modules, without actually performing the normal
> >>>> link. Corresponding support should be added to the standalone llvm-lto
> >>>> tool to enable testing/debugging without involving the linker and
> >>>> plugin.
> >>>>
> >>>>
> >>>> c. ThinLTO backend support
> >>>>
> >>>>
> >>>> Support for invoking a phase-3 backend invocation (including
> >>>> importing) on a module should be added to the clang driver under the
> >>>> new option. The main changes under the option are to instantiate a
> >>>> Linker object used to manage the process of linking imported functions
> >>>> into the module, to read the combined function index efficiently, and
> >>>> to enable the ThinLTO import pass.
> >>>>
> >>>>
> >>>> d. Function index/summary support
> >>>>
> >>>>
> >>>> This includes infrastructure for writing and reading the function
> >>>> index/summary section. As noted earlier this will be encoded in a
> >>>> special section within the native object file for the module,
> >>>> alongside the .llvmbc section containing the bitcode. The thin archive
> >>>> (combined function index) generated by phase-2 of ThinLTO simply
> >>>> contains all of the function index/summary sections across the linked
> >>>> modules, organized for efficient function lookup. As mentioned earlier
> >>>> when discussing the native object wrapper format, a separate design
> >>>> document will be sent for this format.
> >>>>
> >>>>
> >>>> Each function available for importing from the module contains an
> >>>> entry in the module’s function index/summary section and in the
> >>>> resulting combined function index. Each function entry contains that
> >>>> function’s offset within the bitcode file, used to efficiently locate
> >>>> and quickly import just that function (see below in 2e for more
> >>>> details on the importing mechanics). The entry also contains summary
> >>>> information (e.g. basic information determined during parsing such as
> >>>> the number of instructions in the function), that will be used to help
> >>>> guide later import decisions. Because the contents of this section
> >>>> will change frequently during ThinLTO tuning, it should also be marked
> >>>> with a version id for backwards compatibility or version checking.
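
As an illustration of the data described above (not the actual format,
which is deferred to a separate design document), the sketch below
models a per-module function index with a version id and the merge into
a combined index. It is written in Python for brevity. All names, the
version value, and the choice to keep the first definition of a
duplicated function are assumptions made for the example.

```python
from dataclasses import dataclass

INDEX_VERSION = 1  # hypothetical version id of the index/summary section


@dataclass(frozen=True)
class FunctionEntry:
    bitcode_offset: int  # offset of the function body within the module's bitcode
    inst_count: int      # example summary field: number of instructions


@dataclass(frozen=True)
class ModuleIndex:
    version: int
    functions: dict      # function name -> FunctionEntry


def combine(module_indexes):
    """Merge per-module indexes (module path -> ModuleIndex) into a
    combined index mapping function name -> (module path, FunctionEntry)."""
    combined = {}
    for path, index in module_indexes.items():
        if index.version != INDEX_VERSION:
            raise ValueError(f"{path}: unsupported index version {index.version}")
        for name, entry in index.functions.items():
            # Assumption for this sketch: the first definition seen wins.
            combined.setdefault(name, (path, entry))
    return combined
```

A backend that decides to import a function looks its name up in the
combined index, opens the named module, and seeks to bitcode_offset
without scanning the rest of the file.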
> >>>>
> >>>>
> >>>> e. ThinLTO importing support
> >>>>
> >>>>
> >>>> This covers support for the mechanics of importing functions from other
> >>>> modules, which can go in gradually as a set of patches since it will be
> >>>> off by default (the ThinLTO pass itself is discussed below in 2f).
> >>>>
> >>>>
> >>>> Note that ThinLTO function importing is iterative, and we may import
> >>>> from a number of modules in an interleaved fashion. For example,
> >>>> assume we have hot call chains a()->b1()->c() and a()->b2()->d(),
> >>>> where functions a(), b1()/b2(), c() and d() are from modules A, B, C
> >>>> and D, respectively. When performing ThinLTO backend compilation of
> >>>> module A, we may decide to import in the following order (based on
> >>>> callsite and function summary info):
> >>>> 1. B::b1()  # exposes call to c()
> >>>> 2. C::c()
> >>>> 3. B::b2()  # exposes call to d()
> >>>> 4. D::d()
> >>>> For this reason, ThinLTO importing is different than regular LTO
> >>>> bitcode reading and linking, which reads and links in a module in its
> >>>> entirety on a single pass through each module (notice in the above
> >>>> example the imports of the two module B functions have an intervening
> >>>> import from module C). As a result, for example, the existing support
> >>>> for lazy metadata parsing that delays it until the first function is
> >>>> materialized can’t be leveraged (metadata handling is discussed more
> >>>> below in 2h). Therefore, the ThinLTO importing pass instantiates a new
> >>>> BitcodeReader and LTOModule object for each function we decide to
> >>>> import, parsing only what is needed and linking in just that function.
> >>>> This is fast and efficient as found in the prototype results shown in
> >>>> the linked EuroLLVM slides.
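
The interleaved order in the a()/b1()/b2() example can be reproduced
with a small model, written here in Python for brevity. This is a
sketch only: it imports every reachable callee depth-first, whereas the
real pass would consult callsite and summary information before each
import. The index layout and function name are hypothetical.

```python
def import_order(index, root_module):
    """index maps function name -> (defining module, list of callees).
    Returns the functions imported into root_module, in import order."""
    order = []
    visited = set()

    def visit(fn):
        if fn in visited:
            return
        visited.add(fn)
        module, callees = index[fn]
        if module != root_module:
            order.append(f"{module}::{fn}")  # import exposes this function's calls
        for callee in callees:
            visit(callee)

    # Start from every function defined in the module being compiled.
    for fn, (module, _) in index.items():
        if module == root_module:
            visit(fn)
    return order


index = {
    "a": ("A", ["b1", "b2"]),
    "b1": ("B", ["c"]),
    "b2": ("B", ["d"]),
    "c": ("C", []),
    "d": ("D", []),
}
print(import_order(index, "A"))  # ['B::b1', 'C::c', 'B::b2', 'D::d']
```

Module B is visited twice with an import from module C in between,
which is the pattern that rules out a single pass over each module.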
> >>>>
> >>>>
> >>>> Separate patches can include:
> >>>>
> >>>>
> >>>> * BitcodeReader changes to use function index to import/deserialize
> >>>> single function of interest (small changes, leverages existing lazy
> >>>> function streamer support). The declarations and other symbol table
> >>>> info in the bitcode must be reloaded, but the bitcode parsing can stop
> >>>> once the first function body is hit. We simply set up an entry in the
> >>>> lazy streamer’s DeferredFunctionInfo function index map from the
> >>>> bitcode index that was saved in the ThinLTO function summary (and
> >>>> therefore don’t need to build up this function index structure through
> >>>> repeated calls to RememberAndSkipFunctionBody via
> >>>> FindFunctionInStream).
> >>>> * Minor LTOModule changes to pass the ThinLTO function to import and
> >>>> its index into bitcode reader (see 1a for discussion on LTOModule
> >>>> use).
> >>>> * Marking of imported functions. Most handling for ThinLTO imported
> >>>> functions will simply rely on applying the appropriate linkage type.
> >>>> But it is useful to know which functions were imported, both for
> >>>> compiler debugging and verification, and possibly to modify some
> >>>> optimization heuristics along with the summary information. This can
> >>>> be in-memory initially, but IR support may be required in order to
> >>>> support streaming bitcode out and back in again after importing.
> >>>> * ModuleLinker changes to do ThinLTO-specific symbol linking and
> >>>> static promotion when necessary. The linkage type of imported
> >>>> non-local functions and variables changes to
> >>>> AvailableExternallyLinkage, for example. Statics must be promoted in
> >>>> certain cases, and accordingly renamed in consistent ways. Read-write
> >>>> or address-taken static variables must always be promoted. Other
> >>>> discardable functions, i.e. link-once such as comdats, will be force
> >>>> imported on reference by another imported function. We are working on
> >>>> a separate design document describing these changes in more detail
> >>>> with examples, as a more detailed discussion of these changes is
> >>>> beyond the scope of this RFC.
> >>>> * GlobalDCE changes to support removing imported non-local functions
> >>>> that were not inlined and imported non-local variables, which are
> >>>> marked AvailableExternallyLinkage (very small changes to existing pass
> >>>> logic). As discussed in the original RFC threads, currently GlobalDCE
> >>>> does not remove referenced AvailableExternallyLinkage functions.
> >>>> Instead, these are suppressed later during code generation. It isn’t
> >>>> clear that these functions are useful past the first call to
> >>>> GlobalDCE, which is after inlining, GlobalOpt and IPSCCP (so
> >>>> presumably after inter procedural constant prop, etc). A patch with
> >>>> these changes is in testing, as discussed in this thread:
> >>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/085807.html.
> >>>>
> >>>>
> >>>> f. ThinLTO Import Driver SCC pass
> >>>>
> >>>>
> >>>> Adds Transforms/IPO/ThinLTO.cpp with framework for doing ThinLTO via
> >>>> an SCC pass, enabled only under the -fthinlto-be option. The pass
> >>>> includes utilizing the thin archive[2] (combined global function
> >>>> index/summary), import decision heuristics, invocation of
> >>>> LTOModule/ModuleLinker routines that perform the import, and any
> >>>> necessary callgraph updates and verification.
> >>>>
> >>>>
> >>>> g. Backend Driver
> >>>>
> >>>>
> >>>> For a single node build, the gold plugin will initially exec the
> >>>> backend processes directly, with the amount of parallelism controlled
> >>>> via an option and/or env variable. It is also possible to leverage
> >>>> existing single node build system task dispatching mechanisms such as
> >>>> Unix Makefiles, Ninja, etc., where the plugin can simply write a build
> >>>> file and fork the parallel backend instances directly under an
> >>>> appropriate option. We will also initially add support for our
> >>>> distributed build system as described below under 3c.
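
The "write a build file" option could look roughly like the following
sketch, written in Python for brevity. The backend command line is left
as a caller-supplied template because the exact options are not settled
in this RFC; the function name and the {ir}/{obj} placeholders are
hypothetical.

```python
def backend_makefile(modules, command):
    """Emit Makefile text with one rule per backend job.

    modules: list of (ir_path, object_path) pairs.
    command: template for the backend invocation, using {ir} and {obj}.
    """
    objects = " ".join(obj for _, obj in modules)
    lines = [f"all: {objects}", ""]
    for ir, obj in modules:
        lines.append(f"{obj}: {ir}")
        # Make requires recipe lines to begin with a tab character.
        lines.append("\t" + command.format(ir=ir, obj=obj))
        lines.append("")
    return "\n".join(lines)
```

Running "make -j N" on the resulting file then gives the parallelism
control mentioned above, since each rule is independent of the others.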
> >>>>
> >>>>
> >>>> h. Lazy Debug Metadata Linking
> >>>>
> >>>>
> >>>> The prototype implementation included lazy importing of module-level
> >>>> metadata during the ThinLTO pass finalization (i.e. after all function
> >>>> importing is complete). This actually applies to all module-level
> >>>> metadata, not just debug, although it is the largest. This can be
> >>>> added as a separate set of patches, and the detailed design will be
> >>>> sent with those. This includes changes to BitcodeReader, ValueMapper,
> >>>> and the ModuleLinker classes. As described in 2e, due to the
> >>>> iterative/interleaved nature of ThinLTO importing, the bitcode parsing
> >>>> is structured differently than LTO where a single pass over each
> >>>> module can be performed to parse and materialize all functions and
> >>>> metadata. Therefore, the lazy metadata parsing support in
> >>>> BitcodeReader, which parses all the metadata once the first function
> >>>> is materialized, is not applicable. We may instantiate a
> >>>> BitcodeReader multiple times for a module, if multiple functions are
> >>>> eventually imported, and we need a way to suture up the metadata to
> >>>> the functions imported by an earlier BitcodeReader instantiation. The
> >>>> high level summary is that during the initial import we leave the
> >>>> temporary metadata on the instructions that were imported, but save
> >>>> the index used by the bitcode reader used to correlate with the
> >>>> metadata when it is ready (i.e. the MDValuePtrs index), and skip the
> >>>> metadata parsing. During the ThinLTO pass finalization we parse just
> >>>> the metadata, and suture it up during metadata value mapping using the
> >>>> saved index. As mentioned earlier, this will be described in more
> >>>> detail when the patches are ready.
> >>>>
> >>>>
> >>>> 3. Stage 3: ThinLTO Tuning and Enhancements
> >>>> -------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> This refers to the patches that are not required for ThinLTO to work,
> >>>> but rather to improve compile time, memory, run-time performance and
> >>>> usability.
> >>>>
> >>>>
> >>>> a. Import Tuning
> >>>>
> >>>>
> >>>> Tuning the import strategy will be an iterative process that will
> >>>> continue to be refined over time. It involves several different types
> >>>> of changes: adding support for recording additional metrics in the
> >>>> function summary, such as profile data and optional heavier-weight IPA
> >>>> analyses, and tuning the import heuristics based on the summary and
> >>>> callsite context.
> >>>>
> >>>>
> >>>> b. Combined Function Index Pruning
> >>>>
> >>>>
> >>>> The combined function index can be pruned of functions that are
> >>>> unlikely to benefit from being imported. For example, during the
> >>>> phase-2 thin archive plugin step we can safely omit large and (with
> >>>> profile data) cold functions, which are unlikely to benefit from being
> >>>> inlined. Additionally, all but one copy of comdat functions can be
> >>>> suppressed.
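
A minimal sketch of the pruning rule, in Python for brevity: entries
are dropped when they exceed a size limit, and, only when profile data
is available, when they are not in the hot set. The threshold and the
shape of the inputs are assumptions for illustration, not tuned values.

```python
def prune_index(entries, max_insts, hot=None):
    """entries: function name -> instruction count.
    hot: optional set of hot function names from profile data.
    Returns the entries worth keeping in the combined index."""
    kept = {}
    for name, inst_count in entries.items():
        if inst_count > max_insts:
            continue  # large: unlikely to be inlined after import
        if hot is not None and name not in hot:
            continue  # cold according to the profile
        kept[name] = inst_count
    return kept
```

Without profile data (hot=None) only the size test applies, matching
the "(with profile data)" qualifier above.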
> >>>>
> >>>>
> >>>> c. Distributed Build System Integration
> >>>>
> >>>>
> >>>> For a distributed build system such as Bazel (http://bazel.io/), the
> >>>> gold plugin should write the parallel backend invocations into a build
> >>>> file, including the mapping from the IR file to the real object file
> >>>> path, and exit. Additional work needs to be done in the distributed
> >>>> build system itself to distribute and dispatch the parallel backend
> >>>> jobs to the build cluster.
> >>>>
> >>>>
> >>>> d. Dependence Tracking and Incremental Compiles
> >>>>
> >>>>
> >>>> In order to support build systems that stage from local disks or
> >>>> network storage, the plugin will optionally support computation of
> >>>> dependent sets of IR files that each module may import from. This can
> >>>> be computed from profile data, if it exists, or from the symbol table
> >>>> and heuristics if not. These dependence sets also enable support for
> >>>> incremental backend compiles.
> >>>>
> >>>>
> >>>> ________________
> >>>> [1] The following compilers currently wrap intermediate LTO files in
> >>>> native object format: GCC fat and non-fat objects (with a custom
> >>>> symtab), Intel icc non-fat (IR-only) objects (with a full native
> >>>> symtab), HP’s aCC non-fat objects (with full native symtab), IBM xlC
> >>>> both fat and non-fat objects (with full native symtab).
> >>>> [2] The “thin archive” here (also referred to as a combined function
> >>>> index) has some similarities to the AR tool thin archive format, but
> >>>> is not exactly the same. Both contain the symtab and not the code, but
> >>>> the ThinLTO combined function index contains the summary sections as
> >>>> well.
> >>>>
> >>>> --
> >>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
> >>>>
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev