[LLVMdev] RFC: ThinLTO Implementation Plan

Alex Rosenberg alexr at leftfield.org
Wed May 13 22:46:29 PDT 2015


"ELF-wrapped bitcode" seems potentially controversial to me.

What about ar, nm, and various ld implementations adds this requirement? What about the LLVM implementations of these tools is lacking?

Alex

> On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com> wrote:
> 
> I've included below an RFC for implementing ThinLTO in LLVM, looking
> forward to feedback and questions.
> Thanks!
> Teresa
> 
> 
> 
> RFC to discuss plans for implementing ThinLTO upstream. Background can
> be found in slides from EuroLLVM 2015:
>   https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
> As described in the talk, we have a prototype implementation, and
> would like to start staging patches upstream. This RFC describes a
> breakdown of the major pieces. We would like to commit upstream
> gradually in several stages, with all functionality off by default.
> The core ThinLTO importing support and tuning will require frequent
> change and iteration during testing and tuning, and for that part we
> would like to commit rapidly (off by default). See the proposed staged
> implementation described in the Implementation Plan section.
> 
> 
> ThinLTO Overview
> ==============
> 
> See the talk slides linked above for more details. The following is a
> high-level overview of the motivation.
> 
> Cross Module Optimization (CMO) is an effective means for improving
> runtime performance, by extending the scope of optimizations across
> source module boundaries. Without CMO, the compiler is limited to
> optimizing within the scope of single source modules. Two solutions
> for enabling CMO are Link-Time Optimization (LTO), which is currently
> supported in LLVM and GCC, and Lightweight-Interprocedural
> Optimization (LIPO). However, each of these solutions has limitations
> that prevent it from being enabled by default. ThinLTO is a new
> approach that attempts to address these limitations, with a goal of
> being enabled more broadly. ThinLTO is designed with many of the same
> principles as LIPO, and therefore its advantages, without any of its
> inherent weaknesses. Unlike in LIPO where the module group decision is
> made at profile training runtime, ThinLTO makes the decision at
> compile time, but in a lazy mode that facilitates large scale
> parallelism. The serial linker plugin phase is designed to be razor
> thin and blazingly fast. By default this step only does minimal
> preparation work to enable the parallel lazy importing performed
> later. ThinLTO aims to be scalable like a regular O2 build, enabling
> CMO on machines without large memory configurations, while also
> integrating well with distributed build systems. Results from early
> prototyping on SPEC cpu2006 C++ benchmarks are in line with
> expectations that ThinLTO can scale like O2 while enabling much of the
> CMO performed during a full LTO build.
> 
> 
> A ThinLTO build is divided into 3 phases, which are referred to in the
> following implementation plan:
> 
> phase-1: IR and Function Summary Generation (-c compile)
> phase-2: Thin Linker Plugin Layer (thin archive linker step)
> phase-3: Parallel Backend with Demand-Driven Importing
> 
> 
> Implementation Plan
> ================
> 
> This section gives a high-level breakdown of the ThinLTO support that
> will be added, in roughly the order that the patches would be staged.
> The patches are divided into three stages. The first stage contains a
> minimal amount of preparation work that is not ThinLTO-specific. The
> second stage contains most of the infrastructure for ThinLTO, which
> will be off by default. The third stage includes
> enhancements/improvements/tunings that can be performed after the main
> ThinLTO infrastructure is in.
> 
> The second and third implementation stages will initially be very
> volatile, requiring a lot of iterations and tuning with large apps to
> get stabilized. Therefore it will be important to do fast commits for
> these implementation stages.
> 
> 
> 1. Stage 1: Preparation
> -------------------------------
> 
> The first planned sets of patches are enablers for ThinLTO work:
> 
> 
> a. LTO directory structure:
> 
> Restructure the LTO directory to remove the circular dependence that
> arises when the ThinLTO pass is added. Because ThinLTO is being
> implemented as an SCC pass within Transforms/IPO, and leverages the
> LTOModule class for linking in functions from modules, IPO then
> requires the LTO library. This
> creates a circular dependence between LTO and IPO. To break that, we
> need to split the lib/LTO directory/library into lib/LTO/CodeGen and
> lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
> respectively. Only LTOCodeGenerator has a dependence on IPO, removing
> the circular dependence.
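
To make the dependence argument above concrete, here is a minimal Python sketch that models the libraries as a directed graph and checks for a cycle before and after the proposed split. The edges are only the ones stated in the paragraph above; the `has_cycle` helper and the dictionary layout are illustrative assumptions, not part of LLVM's build system.

```python
def has_cycle(deps):
    """Return True if the dependency graph (dict: lib -> set of libs) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in deps}

    def visit(node):
        color[node] = GRAY
        for dep in deps.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is already on the current path
                return True
            if state == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(deps))


# Before the split: the ThinLTO pass in IPO uses LTOModule, which lives in LTO,
# while LTO (via LTOCodeGenerator) already depends on IPO.
before = {"LTO": {"IPO"}, "IPO": {"LTO"}}

# After the split: only the CodeGen half depends on IPO.
after = {
    "LTO/CodeGen": {"IPO"},
    "IPO": {"LTO/Module"},
    "LTO/Module": set(),
}

print(has_cycle(before), has_cycle(after))
```

The same check could be run over a fuller graph of library dependences; the two-edge version here only mirrors the argument in the text.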
> 
> 
> b. ELF wrapper generation support:
> 
> Implement an ELF-wrapped bitcode writer. In order to more easily interact
> with tools such as $AR, $NM, and “$LD -r” we plan to emit the phase-1
> bitcode wrapped in ELF via the .llvmbc section, along with a symbol
> table. The goal is both to interact with these tools without requiring
> a plugin, and also to avoid doing partial LTO/ThinLTO across files
> linked with “$LD -r” (i.e. the resulting object file should still
> contain ELF-wrapped bitcode to enable ThinLTO at the full link step).
> I will send a separate design document for these changes, but the
> following is a high-level overview.
> 
> Support was added to LLVM for reading ELF-wrapped bitcode
> (http://reviews.llvm.org/rL218078), but there does not yet exist
> support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
> add support for optionally generating bitcode in an ELF file
> containing a single .llvmbc section holding the bitcode. Specifically,
> the patch would add new options “emit-llvm-bc-elf” (object file) and
> corresponding “emit-llvm-elf” (textual assembly code equivalent).
> Eventually these would be automatically triggered under “-fthinlto -c”
> and “-fthinlto -S”, respectively.
> 
> Additionally, a symbol table will be generated in the ELF file,
> holding the function symbols within the bitcode. This facilitates
> handling archives of the ELF-wrapped bitcode created with $AR, since
> the archive will have a symbol table as well. The archive symbol table
> enables gold to extract and pass to the plugin the constituent
> ELF-wrapped bitcode files. To support the concatenated .llvmbc section
> generated by “$LD -r”, some handling needs to be added to gold and to
> the backend driver to process each original module’s bitcode.
> 
> The function index/summary will later be added as a special ELF
> section alongside the .llvmbc sections.
> 
> 
> 2. Stage 2: ThinLTO Infrastructure
> ----------------------------------------------
> 
> The next set of patches adds the base implementation of the ThinLTO
> infrastructure, specifically the pieces required to make ThinLTO
> functional and generate correct but not necessarily high-performing
> binaries. It does not yet include the work needed to make debug
> support under -g efficient with ThinLTO.
> 
> 
> a. Clang/LLVM/gold linker options:
> 
> An early set of clang/llvm patches is needed to provide options to
> enable ThinLTO (off by default), so that the rest of the
> implementation can be disabled by default as it is added.
> Specifically, the clang option -fthinlto (used instead of -flto) will
> cause clang to invoke the phase-1 emission of LLVM bitcode and
> function summary/index on a compile step, and pass the appropriate
> option to the gold plugin on a link step. The -thinlto option will be
> added to the gold plugin and llvm-lto tool to launch the phase-2 thin
> archive step. The -thinlto option will also be added to the ‘opt’ tool
> to invoke it as a phase-3 parallel backend instance.
> 
> 
> b. Thin-archive linking support in Gold plugin and llvm-lto:
> 
> Under the new plugin option (see above), the plugin needs to perform
> the phase-2 (thin archive) link which simply emits a combined function
> map from the linked modules, without actually performing the normal
> link. Corresponding support should be added to the standalone llvm-lto
> tool to enable testing/debugging without involving the linker and
> plugin.
> 
> 
> c. ThinLTO backend support:
> 
> Support for running a phase-3 backend invocation (including
> importing) on a module should be added to the ‘opt’ tool under the new
> option. The main changes under the option are to instantiate a Linker
> object that manages linking imported functions into the module, to
> read the combined function map efficiently, and to enable the ThinLTO
> import pass.
> 
> 
> d. Function index/summary support:
> 
> This includes infrastructure for writing and reading the function
> index/summary section. As noted earlier this will be encoded in a
> special ELF section within the module, alongside the .llvmbc section
> containing the bitcode. The thin archive generated by phase-2 of
> ThinLTO simply contains all of the function index/summary sections
> across the linked modules, organized for efficient function lookup.
> 
> Each function available for importing from the module has an
> entry in the module’s function index/summary section and in the
> resulting combined function map. Each function entry contains that
> function’s offset within the bitcode file, used to efficiently locate
> and quickly import just that function. The entry also contains summary
> information (e.g. basic information determined during parsing such as
> the number of instructions in the function), that will be used to help
> guide later import decisions. Because the contents of this section
> will change frequently during ThinLTO tuning, it should also be marked
> with a version id for backwards compatibility or version checking.
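
As a rough illustration of the index described above, the following Python sketch serializes one entry per function (name, bitcode offset, instruction count) behind a version header and reads it back into a lookup table. The byte layout, the field widths, and the names `FunctionEntry`, `write_index`, and `read_index` are all assumptions made for this example; the RFC does not specify an encoding.

```python
import struct
from dataclasses import dataclass

INDEX_VERSION = 1                    # hypothetical version id
HEADER = struct.Struct("<II")        # version, number of entries
ENTRY_FIXED = struct.Struct("<HQI")  # name length, bitcode offset, instruction count


@dataclass(frozen=True)
class FunctionEntry:
    name: str
    bitcode_offset: int     # where the function body starts in the bitcode
    instruction_count: int  # example of cheap summary information


def write_index(entries):
    """Serialize entries as a versioned header followed by variable-length records."""
    out = bytearray(HEADER.pack(INDEX_VERSION, len(entries)))
    for e in entries:
        name = e.name.encode("utf-8")
        out += ENTRY_FIXED.pack(len(name), e.bitcode_offset, e.instruction_count)
        out += name
    return bytes(out)


def read_index(data):
    """Parse the section into a name -> FunctionEntry map, rejecting unknown versions."""
    version, count = HEADER.unpack_from(data, 0)
    if version != INDEX_VERSION:
        raise ValueError(f"unsupported function index version {version}")
    pos = HEADER.size
    entries = {}
    for _ in range(count):
        name_len, offset, insts = ENTRY_FIXED.unpack_from(data, pos)
        pos += ENTRY_FIXED.size
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        entries[name] = FunctionEntry(name, offset, insts)
    return entries


section = write_index([FunctionEntry("foo", 128, 12), FunctionEntry("bar", 4096, 250)])
index = read_index(section)
print(index["bar"].bitcode_offset)
```

Checking the version before parsing anything else is what lets a reader refuse (or adapt to) a section written by a different revision of the format.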
> 
> 
> e. ThinLTO importing support:
> 
> Support for the mechanics of importing functions from other modules,
> which can go in gradually as a set of patches since it will be off by
> default. Separate patches can include:
> 
> - BitcodeReader changes to use the function index to import/deserialize
> a single function of interest (small changes, leverages existing lazy
> streamer support).
> 
> - Minor LTOModule changes to pass the ThinLTO function to import and
> its index into the bitcode reader.
> 
> - Marking of imported functions (for use in ThinLTO-specific symbol
> linking and global DCE, for example). This can be in-memory initially,
> but IR support may be required in order to support streaming bitcode
> out and back in again after importing.
> 
> - ModuleLinker changes to do ThinLTO-specific symbol linking and
> static promotion when necessary. The linkage type of imported
> functions changes to AvailableExternallyLinkage, for example. Statics
> must be promoted in certain cases, and renamed in consistent ways.
> 
> - GlobalDCE changes to support removing imported functions that were
> not inlined (very small changes to existing pass logic).
> 
> 
> f. ThinLTO Import Driver SCC pass:
> 
> Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via
> an SCC pass, enabled only under the -fthinlto option. The pass includes
> utilizing the thin archive (global function index/summary), import
> decision heuristics, invocation of LTOModule/ModuleLinker routines
> that perform the import, and any necessary callgraph updates and
> verification.
> 
> 
> g. Backend Driver:
> 
> For a single node build, the gold plugin can simply write a makefile
> and fork the parallel backend instances directly via parallel make.
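
A minimal sketch of what generating such a makefile might look like, in Python for brevity. The command string `opt -thinlto`, the file names, and the way the combined function map is listed as a prerequisite are placeholders chosen for this example; the RFC proposes the option but does not define the final command line.

```python
def backend_makefile(modules, combined_index, backend_cmd="opt -thinlto"):
    """Build makefile text with one backend rule per module.

    modules: list of (ir_path, object_path) pairs.
    combined_index: path of the combined function map (hypothetical file name).
    backend_cmd: placeholder for the phase-3 backend command.
    """
    objects = [obj for _, obj in modules]
    lines = ["all: " + " ".join(objects), ""]
    for ir_path, obj_path in modules:
        # Each object depends on its own IR and on the combined function map.
        lines.append(f"{obj_path}: {ir_path} {combined_index}")
        lines.append(f"\t{backend_cmd} {ir_path} -o {obj_path}")
        lines.append("")
    return "\n".join(lines)


makefile_text = backend_makefile(
    [("a.o", "a.thinlto.o"), ("b.o", "b.thinlto.o")],
    "combined.idx",
)
print(makefile_text)
```

Because the rules are independent of one another, running the result with `make -j N` is enough to get the backends executing in parallel on a single node.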
> 
> 
> 3. Stage 3: ThinLTO Tuning and Enhancements
> ----------------------------------------------------------------
> 
> This refers to the patches that are not required for ThinLTO to work,
> but that improve compile time, memory, run-time performance and
> usability.
> 
> 
> a. Lazy Debug Metadata Linking:
> 
> The prototype implementation included lazy importing of module-level
> metadata during the ThinLTO pass finalization (i.e. after all function
> importing is complete). This actually applies to all module-level
> metadata, not just debug, although it is the largest. This can be
> added as a separate set of patches. It requires changes to
> BitcodeReader, ValueMapper, and ModuleLinker.
> 
> 
> b. Import Tuning:
> 
> Tuning the import strategy will be an iterative process that will
> continue to be refined over time. It involves several different types
> of changes: adding support for recording additional metrics in the
> function summary, such as profile data and optional heavier-weight IPA
> analyses, and tuning the import heuristics based on the summary and
> callsite context.
> 
> 
> c. Combined Function Map Pruning:
> 
> The combined function map can be pruned of functions that are unlikely
> to benefit from being imported. For example, during the phase-2 thin
> archive plugin step we can safely omit large and (with profile data)
> cold functions, which are unlikely to benefit from being inlined.
> Additionally, all but one copy of comdat functions can be suppressed.
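
The pruning rules above could be sketched as follows. This reads the text as two separate filters (drop large functions; drop cold functions when profile data exists) plus comdat deduplication. The thresholds, the `Summary` fields, and the choice to keep the first comdat copy seen are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Summary:
    name: str
    module: str
    instruction_count: int
    call_count: Optional[int] = None  # None means no profile data is available
    comdat: bool = False


def prune(entries, max_insts=100, cold_calls=0):
    """Return the entries worth keeping in the combined function map."""
    kept = []
    seen_comdat = set()
    for e in entries:
        if e.instruction_count > max_insts:
            continue  # too large to be a likely inline candidate
        if e.call_count is not None and e.call_count <= cold_calls:
            continue  # profile data says the function is cold
        if e.comdat:
            if e.name in seen_comdat:
                continue  # another copy of this comdat function is already kept
            seen_comdat.add(e.name)
        kept.append(e)
    return kept


entries = [
    Summary("small_hot", "a.o", 10, call_count=500),
    Summary("huge", "a.o", 5000, call_count=500),
    Summary("small_cold", "b.o", 10, call_count=0),
    Summary("no_profile", "b.o", 10),
    Summary("inline_tmpl", "a.o", 5, comdat=True),
    Summary("inline_tmpl", "b.o", 5, comdat=True),
]
pruned = prune(entries)
print([e.name for e in pruned])
```

Functions without profile data are never treated as cold here, so a build with no profile only loses the size-based and comdat-based candidates.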
> 
> 
> d. Distributed Build System Integration:
> 
> For a distributed build system, the gold plugin should write the
> parallel backend invocations into a makefile, including the mapping
> from the IR file to the real object file path, and exit. Additional
> work needs to be done in the distributed build system itself to
> distribute and dispatch the parallel backend jobs to the build
> cluster.
> 
> 
> e. Dependence Tracking and Incremental Compiles:
> 
> In order to support build systems that stage from local disks or
> network storage, the plugin will optionally support computation of
> dependent sets of IR files that each module may import from. This can
> be computed from profile data, if it exists, or from the symbol table
> and heuristics if not. These dependence sets also enable support for
> incremental backend compiles.
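
For the symbol-table case, one simple way to approximate these dependence sets is to map every defined function to its module and then resolve each module's external references. The Python sketch below does only that: it ignores profile data, heuristics, and transitive imports, and the `needs_rebuild` rule is my reading of how the sets could drive incremental compiles rather than something the RFC states. All names are hypothetical.

```python
def import_dependences(defined, referenced):
    """Compute, per module, the other modules it may import from.

    defined: module -> set of function names defined in it.
    referenced: module -> set of function names it references externally.
    """
    owner = {}
    for module, names in defined.items():
        for name in names:
            owner.setdefault(name, module)  # first definition wins in this sketch

    deps = {}
    for module, names in referenced.items():
        deps[module] = sorted(
            {owner[n] for n in names if n in owner and owner[n] != module}
        )
    return deps


def needs_rebuild(module, changed, deps):
    """A backend compile is stale if the module or anything it may import from changed."""
    return module in changed or any(d in changed for d in deps.get(module, ()))


defined = {"a.o": {"main"}, "b.o": {"foo"}, "c.o": {"bar"}}
referenced = {"a.o": {"foo", "printf"}, "b.o": {"bar"}, "c.o": set()}
deps = import_dependences(defined, referenced)
print(deps)
```

Symbols with no defining module in the link (such as `printf` above) are simply skipped, since there is no IR file to stage for them.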
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
> 