[LLVMdev] RFC: ThinLTO Implementation Plan

Xinliang David Li xinliangli at gmail.com
Thu May 14 11:53:49 PDT 2015


The end goal is to make turning on ThinLTO as easy as turning on
optimizations like -O2 or -O3 -- we want friendliness, very much :)

David


On Thu, May 14, 2015 at 11:14 AM, Eric Christopher <echristo at gmail.com>
wrote:

> I'm not sure this is a particularly great assumption to make. We have to
> support a lot of different build systems and tools and concentrating on
> something that just binutils uses isn't particularly friendly here. I also
> can't imagine how it's necessary for any of the lto aspects as currently
> written in the proposal.
>
> -eric
>
> On Thu, May 14, 2015 at 9:26 AM Xinliang David Li <xinliangli at gmail.com>
> wrote:
>
>> The design objective is to make ThinLTO mostly transparent to binutils
>> tools to enable easy integration with any build system in the wild.
>> 'Pass-through' mode with 'ld -r' instead of the partial LTO mode is
>> another reason.
>>
>> David
>>
>> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher <echristo at gmail.com>
>>> wrote:
>>> > So, what Alex is saying is that we have these tools as well and they
>>> > understand bitcode just fine, as well as every object format - not
>>> > just ELF. :)
>>>
>>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm) that
>>> handle bitcode similarly to the way the standard tool + plugin does.
>>> But the goal we are trying to achieve is to allow the standard system
>>> versions of the tools to handle these files without requiring a
>>> plugin. I know the LLVM tool handles other object formats, but I'm not
>>> sure how that helps here? We're not planning to replace those tools,
>>> just allow the standard system versions to handle the intermediate
>>> objects produced by ThinLTO.
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > -eric
>>> >
>>> >
>>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson <tejohnson at google.com>
>>> > wrote:
>>> >>
>>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>>> >> <xinliangli at gmail.com> wrote:
>>> >> >
>>> >> >
>>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>>> >> > <alexr at leftfield.org> wrote:
>>> >> >>
>>> >> >> "ELF-wrapped bitcode" seems potentially controversial to me.
>>> >> >>
>>> >> >> What about ar, nm, and various ld implementations adds this
>>> >> >> requirement?
>>> >> >> What about the LLVM implementations of these tools is lacking?
>>> >> >
>>> >> >
>>> >> > Sorry, I cannot parse your questions properly. Can you make them
>>> >> > clearer?
>>> >>
>>> >> Alex is asking what the issue is with ar, nm, ld -r and regular
>>> >> bitcode that makes using elf-wrapped bitcode easier.
>>> >>
>>> >> The issue is that generally you need to provide a plugin to these
>>> >> tools in order for them to understand and handle bitcode files. We'd
>>> >> like standard tools to work without requiring a plugin as much as
>>> >> possible. And in some cases we want them to be handled differently
>>> >> from the way bitcode files are handled with the plugin.
>>> >>
>>> >> nm: Without a plugin, normal bitcode files are inscrutable. When
>>> >> provided the gold plugin it can emit the symbols.
>>> >>
>>> >> ar: Without a plugin, it will create an archive of bitcode files, but
>>> >> without an index, so it can't be handled by the linker even with a
>>> >> plugin on an -flto link. When ar is provided the gold plugin it does
>>> >> create an index, so the linker + gold plugin handle it appropriately
>>> >> on an -flto link.
>>> >>
>>> >> ld -r: Without a plugin, fails when provided bitcode inputs. When
>>> >> provided the gold plugin, it handles them but compiles them all the
>>> >> way through to ELF executable instructions via a partial LTO link.
>>> >> This is where we would like to differ in behavior (while also not
>>> >> requiring a plugin) with ELF-wrapped bitcode: we would like the ld -r
>>> >> output file to still contain ELF-wrapped bitcode, delaying the LTO
>>> >> until the full link step.
>>> >>
>>> >> Let me know if that helps address your concerns.
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> >
>>> >> > David
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> Alex
>>> >> >>
>>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson <tejohnson at google.com>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> > I've included below an RFC for implementing ThinLTO in LLVM, looking
>>> >> >> > forward to feedback and questions.
>>> >> >> > Thanks!
>>> >> >> > Teresa
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > RFC to discuss plans for implementing ThinLTO upstream. Background can
>>> >> >> > be found in slides from EuroLLVM 2015:
>>> >> >> >
>>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>>> >> >> >
>>> >> >> > As described in the talk, we have a prototype implementation, and
>>> >> >> > would like to start staging patches upstream. This RFC describes a
>>> >> >> > breakdown of the major pieces. We would like to commit upstream
>>> >> >> > gradually in several stages, with all functionality off by default.
>>> >> >> > The core ThinLTO importing support and tuning will require frequent
>>> >> >> > change and iteration during testing and tuning, and for that part we
>>> >> >> > would like to commit rapidly (off by default). See the proposed staged
>>> >> >> > implementation described in the Implementation Plan section.
>>> >> >> >
>>> >> >> >
>>> >> >> > ThinLTO Overview
>>> >> >> > ==============
>>> >> >> >
>>> >> >> > See the talk slides linked above for more details. The following is a
>>> >> >> > high-level overview of the motivation.
>>> >> >> >
>>> >> >> > Cross Module Optimization (CMO) is an effective means for improving
>>> >> >> > runtime performance, by extending the scope of optimizations across
>>> >> >> > source module boundaries. Without CMO, the compiler is limited to
>>> >> >> > optimizing within the scope of single source modules. Two solutions
>>> >> >> > for enabling CMO are Link-Time Optimization (LTO), which is currently
>>> >> >> > supported in LLVM and GCC, and Lightweight-Interprocedural
>>> >> >> > Optimization (LIPO). However, each of these solutions has limitations
>>> >> >> > that prevent it from being enabled by default. ThinLTO is a new
>>> >> >> > approach that attempts to address these limitations, with a goal of
>>> >> >> > being enabled more broadly. ThinLTO is designed with many of the same
>>> >> >> > principles as LIPO, and therefore its advantages, without any of its
>>> >> >> > inherent weaknesses. Unlike in LIPO, where the module group decision
>>> >> >> > is made at profile training runtime, ThinLTO makes the decision at
>>> >> >> > compile time, but in a lazy mode that facilitates large scale
>>> >> >> > parallelism. The serial linker plugin phase is designed to be razor
>>> >> >> > thin and blazingly fast. By default this step only does minimal
>>> >> >> > preparation work to enable the parallel lazy importing performed
>>> >> >> > later. ThinLTO aims to be scalable like a regular O2 build, enabling
>>> >> >> > CMO on machines without large memory configurations, while also
>>> >> >> > integrating well with distributed build systems. Results from early
>>> >> >> > prototyping on SPEC cpu2006 C++ benchmarks are in line with
>>> >> >> > expectations that ThinLTO can scale like O2 while enabling much of the
>>> >> >> > CMO performed during a full LTO build.
>>> >> >> >
>>> >> >> >
>>> >> >> > A ThinLTO build is divided into 3 phases, which are referred to in the
>>> >> >> > following implementation plan:
>>> >> >> >
>>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker step)
>>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>>> >> >> >
>>> >> >> >
>>> >> >> > Implementation Plan
>>> >> >> > ================
>>> >> >> >
>>> >> >> > This section gives a high-level breakdown of the ThinLTO support that
>>> >> >> > will be added, in roughly the order that the patches would be staged.
>>> >> >> > The patches are divided into three stages. The first stage contains a
>>> >> >> > minimal amount of preparation work that is not ThinLTO-specific. The
>>> >> >> > second stage contains most of the infrastructure for ThinLTO, which
>>> >> >> > will be off by default. The third stage includes
>>> >> >> > enhancements/improvements/tunings that can be performed after the main
>>> >> >> > ThinLTO infrastructure is in.
>>> >> >> >
>>> >> >> > The second and third implementation stages will initially be very
>>> >> >> > volatile, requiring a lot of iterations and tuning with large apps to
>>> >> >> > get stabilized. Therefore it will be important to do fast commits for
>>> >> >> > these implementation stages.
>>> >> >> >
>>> >> >> >
>>> >> >> > 1. Stage 1: Preparation
>>> >> >> > -------------------------------
>>> >> >> >
>>> >> >> > The first planned sets of patches are enablers for ThinLTO work:
>>> >> >> >
>>> >> >> >
>>> >> >> > a. LTO directory structure:
>>> >> >> >
>>> >> >> > Restructure the LTO directory to remove a circular dependence when the
>>> >> >> > ThinLTO pass is added. Because ThinLTO is being implemented as an SCC
>>> >> >> > pass within Transforms/IPO, and leverages the LTOModule class for
>>> >> >> > linking in functions from modules, IPO then requires the LTO library.
>>> >> >> > This creates a circular dependence between LTO and IPO. To break that,
>>> >> >> > we need to split the lib/LTO directory/library into lib/LTO/CodeGen
>>> >> >> > and lib/LTO/Module, containing LTOCodeGenerator and LTOModule,
>>> >> >> > respectively. Only LTOCodeGenerator has a dependence on IPO, removing
>>> >> >> > the circular dependence.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. ELF wrapper generation support:
>>> >> >> >
>>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more easily
>>> >> >> > interact with tools such as $AR, $NM, and “$LD -r” we plan to emit the
>>> >> >> > phase-1 bitcode wrapped in ELF via the .llvmbc section, along with a
>>> >> >> > symbol table. The goal is both to interact with these tools without
>>> >> >> > requiring a plugin, and also to avoid doing partial LTO/ThinLTO across
>>> >> >> > files linked with “$LD -r” (i.e. the resulting object file should
>>> >> >> > still contain ELF-wrapped bitcode to enable ThinLTO at the full link
>>> >> >> > step). I will send a separate design document for these changes, but
>>> >> >> > the following is a high-level overview.
>>> >> >> >
>>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>>> >> >> > (http://reviews.llvm.org/rL218078), but there does not yet exist
>>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in ELF. I plan to
>>> >> >> > add support for optionally generating bitcode in an ELF file
>>> >> >> > containing a single .llvmbc section holding the bitcode. Specifically,
>>> >> >> > the patch would add new options “emit-llvm-bc-elf” (object file) and
>>> >> >> > corresponding “emit-llvm-elf” (textual assembly code equivalent).
>>> >> >> > Eventually these would be automatically triggered under “-fthinlto -c”
>>> >> >> > and “-fthinlto -S”, respectively.
>>> >> >> >
>>> >> >> > Additionally, a symbol table will be generated in the ELF file,
>>> >> >> > holding the function symbols within the bitcode. This facilitates
>>> >> >> > handling archives of the ELF-wrapped bitcode created with $AR, since
>>> >> >> > the archive will have a symbol table as well. The archive symbol table
>>> >> >> > enables gold to extract and pass to the plugin the constituent
>>> >> >> > ELF-wrapped bitcode files. To support the concatenated .llvmbc section
>>> >> >> > generated by “$LD -r”, some handling needs to be added to gold and to
>>> >> >> > the backend driver to process each original module’s bitcode.
>>> >> >> >
>>> >> >> > The function index/summary will later be added as a special ELF
>>> >> >> > section alongside the .llvmbc sections.
>>> >> >> >
>>> >> >> >
>>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>>> >> >> > ----------------------------------------------
>>> >> >> >
>>> >> >> > The next set of patches adds the base implementation of the ThinLTO
>>> >> >> > infrastructure, specifically those required to make ThinLTO functional
>>> >> >> > and generate correct but not necessarily high-performing binaries. It
>>> >> >> > also does not include the support needed to make debug information
>>> >> >> > under -g efficient with ThinLTO.
>>> >> >> >
>>> >> >> >
>>> >> >> > a. Clang/LLVM/gold linker options:
>>> >> >> >
>>> >> >> > An early set of clang/llvm patches is needed to provide options to
>>> >> >> > enable ThinLTO (off by default), so that the rest of the
>>> >> >> > implementation can be disabled by default as it is added.
>>> >> >> > Specifically, the clang option -fthinlto (used instead of -flto) will
>>> >> >> > cause clang to invoke the phase-1 emission of LLVM bitcode and
>>> >> >> > function summary/index on a compile step, and pass the appropriate
>>> >> >> > option to the gold plugin on a link step. The -thinlto option will be
>>> >> >> > added to the gold plugin and llvm-lto tool to launch the phase-2 thin
>>> >> >> > archive step. The -thinlto option will also be added to the ‘opt’ tool
>>> >> >> > to invoke it as a phase-3 parallel backend instance.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. Thin-archive linking support in Gold plugin and llvm-lto:
>>> >> >> >
>>> >> >> > Under the new plugin option (see above), the plugin needs to perform
>>> >> >> > the phase-2 (thin archive) link which simply emits a combined function
>>> >> >> > map from the linked modules, without actually performing the normal
>>> >> >> > link. Corresponding support should be added to the standalone llvm-lto
>>> >> >> > tool to enable testing/debugging without involving the linker and
>>> >> >> > plugin.
>>> >> >> >
>>> >> >> >
>>> >> >> > c. ThinLTO backend support:
>>> >> >> >
>>> >> >> > Support for invoking a phase-3 backend invocation (including
>>> >> >> > importing) on a module should be added to the ‘opt’ tool under the new
>>> >> >> > option. The main changes under the option are to instantiate a Linker
>>> >> >> > object used to manage the process of linking imported functions into
>>> >> >> > the module, to read the combined function map efficiently, and to
>>> >> >> > enable the ThinLTO import pass.
>>> >> >> >
>>> >> >> >
>>> >> >> > d. Function index/summary support:
>>> >> >> >
>>> >> >> > This includes infrastructure for writing and reading the function
>>> >> >> > index/summary section. As noted earlier this will be encoded in a
>>> >> >> > special ELF section within the module, alongside the .llvmbc section
>>> >> >> > containing the bitcode. The thin archive generated by phase-2 of
>>> >> >> > ThinLTO simply contains all of the function index/summary sections
>>> >> >> > across the linked modules, organized for efficient function lookup.
>>> >> >> >
>>> >> >> > Each function available for importing from the module has an entry in
>>> >> >> > the module’s function index/summary section and in the resulting
>>> >> >> > combined function map. Each function entry contains that function’s
>>> >> >> > offset within the bitcode file, used to efficiently locate and quickly
>>> >> >> > import just that function. The entry also contains summary information
>>> >> >> > (e.g. basic information determined during parsing such as the number
>>> >> >> > of instructions in the function), that will be used to help guide
>>> >> >> > later import decisions. Because the contents of this section will
>>> >> >> > change frequently during ThinLTO tuning, it should also be marked with
>>> >> >> > a version id for backwards compatibility or version checking.
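
The function index described in 2d can be pictured with a minimal Python sketch. It is only an illustration of the fields the text names (bitcode offset, a summary field such as instruction count, and a version id); the class names, the `INDEX_VERSION` constant, and the first-definition-wins policy in `combine` are assumptions made for the example, not part of the proposal.

```python
from dataclasses import dataclass, field

# Hypothetical version id; the RFC only says the section should carry one.
INDEX_VERSION = 1


@dataclass(frozen=True)
class FunctionEntry:
    """One importable function, with the fields named in the RFC text."""
    module: str             # module that holds the function body
    bitcode_offset: int     # offset of the function within the bitcode file
    instruction_count: int  # example of summary information


@dataclass
class ModuleIndex:
    """Per-module function index/summary section (illustrative layout)."""
    module: str
    version: int = INDEX_VERSION
    entries: dict = field(default_factory=dict)  # function name -> FunctionEntry

    def add(self, name, bitcode_offset, instruction_count):
        self.entries[name] = FunctionEntry(
            self.module, bitcode_offset, instruction_count
        )


def combine(indexes):
    """Build a combined function map from per-module indexes."""
    combined = {}
    for index in indexes:
        # Version checking: refuse sections written with another layout.
        if index.version != INDEX_VERSION:
            raise ValueError(
                f"{index.module}: unsupported index version {index.version}"
            )
        for name, entry in index.entries.items():
            # Assumption: keep the first definition seen for a given name.
            combined.setdefault(name, entry)
    return combined


a = ModuleIndex("a.o")
a.add("foo", bitcode_offset=128, instruction_count=12)
b = ModuleIndex("b.o")
b.add("bar", bitcode_offset=64, instruction_count=300)
combined_map = combine([a, b])
```

A backend that wants to import `foo` would look it up in `combined_map` and seek to `bitcode_offset` in `a.o` rather than parsing the whole module.
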
>>> >> >> >
>>> >> >> >
>>> >> >> > e. ThinLTO importing support:
>>> >> >> >
>>> >> >> > Support for the mechanics of importing functions from other modules,
>>> >> >> > which can go in gradually as a set of patches since it will be off by
>>> >> >> > default. Separate patches can include:
>>> >> >> >
>>> >> >> > - BitcodeReader changes to use the function index to
>>> >> >> > import/deserialize a single function of interest (small changes,
>>> >> >> > leverages existing lazy streamer support).
>>> >> >> >
>>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to import and
>>> >> >> > its index into the bitcode reader.
>>> >> >> >
>>> >> >> > - Marking of imported functions (for use in ThinLTO-specific symbol
>>> >> >> > linking and global DCE, for example). This can be in-memory initially,
>>> >> >> > but IR support may be required in order to support streaming bitcode
>>> >> >> > out and back in again after importing.
>>> >> >> >
>>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol linking and
>>> >> >> > static promotion when necessary. The linkage type of imported
>>> >> >> > functions changes to AvailableExternallyLinkage, for example. Statics
>>> >> >> > must be promoted in certain cases, and renamed in consistent ways.
>>> >> >> >
>>> >> >> > - GlobalDCE changes to support removing imported functions that were
>>> >> >> > not inlined (very small changes to existing pass logic).
>>> >> >> >
>>> >> >> >
>>> >> >> > f. ThinLTO Import Driver SCC pass:
>>> >> >> >
>>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with a framework for doing ThinLTO via
>>> >> >> > an SCC pass, enabled only under -fthinlto options. The pass includes
>>> >> >> > utilizing the thin archive (global function index/summary), import
>>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker routines
>>> >> >> > that perform the import, and any necessary callgraph updates and
>>> >> >> > verification.
>>> >> >> >
>>> >> >> >
>>> >> >> > g. Backend Driver:
>>> >> >> >
>>> >> >> > For a single node build, the gold plugin can simply write a makefile
>>> >> >> > and fork the parallel backend instances directly via parallel make.
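
As a rough illustration of the backend driver in 2g, the Python sketch below generates makefile text with one independent rule per module, so that a parallel make can run the backends concurrently. The command template `thinlto-backend`, the `.thinlto.o` output suffix, and the function name are placeholders invented for this example; the RFC does not specify the actual backend command line.

```python
def backend_makefile(modules, backend_cmd):
    """Return makefile text with one independent rule per module."""
    # Hypothetical naming scheme for the backend outputs.
    objects = [f"{m}.thinlto.o" for m in modules]
    lines = ["all: " + " ".join(objects), ""]
    for module, obj in zip(modules, objects):
        lines.append(f"{obj}: {module}")
        # Recipe lines in a makefile must start with a tab character.
        lines.append("\t" + backend_cmd.format(input=module, output=obj))
        lines.append("")
    return "\n".join(lines)


# Placeholder command template, not a real tool or flag set.
BACKEND_CMD = "thinlto-backend {input} -o {output}"
makefile_text = backend_makefile(["a.o", "b.o"], BACKEND_CMD)
```

Because no rule depends on another rule's output, writing this text to a file and running `make -j <jobs> -f <file>` lets make schedule the backend jobs in parallel.
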
>>> >> >> >
>>> >> >> >
>>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>>> >> >> > ----------------------------------------------------------------
>>> >> >> >
>>> >> >> > This refers to the patches that are not required for ThinLTO to work,
>>> >> >> > but rather to improve compile time, memory, run-time performance and
>>> >> >> > usability.
>>> >> >> >
>>> >> >> >
>>> >> >> > a. Lazy Debug Metadata Linking:
>>> >> >> >
>>> >> >> > The prototype implementation included lazy importing of module-level
>>> >> >> > metadata during the ThinLTO pass finalization (i.e. after all function
>>> >> >> > importing is complete). This actually applies to all module-level
>>> >> >> > metadata, not just debug, although it is the largest. This can be
>>> >> >> > added as a separate set of patches. It requires changes to
>>> >> >> > BitcodeReader, ValueMapper, and ModuleLinker.
>>> >> >> >
>>> >> >> >
>>> >> >> > b. Import Tuning:
>>> >> >> >
>>> >> >> > Tuning the import strategy will be an iterative process that will
>>> >> >> > continue to be refined over time. It involves several different types
>>> >> >> > of changes: adding support for recording additional metrics in the
>>> >> >> > function summary, such as profile data and optional heavier-weight IPA
>>> >> >> > analyses, and tuning the import heuristics based on the summary and
>>> >> >> > callsite context.
>>> >> >> >
>>> >> >> >
>>> >> >> > c. Combined Function Map Pruning:
>>> >> >> >
>>> >> >> > The combined function map can be pruned of functions that are unlikely
>>> >> >> > to benefit from being imported. For example, during the phase-2 thin
>>> >> >> > archive plugin step we can safely omit large and (with profile data)
>>> >> >> > cold functions, which are unlikely to benefit from being inlined.
>>> >> >> > Additionally, all but one copy of comdat functions can be suppressed.
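
The pruning rules in 3c can be sketched in Python as follows. This reads "large" and "cold" as two independent reasons to drop an entry, which is one interpretation of the text; the thresholds, field names, and `Candidate` type are assumptions chosen for the example, and real cutoffs would come out of the tuning work described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, for illustration only.
MAX_INSTRUCTIONS = 100
MIN_CALL_COUNT = 10


@dataclass(frozen=True)
class Candidate:
    name: str
    module: str
    instruction_count: int
    call_count: Optional[int] = None  # None when no profile data is available
    comdat: bool = False


def prune(candidates):
    """Drop entries that are unlikely to benefit from being imported."""
    kept = []
    seen_comdat = set()
    for c in candidates:
        # Large functions are unlikely to be inlined after import.
        if c.instruction_count > MAX_INSTRUCTIONS:
            continue
        # Cold functions can only be identified when profile data exists.
        if c.call_count is not None and c.call_count < MIN_CALL_COUNT:
            continue
        # Keep only the first copy of each comdat function.
        if c.comdat:
            if c.name in seen_comdat:
                continue
            seen_comdat.add(c.name)
        kept.append(c)
    return kept


candidates = [
    Candidate("small_hot", "a.o", 20, call_count=500),
    Candidate("huge", "a.o", 5000, call_count=500),
    Candidate("small_cold", "b.o", 20, call_count=1),
    Candidate("inline_helper", "a.o", 8, comdat=True),
    Candidate("inline_helper", "b.o", 8, comdat=True),
]
kept = prune(candidates)
```

Here `kept` retains `small_hot` and the `a.o` copy of `inline_helper`; the large function, the cold function, and the duplicate comdat copy are dropped from the map.
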
>>> >> >> >
>>> >> >> >
>>> >> >> > d. Distributed Build System Integration:
>>> >> >> >
>>> >> >> > For a distributed build system, the gold plugin should write the
>>> >> >> > parallel backend invocations into a makefile, including the mapping
>>> >> >> > from the IR file to the real object file path, and exit. Additional
>>> >> >> > work needs to be done in the distributed build system itself to
>>> >> >> > distribute and dispatch the parallel backend jobs to the build
>>> >> >> > cluster.
>>> >> >> >
>>> >> >> >
>>> >> >> > e. Dependence Tracking and Incremental Compiles:
>>> >> >> >
>>> >> >> > In order to support build systems that stage from local disks or
>>> >> >> > network storage, the plugin will optionally support computation of
>>> >> >> > dependent sets of IR files that each module may import from. This can
>>> >> >> > be computed from profile data, if it exists, or from the symbol table
>>> >> >> > and heuristics if not. These dependence sets also enable support for
>>> >> >> > incremental backend compiles.
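
One way to picture the symbol-table variant of 3e is the Python sketch below. It assumes each module's defined and referenced function symbols are already available as plain sets, considers only direct references (no transitive imports, no heuristics, no profile data), and uses hypothetical function names; it is not the plugin's actual algorithm.

```python
def dependence_sets(defined, referenced):
    """Map each module to the set of other modules it may import from.

    defined:    module -> set of function symbols it defines
    referenced: module -> set of function symbols it references
    """
    # Symbol -> defining module; the first definer wins (an assumption).
    definer = {}
    for module, symbols in defined.items():
        for symbol in symbols:
            definer.setdefault(symbol, module)

    deps = {}
    for module in defined:
        deps[module] = {
            definer[s]
            for s in referenced.get(module, set())
            if s in definer and definer[s] != module
        }
    return deps


def needs_rebuild(module, changed, deps):
    """A backend compile is stale if the module or any dependence changed."""
    return module in changed or bool(deps[module] & changed)


defined = {"a.o": {"main"}, "b.o": {"helper"}, "c.o": {"unused"}}
referenced = {"a.o": {"helper", "printf"}, "b.o": set(), "c.o": {"helper"}}
deps = dependence_sets(defined, referenced)
```

With these inputs `a.o` and `c.o` each depend on `b.o`, so a change to `b.o` marks both for recompilation, while a change to `a.o` leaves `b.o` and `c.o` untouched. A build system could also use the same sets to decide which IR files to stage alongside each backend job.
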
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >> >> > 408-460-2413
>>> >> >> >
>>> >> >> > _______________________________________________
>>> >> >> > LLVM Developers mailing list
>>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> >> 408-460-2413
>>> >>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>>
>>
>>