[LLVMdev] RFC: ThinLTO Implementation Plan

Xinliang David Li xinliangli at gmail.com
Thu May 14 14:28:41 PDT 2015


On Thu, May 14, 2015 at 2:09 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Thu, May 14, 2015 at 1:35 PM Teresa Johnson <tejohnson at google.com>
> wrote:
>
>> On Thu, May 14, 2015 at 1:18 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>> >
>> >
>> > On Thu, May 14, 2015 at 1:11 PM David Blaikie <dblaikie at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, May 14, 2015 at 12:53 PM, Eric Christopher
>> >> <echristo at gmail.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Thu, May 14, 2015 at 11:34 AM Daniel Berlin <dberlin at dberlin.org>
>> >>> wrote:
>> >>>>
>> >>>> On Thu, May 14, 2015 at 11:14 AM, Eric Christopher
>> >>>> <echristo at gmail.com> wrote:
>> >>>> > I'm not sure this is a particularly great assumption to make.
>> >>>>
>> >>>> Which part?
>> >>>
>> >>>
>> >>> The binutils part :)
>> >>>
>> >>>>
>> >>>>
>> >>>> > We have to support a lot of different build systems and tools,
>> >>>> > and concentrating on something that just binutils uses isn't
>> >>>> > particularly friendly here.
>> >>>>
>> >>>> I think you may have misunderstood.
>> >>>> His point was exactly that they want to be transparent to *all of*
>> >>>> these tools.
>> >>>> You are saying "we should be friendly to everyone". He is saying the
>> >>>> same thing.
>> >>>> We should be friendly to everyone. The friendly way to do this is to
>> >>>> not require all of these tools to have plugins built to handle
>> >>>> bitcode.
>> >>>>
>> >>>> Hence, elf-wrapped bitcode.
>> >>>
>> >>>
>> >>> Oh, I understood. I just don't know that I agree. To do anything with
>> >>> the tools will require some knowledge of bitcode anyhow or need the
>> >>> plugin. I'm saying that as a baseline start we should look at how to
>> >>> do this using the tools we've got rather than wrapping things for no
>> >>> real gain.
>> >>
>> >>
>> >> That doesn't seem strictly true - consider the ar situation (which I'm
>> >> led to believe is in use in our build system & others, one would
>> >> assume). With the symbol table included as proposed, ar can be used
>> >> without any knowledge of the bitcode or need for a plugin.
>> >>
>> >
>> > For some bits, sure. Optimizing for ar seems a bit silly, why not
>> > 'ld -r'?
>>
>> But as mentioned, ld -r can work on native object wrapped bitcode
>> without a plugin as well.
>>
>>
> How? It's not like any partial linking is going to go on inside the
> bitcode if the linker doesn't understand bitcode.
>

Why would we want the plugin to do anything here?  We just need the linker to
concatenate the bitcode sections and produce a combined bitcode file.
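
To make the concatenation point concrete, here is a minimal Python sketch. It
is not how ld or LLVM actually lays out the section: the 8-byte length prefix
written by `frame` is a hypothetical convention, assumed only so that a
consumer can split the combined section back into per-module payloads. Real
linkers may also insert alignment padding between input sections, which this
sketch ignores.

```python
import struct
from typing import List


def frame(bitcode: bytes) -> bytes:
    """Hypothetical compiler-side framing: 8-byte little-endian length, then payload."""
    return struct.pack("<Q", len(bitcode)) + bitcode


def link_relocatable(sections: List[bytes]) -> bytes:
    """Model of a bitcode-unaware 'ld -r': same-named sections are just concatenated."""
    return b"".join(sections)


def split_modules(combined: bytes) -> List[bytes]:
    """Recover each original module's payload from the concatenated section."""
    modules = []
    pos = 0
    while pos < len(combined):
        (size,) = struct.unpack_from("<Q", combined, pos)
        pos += 8
        modules.append(combined[pos:pos + size])
        pos += size
    return modules
```

The linker step never looks inside the payloads; only the final consumer
(the plugin or backend driver at the full link) needs to understand them.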


>
>
>> > Agreed. The ar situation is interesting because one thing we discussed
>> > after you wandered off was just adding a ToC section to bitcode as it
>> > is and then having the tools handle that. Would seem to accomplish at
>> > least the goals as I've seen them up to this point without worrying
>> > too much.
>>
>> The ToC section is a way we can encode the function index/summary into
>> bitcode, but won't help integrate with existing tools. The main issue
>> we are trying to solve is integrating transparently with existing
>> binutils tools in use in our build system and probably elsewhere.
>>
>>
> Right. I'm not entirely sure what use we're going to see in the existing
> tools that we want to encompass here. There's some of it for convenience
> (i.e. nm etc for developers), but they can use a tool that understands
> bitcode and we can make the existing llvm tools suffice for these needs.
>
> I think the way of looking at this is that we can:
>
> a) go with wrapping things in native object formats, this means
>  - some tools continue to work at the cost of additional I/O and space at
> compile/link time
>

Are you sure about the additional I/O? With a native symtab, existing tools
just need to read that, while a plugin-based approach needs to read the
bitcode section to feed symbols back to the tool.


>  - we still have to update some tools to work at all
>

If any, it will be minimal.


>
> b) we extend those tools/our own tools and have them be drop in
> replacements to the existing tools. They'll understand the bitcode format
> natively, they'll be smaller, and we'll be able to push the state of the
> art in tooling/analysis a bit more in the future without having to rework
> thin lto.
>
> It's basically a set of trade-offs and for llvm we've historically gone
> the b direction.
>
>
I am fine with making llvm tools work with it, but we should not require or
force users to use them. I think this is an orthogonal feature.

David




> >
>> > At any rate, I think this aspect of the proposal needs a bit of
>> > discussion and some mapping out of the pros and cons here.
>>
>> Sure, we can continue to discuss and I will try to lay out the pros/cons.
>>
>
> Excellent.
>
> -eric
>
>
>>
>> Teresa
>>
>> >
>> > -eric
>> >
>> >>>
>> >>> I've talked to Teresa a bit offline and we're going to talk more later
>> >>> (and discuss on the list), but there are some discussions about how
>> >>> to make this work with just bitcode/llvm tools, and so without
>> >>> requiring integration on all platforms. The latter is what I consider
>> >>> particularly friendly :)
>> >>>
>> >>> -eric
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> > I also
>> >>>> > can't imagine how it's necessary for any of the lto aspects as
>> >>>> > currently
>> >>>> > written in the proposal.
>> >>>> >
>> >>>> > -eric
>> >>>> >
>> >>>> > On Thu, May 14, 2015 at 9:26 AM Xinliang David Li
>> >>>> > <xinliangli at gmail.com>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> The design objective is to make ThinLTO mostly transparent to
>> >>>> >> binutils tools to enable easy integration with any build system in
>> >>>> >> the wild. 'Pass-through' mode with 'ld -r' instead of the partial
>> >>>> >> LTO mode is another reason.
>> >>>> >>
>> >>>> >> David
>> >>>> >>
>> >>>> >> On Thu, May 14, 2015 at 7:30 AM, Teresa Johnson
>> >>>> >> <tejohnson at google.com>
>> >>>> >> wrote:
>> >>>> >>>
>> >>>> >>> On Thu, May 14, 2015 at 7:22 AM, Eric Christopher
>> >>>> >>> <echristo at gmail.com>
>> >>>> >>> wrote:
>> >>>> >>> > So, what Alex is saying is that we have these tools as well and
>> >>>> >>> > they
>> >>>> >>> > understand bitcode just fine, as well as every object format -
>> not
>> >>>> >>> > just
>> >>>> >>> > ELF.
>> >>>> >>> > :)
>> >>>> >>>
>> >>>> >>> Right, there are also LLVM specific versions (llvm-ar, llvm-nm)
>> >>>> >>> that handle bitcode similarly to the way the standard tool + plugin
>> >>>> >>> does. But the goal we are trying to achieve is to allow the
>> >>>> >>> standard system versions of the tools to handle these files
>> >>>> >>> without requiring a plugin. I know the LLVM tool handles other
>> >>>> >>> object formats, but I'm not sure how that helps here? We're not
>> >>>> >>> planning to replace those tools, just allow the standard system
>> >>>> >>> versions to handle the intermediate objects produced by ThinLTO.
>> >>>> >>>
>> >>>> >>> Thanks,
>> >>>> >>> Teresa
>> >>>> >>>
>> >>>> >>> >
>> >>>> >>> > -eric
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > On Thu, May 14, 2015, 6:55 AM Teresa Johnson
>> >>>> >>> > <tejohnson at google.com>
>> >>>> >>> > wrote:
>> >>>> >>> >>
>> >>>> >>> >> On Wed, May 13, 2015 at 11:23 PM, Xinliang David Li
>> >>>> >>> >> <xinliangli at gmail.com> wrote:
>> >>>> >>> >> >
>> >>>> >>> >> >
>> >>>> >>> >> > On Wed, May 13, 2015 at 10:46 PM, Alex Rosenberg
>> >>>> >>> >> > <alexr at leftfield.org>
>> >>>> >>> >> > wrote:
>> >>>> >>> >> >>
>> >>>> >>> >> >> "ELF-wrapped bitcode" seems potentially controversial to
>> me.
>> >>>> >>> >> >>
>> >>>> >>> >> >> What about ar, nm, and various ld implementations adds this
>> >>>> >>> >> >> requirement?
>> >>>> >>> >> >> What about the LLVM implementations of these tools is
>> lacking?
>> >>>> >>> >> >
>> >>>> >>> >> >
>> >>>> >>> >> > Sorry I can not parse your questions properly. Can you make
>> it
>> >>>> >>> >> > clearer?
>> >>>> >>> >>
>> >>>> >>> >> Alex is asking what the issue is with ar, nm, ld -r and
>> regular
>> >>>> >>> >> bitcode that makes using elf-wrapped bitcode easier.
>> >>>> >>> >>
>> >>>> >>> >> The issue is that generally you need to provide a plugin to
>> >>>> >>> >> these tools in order for them to understand and handle bitcode
>> >>>> >>> >> files. We'd like standard tools to work without requiring a
>> >>>> >>> >> plugin as much as possible. And in some cases we want them to
>> >>>> >>> >> be handled differently than the way bitcode files are handled
>> >>>> >>> >> with the plugin.
>> >>>> >>> >>
>> >>>> >>> >> nm: Without a plugin, normal bitcode files are inscrutable.
>> >>>> >>> >> When provided the gold plugin it can emit the symbols.
>> >>>> >>> >>
>> >>>> >>> >> ar: Without a plugin, it will create an archive of bitcode
>> >>>> >>> >> files, but without an index, so it can't be handled by the
>> >>>> >>> >> linker even with a plugin on an -flto link. When ar is provided
>> >>>> >>> >> the gold plugin it does create an index, so the linker + gold
>> >>>> >>> >> plugin handle it appropriately on an -flto link.
>> >>>> >>> >>
>> >>>> >>> >> ld -r: Without a plugin, fails when provided bitcode inputs.
>> >>>> >>> >> When provided the gold plugin, it handles them but compiles them
>> >>>> >>> >> all the way through to ELF executable instructions via a partial
>> >>>> >>> >> LTO link. This is where we would like to differ in behavior
>> >>>> >>> >> (while also not requiring a plugin) with ELF-wrapped bitcode: we
>> >>>> >>> >> would like the ld -r output file to still contain ELF-wrapped
>> >>>> >>> >> bitcode, delaying the LTO until the full link step.
>> >>>> >>> >>
>> >>>> >>> >> Let me know if that helps address your concerns.
>> >>>> >>> >>
>> >>>> >>> >> Thanks,
>> >>>> >>> >> Teresa
>> >>>> >>> >>
>> >>>> >>> >> >
>> >>>> >>> >> > David
>> >>>> >>> >> >
>> >>>> >>> >> >>
>> >>>> >>> >> >>
>> >>>> >>> >> >> Alex
>> >>>> >>> >> >>
>> >>>> >>> >> >> > On May 13, 2015, at 7:44 PM, Teresa Johnson
>> >>>> >>> >> >> > <tejohnson at google.com>
>> >>>> >>> >> >> > wrote:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > I've included below an RFC for implementing ThinLTO in
>> LLVM,
>> >>>> >>> >> >> > looking
>> >>>> >>> >> >> > forward to feedback and questions.
>> >>>> >>> >> >> > Thanks!
>> >>>> >>> >> >> > Teresa
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > RFC to discuss plans for implementing ThinLTO upstream.
>> >>>> >>> >> >> > Background can be found in slides from EuroLLVM 2015:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > https://drive.google.com/open?id=0B036uwnWM6RWWER1ZEl5SUNENjQ&authuser=0
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > As described in the talk, we have a prototype
>> >>>> >>> >> >> > implementation, and
>> >>>> >>> >> >> > would like to start staging patches upstream. This RFC
>> >>>> >>> >> >> > describes
>> >>>> >>> >> >> > a
>> >>>> >>> >> >> > breakdown of the major pieces. We would like to commit
>> >>>> >>> >> >> > upstream
>> >>>> >>> >> >> > gradually in several stages, with all functionality off
>> by
>> >>>> >>> >> >> > default.
>> >>>> >>> >> >> > The core ThinLTO importing support and tuning will
>> require
>> >>>> >>> >> >> > frequent
>> >>>> >>> >> >> > change and iteration during testing and tuning, and for
>> that
>> >>>> >>> >> >> > part
>> >>>> >>> >> >> > we
>> >>>> >>> >> >> > would like to commit rapidly (off by default). See the
>> >>>> >>> >> >> > proposed
>> >>>> >>> >> >> > staged
>> >>>> >>> >> >> > implementation described in the Implementation Plan
>> section.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > ThinLTO Overview
>> >>>> >>> >> >> > ==============
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > See the talk slides linked above for more details. The
>> >>>> >>> >> >> > following
>> >>>> >>> >> >> > is a
>> >>>> >>> >> >> > high-level overview of the motivation.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Cross Module Optimization (CMO) is an effective means for
>> >>>> >>> >> >> > improving runtime performance, by extending the scope of
>> >>>> >>> >> >> > optimizations across source module boundaries. Without CMO,
>> >>>> >>> >> >> > the compiler is limited to optimizing within the scope of
>> >>>> >>> >> >> > single source modules. Two solutions for enabling CMO are
>> >>>> >>> >> >> > Link-Time Optimization (LTO), which is currently supported in
>> >>>> >>> >> >> > LLVM and GCC, and Lightweight-Interprocedural Optimization
>> >>>> >>> >> >> > (LIPO). However, each of these solutions has limitations that
>> >>>> >>> >> >> > prevent it from being enabled by default. ThinLTO is a new
>> >>>> >>> >> >> > approach that attempts to address these limitations, with a
>> >>>> >>> >> >> > goal of being enabled more broadly. ThinLTO is designed with
>> >>>> >>> >> >> > many of the same principles as LIPO, and therefore shares its
>> >>>> >>> >> >> > advantages, without any of its inherent weaknesses. Unlike in
>> >>>> >>> >> >> > LIPO, where the module group decision is made at profile
>> >>>> >>> >> >> > training runtime, ThinLTO makes the decision at compile time,
>> >>>> >>> >> >> > but in a lazy mode that facilitates large scale parallelism.
>> >>>> >>> >> >> > The serial linker plugin phase is designed to be razor thin
>> >>>> >>> >> >> > and blazingly fast. By default this step only does minimal
>> >>>> >>> >> >> > preparation work to enable the parallel lazy importing
>> >>>> >>> >> >> > performed later. ThinLTO aims to be scalable like a regular O2
>> >>>> >>> >> >> > build, enabling CMO on machines without large memory
>> >>>> >>> >> >> > configurations, while also integrating well with distributed
>> >>>> >>> >> >> > build systems. Results from early prototyping on SPEC cpu2006
>> >>>> >>> >> >> > C++ benchmarks are in line with expectations that ThinLTO can
>> >>>> >>> >> >> > scale like O2 while enabling much of the CMO performed during
>> >>>> >>> >> >> > a full LTO build.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > A ThinLTO build is divided into 3 phases, which are
>> referred
>> >>>> >>> >> >> > to
>> >>>> >>> >> >> > in
>> >>>> >>> >> >> > the
>> >>>> >>> >> >> > following implementation plan:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > phase-1: IR and Function Summary Generation (-c compile)
>> >>>> >>> >> >> > phase-2: Thin Linker Plugin Layer (thin archive linker
>> step)
>> >>>> >>> >> >> > phase-3: Parallel Backend with Demand-Driven Importing
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Implementation Plan
>> >>>> >>> >> >> > ================
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This section gives a high-level breakdown of the ThinLTO
>> >>>> >>> >> >> > support
>> >>>> >>> >> >> > that
>> >>>> >>> >> >> > will be added, in roughly the order that the patches
>> would
>> >>>> >>> >> >> > be
>> >>>> >>> >> >> > staged.
>> >>>> >>> >> >> > The patches are divided into three stages. The first
>> stage
>> >>>> >>> >> >> > contains a
>> >>>> >>> >> >> > minimal amount of preparation work that is not
>> >>>> >>> >> >> > ThinLTO-specific.
>> >>>> >>> >> >> > The
>> >>>> >>> >> >> > second stage contains most of the infrastructure for
>> >>>> >>> >> >> > ThinLTO,
>> >>>> >>> >> >> > which
>> >>>> >>> >> >> > will be off by default. The third stage includes
>> >>>> >>> >> >> > enhancements/improvements/tunings that can be performed
>> >>>> >>> >> >> > after the
>> >>>> >>> >> >> > main
>> >>>> >>> >> >> > ThinLTO infrastructure is in.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The second and third implementation stages will
>> initially be
>> >>>> >>> >> >> > very
>> >>>> >>> >> >> > volatile, requiring a lot of iterations and tuning with
>> >>>> >>> >> >> > large
>> >>>> >>> >> >> > apps to
>> >>>> >>> >> >> > get stabilized. Therefore it will be important to do fast
>> >>>> >>> >> >> > commits
>> >>>> >>> >> >> > for
>> >>>> >>> >> >> > these implementation stages.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 1. Stage 1: Preparation
>> >>>> >>> >> >> > -------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The first planned sets of patches are enablers for
>> ThinLTO
>> >>>> >>> >> >> > work:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. LTO directory structure:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Restructure the LTO directory to remove the circular
>> >>>> >>> >> >> > dependence that arises when the ThinLTO pass is added. Because
>> >>>> >>> >> >> > ThinLTO is being implemented as an SCC pass within
>> >>>> >>> >> >> > Transforms/IPO, and leverages the LTOModule class for linking
>> >>>> >>> >> >> > in functions from modules, IPO then requires the LTO library.
>> >>>> >>> >> >> > This creates a circular dependence between LTO and IPO. To
>> >>>> >>> >> >> > break that, we need to split the lib/LTO directory/library
>> >>>> >>> >> >> > into lib/LTO/CodeGen and lib/LTO/Module, containing
>> >>>> >>> >> >> > LTOCodeGenerator and LTOModule, respectively. Only
>> >>>> >>> >> >> > LTOCodeGenerator has a dependence on IPO, removing the
>> >>>> >>> >> >> > circular dependence.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. ELF wrapper generation support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Implement an ELF-wrapped bitcode writer. In order to more
>> >>>> >>> >> >> > easily interact with tools such as $AR, $NM, and “$LD -r” we
>> >>>> >>> >> >> > plan to emit the phase-1 bitcode wrapped in ELF via the
>> >>>> >>> >> >> > .llvmbc section, along with a symbol table. The goal is both
>> >>>> >>> >> >> > to interact with these tools without requiring a plugin, and
>> >>>> >>> >> >> > also to avoid doing partial LTO/ThinLTO across files linked
>> >>>> >>> >> >> > with “$LD -r” (i.e. the resulting object file should still
>> >>>> >>> >> >> > contain ELF-wrapped bitcode to enable ThinLTO at the full link
>> >>>> >>> >> >> > step). I will send a separate design document for these
>> >>>> >>> >> >> > changes, but the following is a high-level overview.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support was added to LLVM for reading ELF-wrapped bitcode
>> >>>> >>> >> >> > (http://reviews.llvm.org/rL218078), but there does not
>> yet
>> >>>> >>> >> >> > exist
>> >>>> >>> >> >> > support in LLVM/Clang for emitting bitcode wrapped in
>> ELF. I
>> >>>> >>> >> >> > plan
>> >>>> >>> >> >> > to
>> >>>> >>> >> >> > add support for optionally generating bitcode in an ELF
>> file
>> >>>> >>> >> >> > containing a single .llvmbc section holding the bitcode.
>> >>>> >>> >> >> > Specifically,
>> >>>> >>> >> >> > the patch would add new options “emit-llvm-bc-elf”
>> (object
>> >>>> >>> >> >> > file)
>> >>>> >>> >> >> > and
>> >>>> >>> >> >> > corresponding “emit-llvm-elf” (textual assembly code
>> >>>> >>> >> >> > equivalent).
>> >>>> >>> >> >> > Eventually these would be automatically triggered under
>> >>>> >>> >> >> > “-fthinlto
>> >>>> >>> >> >> > -c”
>> >>>> >>> >> >> > and “-fthinlto -S”, respectively.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Additionally, a symbol table will be generated in the ELF
>> >>>> >>> >> >> > file, holding the function symbols within the bitcode. This
>> >>>> >>> >> >> > facilitates handling archives of the ELF-wrapped bitcode
>> >>>> >>> >> >> > created with $AR, since the archive will have a symbol table
>> >>>> >>> >> >> > as well. The archive symbol table enables gold to extract and
>> >>>> >>> >> >> > pass to the plugin the constituent ELF-wrapped bitcode files.
>> >>>> >>> >> >> > To support the concatenated .llvmbc section generated by
>> >>>> >>> >> >> > “$LD -r”, some handling needs to be added to gold and to the
>> >>>> >>> >> >> > backend driver to process each original module’s bitcode.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The function index/summary will later be added as a
>> special
>> >>>> >>> >> >> > ELF
>> >>>> >>> >> >> > section alongside the .llvmbc sections.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 2. Stage 2: ThinLTO Infrastructure
>> >>>> >>> >> >> > ----------------------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The next set of patches adds the base implementation of the
>> >>>> >>> >> >> > ThinLTO infrastructure, specifically the pieces required to
>> >>>> >>> >> >> > make ThinLTO functional and generate correct but not
>> >>>> >>> >> >> > necessarily high-performing binaries. It also does not include
>> >>>> >>> >> >> > the work needed to make debug support under -g efficient with
>> >>>> >>> >> >> > ThinLTO.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. Clang/LLVM/gold linker options:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > An early set of clang/llvm patches is needed to provide
>> >>>> >>> >> >> > options to enable ThinLTO (off by default), so that the rest
>> >>>> >>> >> >> > of the implementation can be disabled by default as it is
>> >>>> >>> >> >> > added. Specifically, the clang option -fthinlto (used instead
>> >>>> >>> >> >> > of -flto) will cause clang to invoke the phase-1 emission of
>> >>>> >>> >> >> > LLVM bitcode and function summary/index on a compile step, and
>> >>>> >>> >> >> > pass the appropriate option to the gold plugin on a link step.
>> >>>> >>> >> >> > The -thinlto option will be added to the gold plugin and
>> >>>> >>> >> >> > llvm-lto tool to launch the phase-2 thin archive step. The
>> >>>> >>> >> >> > -thinlto option will also be added to the ‘opt’ tool to invoke
>> >>>> >>> >> >> > it as a phase-3 parallel backend instance.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. Thin-archive linking support in Gold plugin and
>> llvm-lto:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Under the new plugin option (see above), the plugin
>> needs to
>> >>>> >>> >> >> > perform
>> >>>> >>> >> >> > the phase-2 (thin archive) link which simply emits a
>> >>>> >>> >> >> > combined
>> >>>> >>> >> >> > function
>> >>>> >>> >> >> > map from the linked modules, without actually performing
>> the
>> >>>> >>> >> >> > normal
>> >>>> >>> >> >> > link. Corresponding support should be added to the
>> >>>> >>> >> >> > standalone
>> >>>> >>> >> >> > llvm-lto
>> >>>> >>> >> >> > tool to enable testing/debugging without involving the
>> >>>> >>> >> >> > linker and
>> >>>> >>> >> >> > plugin.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > c. ThinLTO backend support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support for a phase-3 backend invocation (including importing)
>> >>>> >>> >> >> > on a module should be added to the ‘opt’ tool under the new
>> >>>> >>> >> >> > option. The main changes under the option are to instantiate a
>> >>>> >>> >> >> > Linker object used to manage the process of linking imported
>> >>>> >>> >> >> > functions into the module, to read the combined function map
>> >>>> >>> >> >> > efficiently, and to enable the ThinLTO import pass.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > d. Function index/summary support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This includes infrastructure for writing and reading the
>> >>>> >>> >> >> > function
>> >>>> >>> >> >> > index/summary section. As noted earlier this will be
>> encoded
>> >>>> >>> >> >> > in a
>> >>>> >>> >> >> > special ELF section within the module, alongside the
>> .llvmbc
>> >>>> >>> >> >> > section
>> >>>> >>> >> >> > containing the bitcode. The thin archive generated by
>> >>>> >>> >> >> > phase-2 of
>> >>>> >>> >> >> > ThinLTO simply contains all of the function index/summary
>> >>>> >>> >> >> > sections
>> >>>> >>> >> >> > across the linked modules, organized for efficient
>> function
>> >>>> >>> >> >> > lookup.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Each function available for importing from the module
>> >>>> >>> >> >> > contains an
>> >>>> >>> >> >> > entry in the module’s function index/summary section and
>> in
>> >>>> >>> >> >> > the
>> >>>> >>> >> >> > resulting combined function map. Each function entry
>> >>>> >>> >> >> > contains
>> >>>> >>> >> >> > that
>> >>>> >>> >> >> > function’s offset within the bitcode file, used to
>> >>>> >>> >> >> > efficiently
>> >>>> >>> >> >> > locate
>> >>>> >>> >> >> > and quickly import just that function. The entry also
>> >>>> >>> >> >> > contains
>> >>>> >>> >> >> > summary
>> >>>> >>> >> >> > information (e.g. basic information determined during
>> >>>> >>> >> >> > parsing
>> >>>> >>> >> >> > such as
>> >>>> >>> >> >> > the number of instructions in the function), that will be
>> >>>> >>> >> >> > used to
>> >>>> >>> >> >> > help
>> >>>> >>> >> >> > guide later import decisions. Because the contents of
>> this
>> >>>> >>> >> >> > section
>> >>>> >>> >> >> > will change frequently during ThinLTO tuning, it should
>> also
>> >>>> >>> >> >> > be
>> >>>> >>> >> >> > marked
>> >>>> >>> >> >> > with a version id for backwards compatibility or version
>> >>>> >>> >> >> > checking.
>> >>>> >>> >> >> >
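
As an illustration of the function index/summary described above, here is a
minimal Python sketch. The names (`FunctionEntry`, `ModuleIndex`,
`build_combined_map`, `INDEX_VERSION`) and the first-definition-wins policy
are assumptions made for the example; the RFC does not specify the actual
encoding or data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

INDEX_VERSION = 1  # hypothetical version id used for compatibility checking


@dataclass(frozen=True)
class FunctionEntry:
    name: str
    bitcode_offset: int      # where the function body starts in the bitcode
    instruction_count: int   # example of summary data gathered during parsing


@dataclass
class ModuleIndex:
    module_path: str
    version: int
    entries: List[FunctionEntry]


def build_combined_map(
    indexes: List[ModuleIndex],
) -> Dict[str, Tuple[str, FunctionEntry]]:
    """Merge per-module indexes into one name -> (module, entry) lookup table."""
    combined: Dict[str, Tuple[str, FunctionEntry]] = {}
    for index in indexes:
        if index.version != INDEX_VERSION:
            raise ValueError(
                f"{index.module_path}: unsupported index version {index.version}"
            )
        for entry in index.entries:
            # Keep the first definition seen (an assumed policy for this sketch).
            combined.setdefault(entry.name, (index.module_path, entry))
    return combined
```

A backend that wants to import `foo` would look it up in the combined map,
open the named module, and seek to `bitcode_offset` rather than parsing the
whole file.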
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > e. ThinLTO importing support:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Support for the mechanics of importing functions from
>> other
>> >>>> >>> >> >> > modules,
>> >>>> >>> >> >> > which can go in gradually as a set of patches since it
>> will
>> >>>> >>> >> >> > be
>> >>>> >>> >> >> > off by
>> >>>> >>> >> >> > default. Separate patches can include:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - BitcodeReader changes to use the function index to
>> >>>> >>> >> >> > import/deserialize a single function of interest (small
>> >>>> >>> >> >> > changes, leverages existing lazy streamer support).
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - Minor LTOModule changes to pass the ThinLTO function to
>> >>>> >>> >> >> > import and its index into the bitcode reader.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - Marking of imported functions (for use in
>> ThinLTO-specific
>> >>>> >>> >> >> > symbol
>> >>>> >>> >> >> > linking and global DCE, for example). This can be
>> in-memory
>> >>>> >>> >> >> > initially,
>> >>>> >>> >> >> > but IR support may be required in order to support
>> streaming
>> >>>> >>> >> >> > bitcode
>> >>>> >>> >> >> > out and back in again after importing.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - ModuleLinker changes to do ThinLTO-specific symbol
>> linking
>> >>>> >>> >> >> > and
>> >>>> >>> >> >> > static promotion when necessary. The linkage type of
>> >>>> >>> >> >> > imported
>> >>>> >>> >> >> > functions changes to AvailableExternallyLinkage, for
>> >>>> >>> >> >> > example.
>> >>>> >>> >> >> > Statics
>> >>>> >>> >> >> > must be promoted in certain cases, and renamed in
>> consistent
>> >>>> >>> >> >> > ways.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > - GlobalDCE changes to support removing imported
>> functions
>> >>>> >>> >> >> > that
>> >>>> >>> >> >> > were
>> >>>> >>> >> >> > not inlined (very small changes to existing pass logic).
>> >>>> >>> >> >> >
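
The static promotion and renaming step above can be sketched as follows. This
is a minimal Python illustration under stated assumptions: the
`.promoted.<hash>` suffix, the use of the module path as the hash input, and
the string linkage names are all hypothetical, chosen only to show why the
renaming has to be deterministic.

```python
import hashlib
from typing import Tuple


def promoted_name(name: str, module_path: str) -> str:
    """Derive a global name for a static from its defining module (hypothetical scheme)."""
    digest = hashlib.sha1(module_path.encode("utf-8")).hexdigest()[:8]
    return f"{name}.promoted.{digest}"


def prepare_for_import(name: str, linkage: str, module_path: str) -> Tuple[str, str]:
    """Return the (name, linkage) an imported function has in the importing module."""
    if linkage == "internal":
        # A static must become visible across modules, so it gets a new name.
        name = promoted_name(name, module_path)
    # Imported copies exist only for optimization (e.g. inlining).
    return name, "available_externally"
```

Because the new name depends only on the original name and the defining
module, the exporting module and every importing module compute the same
name independently, which is what lets the parallel backends agree without
communicating.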
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > f. ThinLTO Import Driver SCC pass:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Adds Transforms/IPO/ThinLTO.cpp with framework for doing
>> >>>> >>> >> >> > ThinLTO
>> >>>> >>> >> >> > via
>> >>>> >>> >> >> > an SCC pass, enabled only under -fthinlto options. The
>> pass
>> >>>> >>> >> >> > includes
>> >>>> >>> >> >> > utilizing the thin archive (global function
>> index/summary),
>> >>>> >>> >> >> > import
>> >>>> >>> >> >> > decision heuristics, invocation of LTOModule/ModuleLinker
>> >>>> >>> >> >> > routines
>> >>>> >>> >> >> > that perform the import, and any necessary callgraph
>> updates
>> >>>> >>> >> >> > and
>> >>>> >>> >> >> > verification.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > g. Backend Driver:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > For a single node build, the gold plugin can simply
>> write a
>> >>>> >>> >> >> > makefile
>> >>>> >>> >> >> > and fork the parallel backend instances directly via
>> >>>> >>> >> >> > parallel
>> >>>> >>> >> >> > make.
>> >>>> >>> >> >> >
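
A minimal Python sketch of the makefile the plugin might write for a single
node build. The command name `thinlto-backend` and its argument order are
placeholders, not a real tool or the RFC's actual invocation; the point is
only that each module gets an independent rule so that make can run the
backends in parallel.

```python
from typing import List, Tuple


def backend_makefile(
    modules: List[Tuple[str, str]],
    combined_map: str,
    backend_cmd: str = "thinlto-backend",  # placeholder command name
) -> str:
    """Return makefile text with one independent rule per (ir_file, object_file)."""
    objects = " ".join(obj for _, obj in modules)
    lines = [f"all: {objects}", ""]
    for ir_file, obj in modules:
        # Each object depends on its own IR and on the combined function map.
        lines.append(f"{obj}: {ir_file} {combined_map}")
        # Recipe lines in a makefile must start with a tab.
        lines.append(f"\t{backend_cmd} {ir_file} {combined_map} {obj}")
        lines.append("")
    return "\n".join(lines)
```

Writing the returned text to a file and running `make -j8 -f <file>` would
then dispatch up to eight backend jobs at once.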
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > 3. Stage 3: ThinLTO Tuning and Enhancements
>> >>>> >>> >> >> > ----------------------------------------------------------------
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > This refers to the patches that are not required for
>> ThinLTO
>> >>>> >>> >> >> > to
>> >>>> >>> >> >> > work,
>> >>>> >>> >> >> > but rather to improve compile time, memory, run-time
>> >>>> >>> >> >> > performance
>> >>>> >>> >> >> > and
>> >>>> >>> >> >> > usability.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > a. Lazy Debug Metadata Linking:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The prototype implementation included lazy importing of
>> >>>> >>> >> >> > module-level metadata during the ThinLTO pass finalization
>> >>>> >>> >> >> > (i.e. after all function importing is complete). This actually
>> >>>> >>> >> >> > applies to all module-level metadata, not just debug, although
>> >>>> >>> >> >> > it is the largest. This can be added as a separate set of
>> >>>> >>> >> >> > patches. It requires changes to BitcodeReader, ValueMapper,
>> >>>> >>> >> >> > and ModuleLinker.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > b. Import Tuning:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > Tuning the import strategy will be an iterative process
>> that
>> >>>> >>> >> >> > will
>> >>>> >>> >> >> > continue to be refined over time. It involves several
>> >>>> >>> >> >> > different
>> >>>> >>> >> >> > types
>> >>>> >>> >> >> > of changes: adding support for recording additional
>> metrics
>> >>>> >>> >> >> > in
>> >>>> >>> >> >> > the
>> >>>> >>> >> >> > function summary, such as profile data and optional
>> >>>> >>> >> >> > heavier-weight
>> >>>> >>> >> >> > IPA
>> >>>> >>> >> >> > analyses, and tuning the import heuristics based on the
>> >>>> >>> >> >> > summary
>> >>>> >>> >> >> > and
>> >>>> >>> >> >> > callsite context.
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > c. Combined Function Map Pruning:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > The combined function map can be pruned of functions that are
>> >>>> >>> >> >> > unlikely to benefit from being imported. For example, during
>> >>>> >>> >> >> > the phase-2 thin archive plugin step we can safely omit large
>> >>>> >>> >> >> > and (with profile data) cold functions, which are unlikely to
>> >>>> >>> >> >> > benefit from being inlined. Additionally, all but one copy of
>> >>>> >>> >> >> > comdat functions can be suppressed.
>> >>>> >>> >> >> >
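To make the pruning in (c) concrete, here is a minimal sketch in Python. It is not the prototype's code: the `FunctionSummary` record, its fields, and the `max_inst_count` threshold are all assumptions made for illustration. It drops large functions, drops functions marked cold (only known when profile data exists), and keeps a single copy of each comdat function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSummary:
    # Hypothetical summary record; the field names are illustrative only.
    name: str
    module: str
    inst_count: int
    is_cold: bool = False    # only meaningful when profile data is available
    in_comdat: bool = False


def prune_combined_map(summaries, max_inst_count=100):
    """Return the summaries worth keeping as import candidates."""
    kept = []
    seen_comdat = set()
    for s in summaries:
        if s.inst_count > max_inst_count:
            continue  # large: unlikely to be inlined after import
        if s.is_cold:
            continue  # cold per profile data: unlikely to be inlined
        if s.in_comdat:
            if s.name in seen_comdat:
                continue  # another copy of this comdat function was kept
            seen_comdat.add(s.name)
        kept.append(s)
    return kept
```

The size threshold of 100 is arbitrary here; in practice it would be one of the knobs covered by the import tuning in (b).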
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > d. Distributed Build System Integration:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > For a distributed build system, the gold plugin should write the
>> >>>> >>> >> >> > parallel backend invocations into a makefile, including the mapping
>> >>>> >>> >> >> > from the IR file to the real object file path, and exit. Additional
>> >>>> >>> >> >> > work needs to be done in the distributed build system itself to
>> >>>> >>> >> >> > distribute and dispatch the parallel backend jobs to the build
>> >>>> >>> >> >> > cluster.
>> >>>> >>> >> >> >
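As a rough illustration of the makefile described in (d), the sketch below renders one rule per IR file from an IR-to-object mapping. Everything specific in it is an assumption: the function name, the `combined_index` argument, and the `$(THINLTO_BACKEND)` make variable, which is only a placeholder for whatever command the build system would actually run, not a real tool or flag set.

```python
def render_backend_makefile(ir_to_object, combined_index):
    """Return makefile text with one backend rule per IR file."""
    rules = sorted(ir_to_object.items())
    # Default target builds every real object file.
    lines = ["all: " + " ".join(obj for _, obj in rules), ""]
    for ir_file, obj in rules:
        # Each object depends on its own IR file and the combined index.
        lines.append(f"{obj}: {ir_file} {combined_index}")
        # $(THINLTO_BACKEND) is a placeholder variable, not a real tool name.
        lines.append(f"\t$(THINLTO_BACKEND) {ir_file} {combined_index} -o {obj}")
        lines.append("")
    return "\n".join(lines)
```

For example, `render_backend_makefile({"a.bc": "a.o", "b.bc": "b.o"}, "combined.idx")` yields an `all` target plus two independent rules, which is the shape a distributed build system could then dispatch in parallel.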
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > e. Dependence Tracking and Incremental Compiles:
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > In order to support build systems that stage from local disks or
>> >>>> >>> >> >> > network storage, the plugin will optionally support computation of
>> >>>> >>> >> >> > dependence sets of IR files that each module may import from. These
>> >>>> >>> >> >> > can be computed from profile data, if it exists, or from the symbol
>> >>>> >>> >> >> > table and heuristics if not. These dependence sets also enable
>> >>>> >>> >> >> > support for incremental backend compiles.
>> >>>> >>> >> >> >
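The symbol-table variant of (e) can be sketched as follows. This is a simplification under stated assumptions: the `defines` and `references` dictionaries stand in for whatever the plugin reads from the symbol table, only direct references are considered (no transitive imports, no profile data, no heuristics), and both function names are hypothetical.

```python
def dependence_sets(defines, references):
    """Map each module to the set of other modules it may import from.

    defines:    module -> set of function names it defines
    references: module -> set of function names it references
    """
    owner = {}
    for module, funcs in defines.items():
        for f in funcs:
            owner.setdefault(f, module)  # first definition wins
    deps = {}
    for module in defines:
        deps[module] = {
            owner[f]
            for f in references.get(module, ())
            if f in owner and owner[f] != module
        }
    return deps


def modules_to_recompile(deps, changed):
    """Modules whose backend compile must rerun after `changed` modules change."""
    changed = set(changed)
    return sorted(m for m, d in deps.items() if m in changed or d & changed)
```

The second function shows why the same sets help incremental builds: a module's backend compile only needs to rerun when the module itself or something in its dependence set has changed.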
>> >>>> >>> >> >> >
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > --
>> >>>> >>> >> >> > Teresa Johnson | Software Engineer | tejohnson at google.com |
>> >>>> >>> >> >> > 408-460-2413
>> >>>> >>> >> >> >
>> >>>> >>> >> >> > _______________________________________________
>> >>>> >>> >> >> > LLVM Developers mailing list
>> >>>> >>> >> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >>>> >>> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev