[llvm-dev] [RFC] (Thin)LTO with Linker Scripts

Fri May 11 11:12:49 PDT 2018

RFC: (Thin)LTO with Linker Scripts

At the last US LLVM Developers' Meeting, we presented [1] a proposal for
linker script support in (Thin)LTO. In this RFC, I would like to describe the
proposal in more detail and invite the community's feedback, so we can build
consensus on the upstream implementation.

The end goal of this effort is to extend the benefits of (Thin)LTO, including
significant code size and performance improvements, to the many embedded and
system-level software projects that rely on linker scripts to control (ELF)
image layout.

In particular, this proposal seeks to:

  1. Ensure that ELF sections emitted by LTO match the same path-based linker
     script rules that they would have matched if the project had been
     compiled without LTO.

  2. Make module optimization passes aware of the final output sections of
     symbols in order to limit inter-section (e.g. inlining) or enable
     intra-section (e.g. constant merging) optimizations where needed.

  3. Implement these features without changing the behavior of the compiler
     when linker script information is *not* available, particularly on
     source files that contain symbols carrying explicit section attributes.

This proposal only addresses changes to Clang/LLVM. The linker also needs to
be enhanced to support LTO with linker scripts; so far, this has only been
done for qcld, the linker shipped with the Hexagon SDK.

The proposed implementation involves small changes throughout the compilation
flow, so the rest of this document follows the progression from source file
to linking. Individual changes, which could map to patches, are marked using
"(X.Y)" to help with referencing.

Step 1: Compilation of individual files
=======================================

In order to determine the output section for symbols, the linker needs to be
able to match symbols in bitcode files to linker script rules; however,
bitcode does not naturally contain section names for symbols (except for
those with explicit section attributes).

(1.1) For this reason, we run a pass immediately prior to bitcode emission
that initializes a minimal backend and uses the backend to obtain a section
name for each GlobalObject. The section name is then stored in the
GlobalObject's explicit section attribute.

(1.2) ThinLTO currently marks all symbols with explicit sections as "not
eligible for import", which is overly conservative when LTO is aware of the
linker script. Since all GlobalObjects now have an explicit section, this
behavior needs to be disabled.

(1.3) Items 1.1 and 1.2 assume that the linker provides linker script
information to LTO. They should therefore not be the default behavior, but be
guarded by a clang flag, e.g. "-flto-ls", which is passed in addition to
-flto[=thin] when the user knows that their linker has this capability.

   Note: In the presentation, we proposed a dedicated attribute
   "linker_input_section" instead of using the explicit section attribute.
   After discussions with Peter, I believe we don't need to introduce an
   additional attribute.

Step 2: Symbol resolution in the linker
=======================================

The linker loads all bitcode input files and performs symbol resolution. The
IRSymtab already exposes the explicit section attributes for symbols; in our
case, all symbols now carry this information.

(2.1) In addition to communicating the existing SymbolResolution flags
(VisibleToRegularObj etc.), the linker also matches the linker script,
determines an output section for each symbol, and passes it to LTO as part of
the SymbolResolution data structure.
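
As a rough illustration of the matching described in 2.1 (not the qcld
implementation), the Python sketch below models a linker script as an ordered
list of (file pattern, input section pattern, output section) rules and
returns the output section of the first matching rule. The Rule type, the
RULES table, and the file names are hypothetical, and glob matching via
fnmatch is a simplification of real linker script pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    """One simplified input-section rule (hypothetical model)."""
    file_pattern: str      # matched against the input file path
    section_pattern: str   # matched against the input section name
    output_section: str    # output section the rule places matches in


def match_output_section(rules: List[Rule], input_file: str,
                         input_section: str) -> Optional[str]:
    """Return the output section of the first rule matching both patterns."""
    for rule in rules:
        if (fnmatchcase(input_file, rule.file_pattern)
                and fnmatchcase(input_section, rule.section_pattern)):
            return rule.output_section
    return None  # unmatched section; how to place it is up to the linker


# Hypothetical rules, loosely corresponding to:
#   .fast_text : { *fast/*.o(.text*) }
#   .text      : { *(.text*) }
RULES = [
    Rule("*fast/*.o", ".text*", ".fast_text"),
    Rule("*", ".text*", ".text"),
]
```

With these rules, a symbol in section ".text.myFun" of "build/fast/dsp.o"
would be assigned to ".fast_text", while the same section name coming from
"build/main.o" would be assigned to ".text".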

(2.2) The linker needs to determine an output section for *all* symbols,
including those with internal linkage. lto::InputFile::symbols() currently
only exposes external symbols, so an additional argument is added to include
locals.

(2.3) The linker provides a unique module ID for each input file to LTO. This
is necessary for the linker to later identify the file origin of each symbol
emitted from LTO.

Step 3: ThinLTO Import, (Thin)LTO Optimization
==============================================

(3.1) The information provided by the linker is stored in the IR prior to
merging (Regular LTO) and before and after importing (ThinLTO). The goal is
for every GlobalObject to have two additional attributes by the time
optimizations are run:
   - "linker_output_section": This is used for limiting/enabling
     optimizations based on knowledge of the eventual section placement of a
     symbol.
   - "module_id": This keeps track of the file origin of each symbol and will
     be used during CodeGen to 'tag' symbols with their origin so the linker
     can (re-)match the correct linker script rules after LTO.

(3.2) To reduce 'futile' importing in ThinLTO, output section information can
be taken into account when determining the import/export sets. For instance,
functions whose callers are in different output sections will not be inlined
(see 3.3 below), so it does not make sense to import them.
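
The import filter in 3.2 could look roughly like the following Python sketch.
It is only a model of the decision, not ThinLTO code: the function name and
the dictionary mapping symbol names to output sections are hypothetical. When
no output section is known for either side, the sketch returns True so that
whatever import heuristics already exist stay in charge.

```python
from typing import Dict


def worth_importing(caller: str, callee: str,
                    output_section: Dict[str, str]) -> bool:
    """Decide whether importing callee into caller's module can pay off."""
    caller_sec = output_section.get(caller)
    callee_sec = output_section.get(callee)
    if caller_sec is None or callee_sec is None:
        # No linker script information: do not change existing behavior.
        return True
    # Inlining across output sections is disallowed, so such an import
    # would be futile.
    return caller_sec == callee_sec


# Hypothetical symbol-to-output-section assignment.
SECTIONS = {"main": ".text", "helper": ".text", "isr": ".fast_text"}
```

Here importing "helper" into the module of "main" remains a candidate, while
importing "isr" does not, because the two would end up in different output
sections.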

(3.3) Some optimization passes need to be modified to utilize linker script
information - in some cases to enable, and in some cases to disable
optimizations. Passes that currently behave conservatively for GlobalObjects
with explicit section attributes can be enhanced to take output section
information into account. For instance, ConstantMergePass should merge global
constants that are located in the same output section. On the other hand, we
need to prevent inlining across output section boundaries, to name one
example.
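
To make the constant merging case concrete, here is a small Python model of
the idea; it is not how ConstantMergePass is written. Constants are merged
only when both their contents and their output section are identical. The
function name and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_constants(
    constants: List[Tuple[str, str, bytes]]
) -> Dict[str, str]:
    """Map each constant name to the name of the constant that replaces it.

    Each input entry is (name, output_section, value). The first constant
    seen for a given (output_section, value) pair becomes the survivor.
    """
    survivor: Dict[Tuple[str, bytes], str] = {}
    replacement: Dict[str, str] = {}
    for name, section, value in constants:
        # Keying on the output section keeps identical constants in
        # different output sections separate.
        replacement[name] = survivor.setdefault((section, value), name)
    return replacement


# Hypothetical constants: a and b share an output section, c does not.
CONSTANTS = [
    ("a", ".rodata", b"hi"),
    ("b", ".rodata", b"hi"),
    ("c", ".fast_rodata", b"hi"),
]
```

In this example "b" is replaced by "a", while "c" is kept even though its
contents are identical, because it is placed in a different output section.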

Step 4: Code Generation / ELF emission
======================================

The output file names produced by (Thin)LTO necessarily differ from the
linker's input files. The linker thus wouldn't be able to match these to
path-based linker script rules.

(4.1) When linker script information is available, we propose to augment the
ELF section names that symbols are emitted to with the module ID. For example,

    define void @myFun() section ".text.myFun"
      "linker_output_section"=".text" "module_id"="ABC123(f.o)" { }

would be emitted to

    .section ".text.myFun^^ABC123(f.o)","ax",@progbits

This enables the linker to then strip the part after the delimiter (^^) and
override the origin file for the symbol with the original input file for the
purpose of linker script matching.
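
The naming convention can be sketched in a few lines of Python. The helper
names are hypothetical; only the "^^" delimiter and the example strings come
from the text above. The sketch assumes that "^^" never occurs in an ordinary
section name.

```python
from typing import Optional, Tuple

DELIMITER = "^^"  # delimiter used in the example above


def tag_section_name(section: str, module_id: str) -> str:
    """Compiler side: append the module ID to the emitted section name."""
    return f"{section}{DELIMITER}{module_id}"


def split_section_name(emitted: str) -> Tuple[str, Optional[str]]:
    """Linker side: recover the original section name and the module ID.

    Returns (section, None) when the name carries no module ID tag.
    """
    section, sep, module_id = emitted.partition(DELIMITER)
    return section, (module_id if sep else None)
```

For instance, tag_section_name(".text.myFun", "ABC123(f.o)") yields the name
shown in the .section directive above, and split_section_name() on that name
returns the pair (".text.myFun", "ABC123(f.o)"), which the linker could then
use to look up the original input file.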

Item 4.1 doesn't necessarily need to be implemented in target-independent
code. The backends can override target lowering functions responsible for ELF
section selection, so each backend could have its own convention of how the
module ID is encoded.

Conclusion
==========

This document outlined a proposal for the implementation of LTO with linker
scripts. A variant of the described approach has been in production use for
some time. It successfully extended the benefits of LTO to a number of
embedded applications that would have otherwise suffered correctness issues
if built with LTO. Before we start implementing this upstream, I would
appreciate your comments and ideas. I am particularly interested in any
linker script use cases that are prevalent in projects you care about but
that do not readily fit the above model.

References:
  [1] Talk presented at 2017 US LLVM Developers' Meeting, San Jose, CA.
      Slides: http://llvm.org/devmtg/2017-10/slides/LTOLinkerScriptsEdlerVonKoch.pdf
      Video:  https://youtu.be/hhaPAKUt35E

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.