[lld] r263336 - Update the documents of the new LLD.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 14 14:52:55 PDT 2016


So this document now covers everything from the design philosophy to the
actual design details. Do you have anything you want me to add?

On Mon, Mar 14, 2016 at 2:46 PM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:

> Thanks!
>
> On 11 March 2016 at 22:06, Rui Ueyama via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
> > Author: ruiu
> > Date: Sat Mar 12 00:06:40 2016
> > New Revision: 263336
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=263336&view=rev
> > Log:
> > Update the documents of the new LLD.
> >
> > This patch merges the documents for ELF and COFF into one
> > and puts it into docs directory.
> >
> > Added:
> >     lld/trunk/docs/AtomLLD.rst
> >       - copied, changed from r263292, lld/trunk/docs/index.rst
> >     lld/trunk/docs/NewLLD.rst
> > Modified:
> >     lld/trunk/COFF/README.md
> >     lld/trunk/ELF/README.md
> >     lld/trunk/docs/index.rst
> >
> > Modified: lld/trunk/COFF/README.md
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/README.md?rev=263336&r1=263335&r2=263336&view=diff
> > ==============================================================================
> > --- lld/trunk/COFF/README.md (original)
> > +++ lld/trunk/COFF/README.md Sat Mar 12 00:06:40 2016
> > @@ -1,265 +1 @@
> > -The PE/COFF Linker
> > -==================
> > -
> > -This directory contains a linker for Windows operating system.
> > -Because the fundamental design of this port is different from
> > -the other ports of LLD, this port is separated to this directory.
> > -
> > -The linker is command-line compatible with MSVC linker and is
> > -generally 2x faster than that. It can be used to link real-world
> > -programs such as LLD itself or Clang, or even web browsers which
> > -are probably the largest open-source programs for Windows.
> > -
> > -This document is also applicable to ELF linker because the linker
> > -shares the same design as this COFF linker.
> > -
> > -Overall Design
> > ---------------
> > -
> > -This is a list of important data types in this linker.
> > -
> > -* SymbolBody
> > -
> > -  SymbolBody is a class for symbols. They may be created for symbols
> > -  in object files or in archive file headers. The linker may create
> > -  them out of nothing.
> > -
> > -  There are mainly three types of SymbolBodies: Defined, Undefined, or
> > -  Lazy. Defined symbols are for all symbols that are considered as
> > -  "resolved", including real defined symbols, COMDAT symbols, common
> > -  symbols, absolute symbols, linker-created symbols, etc. Undefined
> > -  symbols are for undefined symbols, which need to be replaced by
> > -  Defined symbols by the resolver. Lazy symbols represent symbols we
> > -  found in archive file headers -- which can turn into Defined symbols
> > -  if we read archieve members, but we haven't done that yet.
> > -
> > -* Symbol
> > -
> > -  Symbol is a pointer to a SymbolBody. There's only one Symbol for
> > -  each unique symbol name (this uniqueness is guaranteed by the symbol
> > -  table). Because SymbolBodies are created for each file
> > -  independently, there can be many SymbolBodies for the same
> > -  name. Thus, the relationship between Symbols and SymbolBodies is 1:N.
> > -
> > -  The resolver keeps the Symbol's pointer to always point to the "best"
> > -  SymbolBody. Pointer mutation is the resolve operation in this
> > -  linker.
> > -
> > -  SymbolBodies have pointers to their Symbols. That means you can
> > -  always find the best SymbolBody from any SymbolBody by following
> > -  pointers twice. This structure makes it very easy to find
> > -  replacements for symbols. For example, if you have an Undefined
> > -  SymbolBody, you can find a Defined SymbolBody for that symbol just
> > -  by going to its Symbol and then to SymbolBody, assuming the resolver
> > -  have successfully resolved all undefined symbols.
> > -
> > -* Chunk
> > -
> > -  Chunk represents a chunk of data that will occupy space in an
> > -  output. Each regular section becomes a chunk.
> > -  Chunks created for common or BSS symbols are not backed by sections.
> > -  The linker may create chunks out of nothing to append additional
> > -  data to an output.
> > -
> > -  Chunks know about their size, how to copy their data to mmap'ed
> > -  outputs, and how to apply relocations to them. Specifically,
> > -  section-based chunks know how to read relocation tables and how to
> > -  apply them.
> > -
> > -* SymbolTable
> > -
> > -  SymbolTable is basically a hash table from strings to Symbols, with
> > -  a logic to resolve symbol conflicts. It resolves conflicts by symbol
> > -  type. For example, if we add Undefined and Defined symbols, the
> > -  symbol table will keep the latter. If we add Defined and Lazy
> > -  symbols, it will keep the former. If we add Lazy and Undefined, it
> > -  will keep the former, but it will also trigger the Lazy symbol to
> > -  load the archive member to actually resolve the symbol.
> > -
> > -* OutputSection
> > -
> > -  OutputSection is a container of Chunks. A Chunk belongs to at most
> > -  one OutputSection.
> > -
> > -There are mainly three actors in this linker.
> > -
> > -* InputFile
> > -
> > -  InputFile is a superclass of file readers. We have a different
> > -  subclass for each input file type, such as regular object file,
> > -  archive file, etc. They are responsible for creating and owning
> > -  SymbolBodies and Chunks.
> > -
> > -* Writer
> > -
> > -  The writer is responsible for writing file headers and Chunks to a
> > -  file. It creates OutputSections, put all Chunks into them, assign
> > -  unique, non-overlapping addresses and file offsets to them, and then
> > -  write them down to a file.
> > -
> > -* Driver
> > -
> > -  The linking process is drived by the driver. The driver
> > -
> > -  - processes command line options,
> > -  - creates a symbol table,
> > -  - creates an InputFile for each input file and put all symbols in it
> > -    into the symbol table,
> > -  - checks if there's no remaining undefined symbols,
> > -  - creates a writer,
> > -  - and passes the symbol table to the writer to write the result to a
> > -    file.
> > -
> > -Performance
> > ------------
> > -
> > -It's generally 2x faster than MSVC link.exe. It takes 3.5 seconds to
> > -self-host on my Xeon 2580 machine. MSVC linker takes 7.0 seconds to
> > -link the same executable. The resulting output is 65MB.
> > -The old LLD is buggy that it produces 120MB executable for some reason,
> > -and it takes 30 seconds to do that.
> > -
> > -We believe the performance difference comes from simplification and
> > -optimizations we made to the new port. Notable differences are listed
> > -below.
> > -
> > -* Reduced number of relocation table reads
> > -
> > -  In the old design, relocation tables are read from beginning to
> > -  construct graphs because they consist of graph edges. In the new
> > -  design, they are not read until we actually apply relocations.
> > -
> > -  This simplification has two benefits. One is that we don't create
> > -  additional objects for relocations but instead consume relocation
> > -  tables directly. The other is that it reduces number of relocation
> > -  entries we have to read, because we won't read relocations for
> > -  dead-stripped COMDAT sections. Large C++ programs tend to consist of
> > -  lots of COMDAT sections. In the old design, the time to process
> > -  relocation table is linear to size of input. In this new model, it's
> > -  linear to size of output.
> > -
> > -* Reduced number of symbol table lookup
> > -
> > -  Symbol table lookup can be a heavy operation because number of
> > -  symbols can be very large and each symbol name can be very long
> > -  (think of C++ mangled symbols -- time to compute a hash value for a
> > -  string is linear to the length.)
> > -
> > -  We look up the symbol table exactly only once for each symbol in the
> > -  new design. This is I believe the minimum possible number. This is
> > -  achieved by the separation of Symbol and SymbolBody. Once you get a
> > -  pointer to a Symbol by looking up the symbol table, you can always
> > -  get the latest symbol resolution result by just dereferencing a
> > -  pointer. (I'm not sure if the idea is new to the linker. At least,
> > -  all other linkers I've investigated so far seem to look up hash
> > -  tables or sets more than once for each new symbol, but I may be
> > -  wrong.)
> > -
> > -* Reduced number of file visits
> > -
> > -  The symbol table implements the Windows linker semantics. We treat
> > -  the symbol table as a bucket of all known symbols, including symbols
> > -  in archive file headers. We put all symbols into one bucket as we
> > -  visit new files. That means we visit each file only once.
> > -
> > -  This is different from the Unix linker semantics, in which we only
> > -  keep undefined symbols and visit each file one by one until we
> > -  resolve all undefined symbols. In the Unix model, we have to visit
> > -  archive files many times if there are circular dependencies between
> > -  archives.
> > -
> > -* Avoiding creating additional objects or copying data
> > -
> > -  The data structures described in the previous section are all thin
> > -  wrappers for classes that LLVM libObject provides. We avoid copying
> > -  data from libObject's objects to our objects. We read much less data
> > -  than before. For example, we don't read symbol values until we apply
> > -  relocations because these values are not relevant to symbol
> > -  resolution. Again, COMDAT symbols may be discarded during symbol
> > -  resolution, so reading their attributes too early could result in a
> > -  waste. We use underlying objects directly where doing so makes
> > -  sense.
> > -
> > -Parallelism
> > ------------
> > -
> > -The abovementioned data structures are also chosen with
> > -multi-threading in mind. It should relatively be easy to make the
> > -symbol table a concurrent hash map, so that we let multiple workers
> > -work on symbol table concurrently. Symbol resolution in this design is
> > -a single pointer mutation, which allows the resolver work concurrently
> > -in a lock-free manner using atomic pointer compare-and-swap.
> > -
> > -It should also be easy to apply relocations and write chunks concurrently.
> > -
> > -We created an experimental multi-threaded linker using the Microsoft
> > -ConcRT concurrency library, and it was able to link itself in 0.5
> > -seconds, so we think the design is promising.
> > -
> > -Link-Time Optimization
> > -----------------------
> > -
> > -LTO is implemented by handling LLVM bitcode files as object files.
> > -The linker resolves symbols in bitcode files normally. If all symbols
> > -are successfully resolved, it then calls an LLVM libLTO function
> > -with all bitcode files to convert them to one big regular COFF file.
> > -Finally, the linker replaces bitcode symbols with COFF symbols,
> > -so that we can link the input files as if they were in the native
> > -format from the beginning.
> > -
> > -The details are described in this document.
> > -http://llvm.org/docs/LinkTimeOptimization.html
> > -
> > -Glossary
> > ---------
> > -
> > -* RVA
> > -
> > -  Short for Relative Virtual Address.
> > -
> > -  Windows executables or DLLs are not position-independent; they are
> > -  linked against a fixed address called an image base. RVAs are
> > -  offsets from an image base.
> > -
> > -  Default image bases are 0x140000000 for executables and 0x18000000
> > -  for DLLs. For example, when we are creating an executable, we assume
> > -  that the executable will be loaded at address 0x140000000 by the
> > -  loader, so we apply relocations accordingly. Result texts and data
> > -  will contain raw absolute addresses.
> > -
> > -* VA
> > -
> > -  Short for Virtual Address. Equivalent to RVA + image base. It is
> > -  rarely used. We almost always use RVAs instead.
> > -
> > -* Base relocations
> > -
> > -  Relocation information for the loader. If the loader decides to map
> > -  an executable or a DLL to a different address than their image
> > -  bases, it fixes up binaries using information contained in the base
> > -  relocation table. A base relocation table consists of a list of
> > -  locations containing addresses. The loader adds a difference between
> > -  RVA and actual load address to all locations listed there.
> > -
> > -  Note that this run-time relocation mechanism is much simpler than ELF.
> > -  There's no PLT or GOT. Images are relocated as a whole just
> > -  by shifting entire images in memory by some offsets. Although doing
> > -  this breaks text sharing, I think this mechanism is not actually bad
> > -  on today's computers.
> > -
> > -* ICF
> > -
> > -  Short for Identical COMDAT Folding.
> > -
> > -  ICF is an optimization to reduce output size by merging COMDAT sections
> > -  by not only their names but by their contents. If two COMDAT sections
> > -  happen to have the same metadata, actual contents and relocations,
> > -  they are merged by ICF. It is known as an effective technique,
> > -  and it usually reduces C++ program's size by a few percent or more.
> > -
> > -  Note that this is not entirely sound optimization. C/C++ require
> > -  different functions have different addresses. If a program depends on
> > -  that property, it would fail at runtime. However, that's not really an
> > -  issue on Windows because MSVC link.exe enabled the optimization by
> > -  default. As long as your program works with the linker's default
> > -  settings, your program should be safe with ICF.
> > +See docs/NewLLD.rst
> >
> > Modified: lld/trunk/ELF/README.md
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/README.md?rev=263336&r1=263335&r2=263336&view=diff
> > ==============================================================================
> > --- lld/trunk/ELF/README.md (original)
> > +++ lld/trunk/ELF/README.md Sat Mar 12 00:06:40 2016
> > @@ -1,34 +1 @@
> > -The New ELF Linker
> > -==================
> > -This directory contains a port of the new PE/COFF linker for ELF.
> > -
> > -Overall Design
> > ---------------
> > -See COFF/README.md for details on the design. Note that unlike COFF, we do not
> > -distinguish chunks from input sections; they are merged together.
> > -
> > -Capabilities
> > -------------
> > -This linker can link LLVM and Clang on Linux/x86-64 or FreeBSD/x86-64
> > -"Hello world" can be linked on Linux/PPC64 and on Linux/AArch64 or
> > -FreeBSD/AArch64.
> > -
> > -Performance
> > ------------
> > -Achieving good performance is one of our goals. It's too early to reach a
> > -conclusion, but we are optimistic about that as it currently seems to be faster
> > -than GNU gold. It will be interesting to compare when we are close to feature
> > -parity.
> > -
> > -Library Use
> > ------------
> > -
> > -You can embed LLD to your program by linking against it and calling the linker's
> > -entry point function lld::elf::link.
> > -
> > -The current policy is that it is your reponsibility to give trustworthy object
> > -files. The function is guaranteed to return as long as you do not pass corrupted
> > -or malicious object files. A corrupted file could cause a fatal error or SEGV.
> > -That being said, you don't need to worry too much about it if you create object
> > -files in a usual way and give them to the linker (it is naturally expected to
> > -work, or otherwise it's a linker's bug.)
> > +See docs/NewLLD.rst
> >
> > Copied: lld/trunk/docs/AtomLLD.rst (from r263292, lld/trunk/docs/index.rst)
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/AtomLLD.rst?p2=lld/trunk/docs/AtomLLD.rst&p1=lld/trunk/docs/index.rst&r1=263292&r2=263336&rev=263336&view=diff
> > ==============================================================================
> > --- lld/trunk/docs/index.rst (original)
> > +++ lld/trunk/docs/AtomLLD.rst Sat Mar 12 00:06:40 2016
> > @@ -1,20 +1,14 @@
> > -.. _index:
> > +ATOM-based lld
> > +==============
> >
> > -lld - The LLVM Linker
> > -=====================
> > -
> > -lld contains two linkers whose architectures are different from each other.
> > -One is a linker that implements native features directly.
> > -They are in `COFF` or `ELF` directories. Other directories contains the other
> > -implementation that is designed to be a set of modular code for creating
> > -linker tools. This document covers mainly the latter.
> > -For the former, please read README.md in `COFF` directory.
> > +ATOM-based lld is a new set of modular code for creating linker tools.
> > +Currently it supports Mach-O.
> >
> >  * End-User Features:
> >
> >    * Compatible with existing linker options
> > -  * Reads standard Object Files (e.g. ELF, Mach-O, PE/COFF)
> > -  * Writes standard Executable Files (e.g. ELF, Mach-O, PE)
> > +  * Reads standard Object Files
> > +  * Writes standard Executable Files
> >    * Remove clang's reliance on "the system linker"
> >    * Uses the LLVM `"UIUC" BSD-Style license`__.
> >
> > @@ -44,29 +38,6 @@ system assembler tool, the lld project w
> >  system linker tool.
> >
> >
> > -Current Status
> > ---------------
> > -
> > -lld can self host on x86-64 FreeBSD and Linux and x86 Windows.
> > -
> > -All SingleSource tests in test-suite pass on x86-64 Linux.
> > -
> > -All SingleSource and MultiSource tests in the LLVM test-suite
> > -pass on MIPS 32-bit little-endian Linux.
> > -
> > -Source
> > -------
> > -
> > -lld is available in the LLVM SVN repository::
> > -
> > -  svn co http://llvm.org/svn/llvm-project/lld/trunk lld
> > -
> > -lld is also available via the read-only git mirror::
> > -
> > -  git clone http://llvm.org/git/lld.git
> > -
> > -Put it in llvm's tools/ directory, rerun cmake, then build target lld.
> > -
> >  Contents
> >  --------
> >
> >
> > Added: lld/trunk/docs/NewLLD.rst
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/NewLLD.rst?rev=263336&view=auto
> > ==============================================================================
> > --- lld/trunk/docs/NewLLD.rst (added)
> > +++ lld/trunk/docs/NewLLD.rst Sat Mar 12 00:06:40 2016
> > @@ -0,0 +1,309 @@
> > +The ELF and COFF Linkers
> > +========================
> > +
> > +We started rewriting the ELF (Unix) and COFF (Windows) linkers in May 2015.
> > +Since then, we have been making steady progress towards providing
> > +drop-in replacements for the system linkers.
> > +
> > +Currently, the Windows support is mostly complete and is about 2x faster
> > +than the linker that comes as a part of the Microsoft Visual Studio toolchain.
> > +
> > +The ELF support is in progress and is able to link large programs
> > +such as Clang or LLD itself. Unless your program depends on linker scripts,
> > +you can expect it to be linkable with LLD.
> > +It is currently about 1.2x to 2x faster than GNU gold linker.
> > +We aim to make it a drop-in replacement for the GNU linker.
> > +
> > +We expect that FreeBSD is going to be the first large system
> > +to adopt LLD as the system linker.
> > +We are working on it in collaboration with the FreeBSD project.
> > +
> > +The linkers are notably small; as of March 2016,
> > +the COFF linker is under 7k LOC and the ELF linker is about 10k LOC.
> > +
> > +The linkers are designed to be as fast and simple as possible.
> > +Because they are simple, it is easy to extend them to support new features.
> > +There are a few key design choices that we made to achieve these goals.
> > +We will describe them in this document.
> > +
> > +The ELF Linker as a Library
> > +---------------------------
> > +
> > +You can embed LLD in your program by linking against it and calling the linker's
> > +entry point function lld::elf::link.
> > +
> > +The current policy is that it is your responsibility to give trustworthy object
> > +files. The function is guaranteed to return as long as you do not pass corrupted
> > +or malicious object files. A corrupted file could cause a fatal error or SEGV.
> > +That being said, you don't need to worry too much about it if you create object
> > +files in the usual way and give them to the linker. It is naturally expected to
> > +work, or otherwise it's a linker bug.
> > +
> > +Design
> > +======
> > +
> > +We will describe the design of the linkers in the rest of the document.
> > +
> > +Key Concepts
> > +------------
> > +
> > +Linkers are fairly large pieces of software.
> > +There are many design choices you have to make to create a complete linker.
> > +
> > +This is a list of design choices we've made for ELF and COFF LLD.
> > +We believe that these high-level design choices achieve the right balance
> > +between speed, simplicity and extensibility.
> > +
> > +* Implement as native linkers
> > +
> > +  We implemented the linkers as native linkers for each file format.
> > +
> > +  The two linkers share the same design but do not share code.
> > +  Sharing code makes sense if the benefit is worth its cost.
> > +  In our case, ELF and COFF are different enough that we thought the layer to
> > +  abstract the differences wouldn't be worth its complexity and run-time cost.
> > +  Elimination of the abstract layer has greatly simplified the implementation.
> > +
> > +* Speed by design
> > +
> > +  One of the most important things in achieving high performance is to
> > +  do less work, rather than to do the same work more efficiently.
> > +  Therefore, the high-level design matters more than local optimizations.
> > +  Since we are trying to create a high-performance linker,
> > +  it is very important to keep the design as efficient as possible.
> > +
> > +  Broadly speaking, we do not do anything until we have to do it.
> > +  For example, we do not read section contents or relocations
> > +  until we need them to continue linking.
> > +  When we need to do some costly operation (such as looking up
> > +  a hash table for each symbol), we do it only once.
> > +  We obtain a handle (which is typically just a pointer to actual data)
> > +  on the first operation and use it throughout the process.
> > +
> > +* Efficient archive file handling
> > +
> > +  LLD's handling of archive files (the files with ".a" file extension) is different
> > +  from the traditional Unix linkers and pretty similar to Windows linkers.
> > +  We'll describe how the traditional Unix linker handles archive files,
> > +  what the problem is, and how LLD approached the problem.
> > +
> > +  The traditional Unix linker maintains a set of undefined symbols during linking.
> > +  The linker visits each file in the order in which they appear on the command line
> > +  until the set becomes empty. What the linker does depends on the file type.
> > +
> > +  - If the linker visits an object file, the linker links the object file into the result,
> > +    and undefined symbols in the object file are added to the set.
> > +
> > +  - If the linker visits an archive file, it checks the archive file's symbol table
> > +    and extracts all object files that have definitions for any symbols in the set.
> > +
> > +  This algorithm sometimes leads to counter-intuitive behavior.
> > +  If you give archive files before object files, nothing will happen
> > +  because when the linker visits archives, there are no undefined symbols in the set.
> > +  As a result, no files are extracted from the first archive file,
> > +  and the link is done at that point because the set is empty after it visits one file.
> > +
> > +  You can fix the problem by reordering the files,
> > +  but that cannot fix the issue of mutually-dependent archive files.
> > +
> > +  Linking mutually-dependent archive files is tricky.
> > +  You may specify the same archive file multiple times to
> > +  let the linker visit it more than once.
> > +  Or, you may use the special command line options, `-(` and `-)`,
> > +  to let the linker loop over the files between the options until
> > +  no new symbols are added to the set.
> > +
> > +  Visiting the same archive files multiple times makes the linker slower.
> > +
> > +  Here is how LLD approached the problem. Instead of memorizing only undefined symbols,
> > +  we program LLD so that it memorizes all symbols.
> > +  When it sees an undefined symbol that can be resolved by extracting an object file
> > +  from an archive file it previously visited, it immediately extracts the file and links it.
> > +  It is doable because LLD does not forget symbols it has seen in archive files.
> > +
> > +  We believe that the LLD's way is efficient and easy to justify.
> > +
> > +  The semantics of LLD's archive handling is different from the traditional Unix's.
> > +  You can observe it if you carefully craft archive files to exploit it.
> > +  However, in reality, we don't know any program that cannot link
> > +  with our algorithm so far, so we are not too worried about the incompatibility.
> > +
> > +Important Data Structures
> > +-------------------------
> > +
> > +We will describe the key data structures in LLD in this section.
> > +The linker can be understood as the interactions between them.
> > +Once you understand their functions, the code of the linker should look obvious to you.
> > +
> > +* SymbolBody
> > +
> > +  SymbolBody is a class to represent symbols.
> > +  They are created for symbols in object files or archive files.
> > +  The linker creates linker-defined symbols as well.
> > +
> > +  There are basically three types of SymbolBodies: Defined, Undefined, or Lazy.
> > +
> > +  - Defined symbols are for all symbols that are considered as "resolved",
> > +    including real defined symbols, COMDAT symbols, common symbols,
> > +    absolute symbols, linker-created symbols, etc.
> > +  - Undefined symbols represent undefined symbols, which need to be replaced by
> > +    Defined symbols by the resolver until the link is complete.
> > +  - Lazy symbols represent symbols we found in archive file headers
> > +    which can turn into Defined if we read archive members.
> > +
> > +* Symbol
> > +
> > +  Symbol is a pointer to a SymbolBody. There's only one Symbol for
> > +  each unique symbol name (this uniqueness is guaranteed by the symbol table).
> > +  Because SymbolBodies are created for each file independently,
> > +  there can be many SymbolBodies for the same name.
> > +  Thus, the relationship between Symbols and SymbolBodies is 1:N.
> > +  You can think of Symbols as handles for SymbolBodies.
> > +
> > +  The resolver keeps the Symbol's pointer to always point to the "best" SymbolBody.
> > +  Pointer mutation is the resolve operation of this linker.
> > +
> > +  SymbolBodies have pointers to their Symbols.
> > +  That means you can always find the best SymbolBody from
> > +  any SymbolBody by following pointers twice.
> > +  This structure makes it very easy and cheap to find replacements for symbols.
> > +  For example, if you have an Undefined SymbolBody, you can find a Defined
> > +  SymbolBody for that symbol just by going to its Symbol and then to SymbolBody,
> > +  assuming the resolver has successfully resolved all undefined symbols.
> > +
> > +* SymbolTable
> > +
> > +  SymbolTable is basically a hash table from strings to Symbols
> > +  with logic to resolve symbol conflicts. It resolves conflicts by symbol type.
> > +
> > +  - If we add Undefined and Defined symbols, the symbol table will keep the latter.
> > +  - If we add Defined and Lazy symbols, it will keep the former.
> > +  - If we add Lazy and Undefined, it will keep the former,
> > +    but it will also trigger the Lazy symbol to load the archive member
> > +    to actually resolve the symbol.
> > +
> > +* Chunk (COFF specific)
> > +
> > +  Chunk represents a chunk of data that will occupy space in an output.
> > +  Each regular section becomes a chunk.
> > +  Chunks created for common or BSS symbols are not backed by sections.
> > +  The linker may create chunks to append additional data to an output as well.
> > +
> > +  Chunks know about their size, how to copy their data to mmap'ed outputs,
> > +  and how to apply relocations to them.
> > +  Specifically, section-based chunks know how to read relocation tables
> > +  and how to apply them.
> > +
> > +* InputSection (ELF specific)
> > +
> > +  Since we have less synthesized data for ELF, we don't abstract slices of
> > +  input files as Chunks for ELF. Instead, we directly use the input section
> > +  as an internal data type.
> > +
> > +  InputSections know about their size and how to copy themselves to
> > +  mmap'ed outputs, just like COFF Chunks.
> > +
> > +* OutputSection
> > +
> > +  OutputSection is a container of InputSections (ELF) or Chunks (COFF).
> > +  An InputSection or Chunk belongs to at most one OutputSection.
> > +
> > +There are mainly three actors in this linker.
> > +
> > +* InputFile
> > +
> > +  InputFile is a superclass of file readers.
> > +  We have a different subclass for each input file type,
> > +  such as regular object file, archive file, etc.
> > +  They are responsible for creating and owning SymbolBodies and
> > +  InputSections/Chunks.
> > +
> > +* Writer
> > +
> > +  The writer is responsible for writing file headers and InputSections/Chunks to a file.
> > +  It creates OutputSections, puts all InputSections/Chunks into them,
> > +  assigns unique, non-overlapping addresses and file offsets to them,
> > +  and then writes them down to a file.
> > +
> > +* Driver
> > +
> > +  The linking process is driven by the driver. The driver
> > +
> > +  - processes command line options,
> > +  - creates a symbol table,
> > +  - creates an InputFile for each input file and puts all symbols in it into the symbol table,
> > +  - checks that there are no remaining undefined symbols,
> > +  - creates a writer,
> > +  - and passes the symbol table to the writer to write the result to a file.
> > +
> > +Link-Time Optimization
> > +----------------------
> > +
> > +LTO is implemented by handling LLVM bitcode files as object files.
> > +The linker resolves symbols in bitcode files normally. If all symbols
> > +are successfully resolved, it then calls an LLVM libLTO function
> > +with all bitcode files to convert them to one big regular ELF/COFF file.
> > +Finally, the linker replaces bitcode symbols with ELF/COFF symbols,
> > +so that we link the input files as if they were in the native
> > +format from the beginning.
> > +
> > +The details are described in this document.
> > +http://llvm.org/docs/LinkTimeOptimization.html
> > +
> > +Glossary
> > +--------
> > +
> > +* RVA (COFF)
> > +
> > +  Short for Relative Virtual Address.
> > +
> > +  Windows executables or DLLs are not position-independent; they are
> > +  linked against a fixed address called an image base. RVAs are
> > +  offsets from an image base.
> > +
> > +  Default image bases are 0x140000000 for executables and 0x180000000
> > +  for DLLs. For example, when we are creating an executable, we assume
> > +  that the executable will be loaded at address 0x140000000 by the
> > +  loader, so we apply relocations accordingly. Result texts and data
> > +  will contain raw absolute addresses.
> > +
> > +* VA
> > +
> > +  Short for Virtual Address. For COFF, it is equivalent to RVA + image base.
> > +
> > +* Base relocations (COFF)
> > +
> > +  Relocation information for the loader. If the loader decides to map
> > +  an executable or a DLL to a different address than their image
> > +  bases, it fixes up binaries using information contained in the base
> > +  relocation table. A base relocation table consists of a list of
> > +  locations containing addresses. The loader adds the difference between
> > +  the image base and the actual load address to all locations listed there.
> > +
> > +  Note that this run-time relocation mechanism is much simpler than ELF's.
> > +  There's no PLT or GOT. Images are relocated as a whole just
> > +  by shifting entire images in memory by some offset. Although doing
> > +  this breaks text sharing, I think this mechanism is not actually bad
> > +  on today's computers.
> > +
> > +* ICF
> > +
> > +  Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF).
> > +
> > +  ICF is an optimization to reduce output size by merging read-only sections
> > +  not only by their names but also by their contents. If two read-only sections
> > +  happen to have the same metadata, actual contents and relocations,
> > +  they are merged by ICF. It is known to be an effective technique,
> > +  and it usually reduces a C++ program's size by a few percent or more.
> > +
> > +  Note that this is not an entirely sound optimization. C and C++ require
> > +  that different functions have different addresses. If a program depends on
> > +  that property, it would fail at runtime.
> > +
> > +  On Windows, that's not really an issue because MSVC link.exe enables
> > +  the optimization by default. As long as your program works
> > +  with the linker's default settings, your program should be safe with ICF.
> > +
> > +  On Unix, your program is generally not guaranteed to be safe with ICF,
> > +  although many large programs happen to work correctly.
> > +  LLD itself, for example, works fine with ICF.
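
To make the ICF entry above concrete, here is a minimal sketch of the folding idea in Python. It is not how LLD implements ICF: the section tuples, the flag strings and the `fold_identical` name are all made up for illustration, relocations are compared by target name only, and it does a single pass rather than iterating until no more sections can be merged.

```python
from collections import defaultdict


def fold_identical(sections):
    """Map every section name to the name of the section kept in its place.

    `sections` is a hypothetical model of the linker's input:
    name -> (flags, contents, relocs), where `flags` is a string such as
    "r-x", `contents` is bytes, and `relocs` is a tuple of
    (offset, target_name) pairs.
    """
    groups = defaultdict(list)
    for name, (flags, contents, relocs) in sections.items():
        if "w" in flags:
            # Writable sections are never folded: give each a unique key.
            key = ("unique", name)
        else:
            # Read-only sections with equal metadata, contents and
            # relocations land in the same group.
            key = (flags, contents, relocs)
        groups[key].append(name)

    folded = {}
    for names in groups.values():
        keeper = min(names)  # pick a deterministic representative
        for name in names:
            folded[name] = keeper
    return folded


sections = {
    "get_x": ("r-x", b"\x8b\x07\xc3", ()),
    "get_y": ("r-x", b"\x8b\x07\xc3", ()),
    "main": ("r-x", b"\xe8\x00\x00\x00\x00\xc3", ((1, "get_x"),)),
    "buf_a": ("rw-", b"\x00" * 8, ()),
    "buf_b": ("rw-", b"\x00" * 8, ()),
}
folded = fold_identical(sections)
```

In this toy input `get_x` and `get_y` end up sharing one section, which is exactly the case where a program comparing the two function addresses would observe them as equal.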
> >
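
For the RVA, VA and base relocation entries in the glossary above, the arithmetic can be sketched as follows. This is an illustration under simplifying assumptions, not the loader's real algorithm: the image is a flat byte buffer indexed by RVA, every relocated slot is a 64-bit little-endian absolute address, and `reloc_rvas` is a plain list standing in for the real base relocation table format. The function names and the load address are invented for the example.

```python
import struct

IMAGE_BASE_EXE = 0x140000000  # default executable image base from the glossary


def rva_to_va(rva, image_base=IMAGE_BASE_EXE):
    """VA = RVA + image base."""
    return image_base + rva


def apply_base_relocs(image, reloc_rvas, preferred_base, actual_base):
    """Shift every listed absolute address by (actual_base - preferred_base).

    `image` is a bytearray indexed by RVA; `reloc_rvas` lists the RVAs of
    the 64-bit slots that hold absolute addresses.
    """
    delta = actual_base - preferred_base
    for rva in reloc_rvas:
        (value,) = struct.unpack_from("<Q", image, rva)
        # Mask to 64 bits so a negative delta wraps like machine arithmetic.
        struct.pack_into("<Q", image, rva, (value + delta) & 0xFFFFFFFFFFFFFFFF)
    return image


# Toy image: the slot at RVA 0x10 holds the absolute address of RVA 0x20,
# written by the linker under the assumption of the default image base.
image = bytearray(0x40)
struct.pack_into("<Q", image, 0x10, rva_to_va(0x20))

# The (hypothetical) loader maps the image somewhere else and fixes it up.
apply_base_relocs(image, [0x10], IMAGE_BASE_EXE, 0x7FF600000000)
patched = struct.unpack_from("<Q", image, 0x10)[0]
```

After the fix-up the slot still points at RVA 0x20, only relative to the new load address, which is why the whole image can be relocated by a single offset.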
> > Modified: lld/trunk/docs/index.rst
> > URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/index.rst?rev=263336&r1=263335&r2=263336&view=diff
> > ==============================================================================
> > --- lld/trunk/docs/index.rst (original)
> > +++ lld/trunk/docs/index.rst Sat Mar 12 00:06:40 2016
> > @@ -4,55 +4,12 @@ lld - The LLVM Linker
> >  =====================
> >
> >  lld contains two linkers whose architectures are different from each other.
> > -One is a linker that implements native features directly.
> > -They are in `COFF` or `ELF` directories. Other directories contains the other
> > -implementation that is designed to be a set of modular code for creating
> > -linker tools. This document covers mainly the latter.
> > -For the former, please read README.md in `COFF` directory.
> >
> > -* End-User Features:
> > -
> > -  * Compatible with existing linker options
> > -  * Reads standard Object Files (e.g. ELF, Mach-O, PE/COFF)
> > -  * Writes standard Executable Files (e.g. ELF, Mach-O, PE)
> > -  * Remove clang's reliance on "the system linker"
> > -  * Uses the LLVM `"UIUC" BSD-Style license`__.
> > -
> > -* Applications:
> > -
> > -  * Modular design
> > -  * Support cross linking
> > -  * Easy to add new CPU support
> > -  * Can be built as static tool or library
> > -
> > -* Design and Implementation:
> > -
> > -  * Extensive unit tests
> > -  * Internal linker model can be dumped/read to textual format
> > -  * Additional linking features can be plugged in as "passes"
> > -  * OS specific and CPU specific code factored out
> > -
> > -Why a new linker?
> > ------------------
> > -
> > -The fact that clang relies on whatever linker tool you happen to have installed
> > -means that clang has been very conservative adopting features which require a
> > -recent linker.
> > -
> > -In the same way that the MC layer of LLVM has removed clang's reliance on the
> > -system assembler tool, the lld project will remove clang's reliance on the
> > -system linker tool.
> > -
> > -
> > -Current Status
> > ---------------
> > -
> > -lld can self host on x86-64 FreeBSD and Linux and x86 Windows.
> > -
> > -All SingleSource tests in test-suite pass on x86-64 Linux.
> > +.. toctree::
> > +   :maxdepth: 1
> >
> > -All SingleSource and MultiSource tests in the LLVM test-suite
> > -pass on MIPS 32-bit little-endian Linux.
> > +   NewLLD
> > +   AtomLLD
> >
> >  Source
> >  ------
> > @@ -66,25 +23,3 @@ lld is also available via the read-only
> >    git clone http://llvm.org/git/lld.git
> >
> >  Put it in llvm's tools/ directory, rerun cmake, then build target lld.
> > -
> > -Contents
> > ---------
> > -
> > -.. toctree::
> > -   :maxdepth: 2
> > -
> > -   design
> > -   getting_started
> > -   ReleaseNotes
> > -   development
> > -   windows_support
> > -   open_projects
> > -   sphinx_intro
> > -
> > -Indices and tables
> > -------------------
> > -
> > -* :ref:`genindex`
> > -* :ref:`search`
> > -
> > -__ http://llvm.org/docs/DeveloperPolicy.html#license
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>