[llvm-commits] [lld] r154207 - in /lld/trunk/docs: _templates/index.html contents.rst design.rst hello.png intro.rst
Daniel Dunbar
daniel at zuster.org
Fri Apr 6 14:02:24 PDT 2012
Author: ddunbar
Date: Fri Apr 6 16:02:24 2012
New Revision: 154207
URL: http://llvm.org/viewvc/llvm-project?rev=154207&view=rev
Log:
docs: Import existing www content into Sphinx.
Added:
lld/trunk/docs/design.rst
lld/trunk/docs/hello.png
Modified:
lld/trunk/docs/_templates/index.html
lld/trunk/docs/contents.rst
lld/trunk/docs/intro.rst
Modified: lld/trunk/docs/_templates/index.html
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/_templates/index.html?rev=154207&r1=154206&r2=154207&view=diff
==============================================================================
--- lld/trunk/docs/_templates/index.html (original)
+++ lld/trunk/docs/_templates/index.html Fri Apr 6 16:02:24 2012
@@ -1,23 +1,56 @@
{% extends "layout.html" %}
{% set title = 'lld' %}
{% block body %}
-<h1>lld - The LLVM Linker</h1>
+<h1>lld: a linker for LLVM</h1>
<p>
- lld is LLVM's linker.
+ lld is a new set of modular code for creating linker tools.
</p>
+<h2 id="goals">Features and Goals</h2>
+<p><b>End-User Features:</b></p>
+<ul>
+ <li>Compatible with existing linker options</li>
+ <li>Reads standard Object Files (e.g. ELF, mach-o, PE/COFF)</li>
+ <li>Writes standard Executable Files (e.g. ELF, mach-o, PE)</li>
+ <li>Fast link times</li>
+ <li>Minimal memory use</li>
+ <li>Remove clang's reliance on "the system linker"</li>
+ <li>Uses the LLVM 'BSD' License</li>
+</ul>
+
+<p><b>Applications:</b></p>
+<ul>
+ <li>Modular design</li>
+ <li>Support cross linking</li>
+ <li>Easy to add new CPU support</li>
+ <li>Can be built as static tool or library</li>
+</ul>
+
+<p><b>Design and Implementation:</b></p>
+<ul>
+ <li>Extensive unit tests</li>
+ <li>Internal linker model can be dumped/read to textual format</li>
+ <li>Internal linker model can be dumped/read to new native format</li>
+ <li>Native format designed to be fast to read and write</li>
+ <li>Additional linking features can be plugged in as "passes"</li>
+ <li>OS specific and CPU specific code factored out</li>
+</ul>
+
+For more information, see the <a href="{{pathto('intro')}}">introduction</a>
+available as part of the <i>lld</i> documentation below.
+
<h2>Documentation</h2>
<table class="contentstable" align="center" style="margin-left: 30px">
<tr>
<td width="50%">
- <p class="biglink"><a class="biglink" href="{{ pathto("contents") }}">
+ <p class="biglink"><a class="biglink" href="{{ pathto('contents') }}">
Contents</a><br/>
<span class="linkdescr">for a complete overview</span></p>
- <p class="biglink"><a class="biglink" href="{{ pathto("search") }}">
+ <p class="biglink"><a class="biglink" href="{{ pathto('search') }}">
Search page</a><br/>
<span class="linkdescr">search the documentation</span></p>
- <p class="biglink"><a class="biglink" href="{{ pathto("genindex") }}">
+ <p class="biglink"><a class="biglink" href="{{ pathto('genindex') }}">
General Index</a><br/>
<span class="linkdescr">all functions, classes, terms</span></p>
</td></tr>
Modified: lld/trunk/docs/contents.rst
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/contents.rst?rev=154207&r1=154206&r2=154207&view=diff
==============================================================================
--- lld/trunk/docs/contents.rst (original)
+++ lld/trunk/docs/contents.rst Fri Apr 6 16:02:24 2012
@@ -7,6 +7,7 @@
:maxdepth: 2
intro
+ design
Indices and tables
==================
Added: lld/trunk/docs/design.rst
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/design.rst?rev=154207&view=auto
==============================================================================
--- lld/trunk/docs/design.rst (added)
+++ lld/trunk/docs/design.rst Fri Apr 6 16:02:24 2012
@@ -0,0 +1,405 @@
+.. _design:
+
+Linker Design
+=============
+
+Introduction
+------------
+
+lld is a new generation of linker. It is not "section" based like traditional
+linkers, which mostly just interlace sections from multiple object files into
+the output file. Instead, lld is based on "Atoms". Traditional section-based
+linking works well for simple cases, but its model makes advanced linking
+features difficult to implement. Features like dead code stripping, reordering
+functions for locality, and C++ coalescing require the linker to work at a
+finer grain.
+
+An atom is an indivisible chunk of code or data. An atom has a set of
+attributes, such as: name, scope, content-type, alignment, etc. An atom also
+has a list of References. A Reference contains: a kind, an optional offset, an
+optional addend, and an optional target atom.
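The Atom and Reference shapes described above can be sketched in C++. This is a minimal illustration with hypothetical names and plain structs; the real lld classes are abstract interfaces with many more attributes:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Atom;

// A Reference is an edge in the linking graph: a kind of fixup, an
// optional offset into the atom's content, an optional addend, and an
// optional target atom.
struct Reference {
  std::string kind;            // e.g. "pcrel32", "call32" (illustrative)
  uint64_t    offset = 0;
  int64_t     addend = 0;
  Atom       *target = nullptr;
};

// An Atom is an indivisible chunk of code or data, with a set of
// attributes and a list of outgoing References.
struct Atom {
  std::string            name;
  std::string            scope;        // "global", "hidden", ...
  std::string            contentType;  // "code", "c-string", ...
  uint32_t               alignment = 1;
  std::vector<uint8_t>   content;
  std::vector<Reference> references;
};
```

With these two structs, the "hello world" example below is just three Atom objects and two Reference edges out of `_main`.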
+
+The Atom model allows the linker to use standard graph theory models for linking
+data structures. Each atom is a node, and each Reference is an edge. Dead code
+stripping, for example, is implemented by following edges to mark all live
+atoms, and then deleting the non-live atoms.
+
+
+Atom Model
+----------
+
+An atom is an indivisible chunk of code or data. Typically each user-written
+function or global variable is an atom. In addition, the compiler may emit
+other atoms, such as for literal c-strings or floating point constants, or for
+runtime data structures like dwarf unwind info or pointers to initializers.
+
+A simple "hello world" object file would be modeled like this:
+
+.. image:: hello.png
+
+There are three atoms: main, a proxy for printf, and an anonymous atom
+containing the c-string literal "hello world". The Atom "main" has two
+references. One is the call site for the call to printf, and the other is a
+reference for the instruction that loads the address of the c-string literal.
+
+File Model
+----------
+
+The linker views input files as containers of Atoms and References, with just a
+few attributes of their own. The linker works with three kinds of files: object
+files, static libraries, and dynamic shared libraries. Each kind of file has a
+reader object which presents the file in the model expected by the linker.
+
+Object File
+~~~~~~~~~~~
+
+An object file is just a container of atoms. When linking an object file, a
+reader is instantiated which parses the object file and instantiates a set of
+atoms representing all content in the .o file. The linker adds all those atoms
+to a master graph.
+
+Static Library (Archive)
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is the traditional unix static archive, which is just a collection of
+object files with a "table of contents". When linking with a static library, by
+default nothing is added to the master graph of atoms. Instead, if any
+"undefined" atoms remain in the master graph after merging all atoms from the
+object files, the linker reads the table of contents of each static library to
+see if any have the needed definitions. If so, the set of atoms from the
+matching object file in the static library is added to the master graph of
+atoms.
+
+Dynamic Library (Shared Object)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dynamic libraries are different from object files and static libraries in that
+they don't directly add any content. Their purpose is to check at build time
+that the remaining undefined references can be resolved at runtime, and to
+provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime.
+This is modeled in the linker by having a dynamic library contribute no atoms to
+the initial graph of atoms. Instead, (like static libraries) if there are
+"undefined" atoms in the master graph of all atoms, then each dynamic library is
+checked to see if it exports the required symbol. If so, a "shared library" atom
+is instantiated by the reader, which the linker uses to replace the "undefined"
+atom.
+
+Linking Steps
+-------------
+
+Through the use of abstract Atoms, the core of linking is architecture
+independent and file format independent. All command line parsing is factored
+out into a separate "options" abstraction which enables the linker to be driven
+with different command line sets.
+
+The overall steps in linking are:
+
+ #. Command line processing
+
+ #. Parsing input files
+
+ #. Resolving
+
+ #. Passes/Optimizations
+
+ #. Generate output file
+
+The Resolving and Passes steps are done purely on the master graph of atoms, so
+they have no notion of file formats such as mach-o or ELF.
+
+Resolving
+~~~~~~~~~
+
+The resolving step takes all the atom graphs from each object file and combines
+them into one master object graph. Unfortunately, it is not as simple as
+appending the atom list from each file into one big list. There are many cases
+where atoms need to be coalesced. That is, two or more atoms need to be
+coalesced into one atom. This is necessary to support: C language "tentative
+definitions", C++ weak symbols for templates and inlines defined in headers,
+replacing undefined atoms with actual definition atoms, and for merging copies
+of constants like c-strings and floating point constants.
+
+The linker supports coalescing by-name and by-content. By-name is used for
+tentative definitions and weak symbols. By-content is used for constant data
+that can be merged.
+
+The resolving process maintains some global linking "state", including a "symbol
+table" which is a map from llvm::StringRef to lld::Atom*. With these data
+structures, the linker iterates all atoms in all input files. For each atom, it
+checks if the atom is named and has a global or hidden scope. If so, the atom
+is added to the symbol table map. If there already is a matching atom in that
+table, that means the current atom needs to be coalesced with the found atom, or
+it is a multiple definition error.
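The symbol-table walk above can be sketched as follows. This is a deliberately simplified model (hypothetical `Atom` struct, `std::map` instead of lld's real StringRef-keyed table) showing just the by-name coalescing logic:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical atom record for illustrating the resolving loop; the
// real lld::Atom is an abstract class with many more attributes.
struct Atom {
  std::string name;
  std::string scope;         // "global", "hidden", or "" for anonymous
  bool isDefinition = true;  // false models an "undefined" atom
};

// Walk all atoms from all input files, building the symbol table and
// coalescing by name: a definition replaces an "undefined" atom of the
// same name, and two real definitions are a multiple-definition error.
std::map<std::string, Atom *> resolve(std::vector<Atom> &atoms) {
  std::map<std::string, Atom *> symbolTable;
  for (Atom &atom : atoms) {
    // Only named atoms with global or hidden scope enter the table.
    if (atom.name.empty() ||
        (atom.scope != "global" && atom.scope != "hidden"))
      continue;
    auto it = symbolTable.find(atom.name);
    if (it == symbolTable.end())
      symbolTable[atom.name] = &atom;
    else if (!it->second->isDefinition)
      it->second = &atom;            // definition replaces "undefined"
    else if (atom.isDefinition)
      throw std::runtime_error("multiple definition: " + atom.name);
    // else: current atom is undefined and a definition exists; keep it.
  }
  return symbolTable;
}
```

The real resolver also handles weak symbols, tentative definitions, and by-content merging, which are omitted here.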
+
+When all initial input file atoms have been processed by the resolver, a scan is
+made to see if there are any undefined atoms in the graph. If there are, the
+linker scans all libraries (both static and dynamic) looking for definitions to
+replace the undefined atoms. It is an error if any undefined atoms are left
+remaining.
+
+Dead code stripping (if requested) is done at the end of resolving. The linker
+does a simple mark-and-sweep. It starts with "root" atoms (like "main" in a main
+executable) and follows each reference and marks each Atom that it visits as
+"live". When done, all atoms not marked "live" are removed.
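That mark-and-sweep over the atom graph can be sketched in a few lines (hypothetical node type; the real implementation works on lld atoms and their References):

```cpp
#include <algorithm>
#include <vector>

// Dead-stripping sketch: atoms are nodes, references are edges, and
// everything not reachable from a root (e.g. "main") is removed.
struct Atom {
  std::vector<Atom *> references;  // atoms this atom uses
  bool live = false;
};

// Mark phase: depth-first walk from a root, marking visited atoms live.
void markLive(Atom *atom) {
  if (atom->live)
    return;  // already visited; also terminates cycles
  atom->live = true;
  for (Atom *target : atom->references)
    markLive(target);
}

// Sweep phase: erase every atom that was never marked live.
void deadStrip(std::vector<Atom *> &graph, const std::vector<Atom *> &roots) {
  for (Atom *root : roots)
    markLive(root);
  graph.erase(std::remove_if(graph.begin(), graph.end(),
                             [](Atom *a) { return !a->live; }),
              graph.end());
}
```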
+
+The result of the Resolving phase is the creation of an lld::File object. The
+goal is that the lld::File model is **the** internal representation throughout
+the linker. The file readers parse (mach-o, ELF, COFF) into an lld::File. The
+file writers (mach-o, ELF, COFF) take an lld::File and produce their file kind,
+and every Pass operates only on an lld::File. This is not only a simpler,
+consistent model, but it enables the state of the linker to be dumped at any
+point in the link for testing purposes.
+
+
+Passes
+~~~~~~
+
+The Passes step is an open-ended set of routines that each get a chance to
+modify or enhance the current lld::File object. Some example Passes are:
+
+ * stub (PLT) generation
+
+ * GOT instantiation
+
+ * order_file optimization
+
+ * branch island generation
+
+ * branch shim generation
+
+ * Objective-C optimizations (Darwin specific)
+
+ * TLV instantiation (Darwin specific)
+
+ * dtrace probe processing (Darwin specific)
+
+ * compact unwind encoding (Darwin specific)
+
+
+Some of these passes are specific to Darwin's runtime environments. But many of
+the passes are applicable to any OS (such as generating branch islands for
+out-of-range branch instructions).
+
+The general structure of a pass is to iterate through the atoms in the current
+lld::File object, inspecting each atom and doing something. For instance, the
+stub pass looks for call sites to shared library atoms (e.g. calls to printf).
+It then instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for
+each proxy atom needed, and these new atoms are added to the current lld::File
+object. Next, all the noted call sites to shared library atoms have their
+References altered to point to the stub atom instead of the shared library atom.
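A stub pass of that shape can be sketched as below. Everything here is hypothetical (struct layout, pass entry point); the point is the pattern: scan references, instantiate one stub per shared-library target, and retarget the call sites:

```cpp
#include <map>
#include <vector>

struct Atom;

// A Reference edge; only the target matters for this sketch.
struct Reference {
  Atom *target = nullptr;
};

struct Atom {
  bool isSharedLibrary = false;  // proxy atom from a dynamic library
  bool isStub = false;           // a synthesized PLT-entry atom
  std::vector<Reference> references;
};

// Sketch of a stub (PLT) pass: for every reference to a shared-library
// atom, create (at most) one stub atom per target, add the new atoms to
// the file, and point the call site at the stub instead.
void runStubPass(std::vector<Atom *> &file) {
  std::map<Atom *, Atom *> stubFor;  // one stub per shared-library atom
  std::vector<Atom *> newAtoms;
  for (Atom *atom : file)
    for (Reference &ref : atom->references)
      if (ref.target && ref.target->isSharedLibrary) {
        auto it = stubFor.find(ref.target);
        if (it == stubFor.end()) {
          Atom *stub = new Atom();  // ownership elided in this sketch
          stub->isStub = true;
          it = stubFor.insert({ref.target, stub}).first;
          newAtoms.push_back(stub);
        }
        ref.target = it->second;  // call site now points at the stub
      }
  file.insert(file.end(), newAtoms.begin(), newAtoms.end());
}
```

A real pass would also create the "lazy pointer" atoms and fill in the stub's content; those details are omitted here.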
+
+Generate Output File
+~~~~~~~~~~~~~~~~~~~~
+
+Once the passes are done, the output file writer is given the current lld::File
+object. The writer's job is to create the executable content file wrapper and
+place the content of the atoms into it.
+
+lld::File representations
+-------------------------
+
+Just as LLVM has three representations of its IR model, lld has three
+representations of its File/Atom/Reference model:
+
+ * In memory, abstract C++ classes (lld::Atom, lld::Reference, and lld::File).
+
+ * textual (in YAML)
+
+ * binary format ("native")
+
+Binary File Format
+~~~~~~~~~~~~~~~~~~
+
+In theory, lld::File objects could be written to disk in an existing Object File
+format standard (e.g. ELF). Instead we choose to define a new binary file
+format. There are two main reasons for this: fidelity and performance. In order
+for lld to work as a linker on all platforms, its internal model must be rich
+enough to model all CPU and OS linking features. But if we chose an existing
+Object File format as the lld binary format, there would be an ongoing need to
+retrofit each platform-specific feature needed from alternate platforms into the
+existing Object File format. Having our own "native" binary format sidesteps
+that issue. We still need to be able to binary encode all the features, but
+once the in-memory model can represent a feature, it is straightforward to
+binary encode it.
+
+The reason to use a binary file format at all, instead of a textual file format,
+is speed. You want the binary format to be as fast as possible to read into the
+in-memory model. Given that we control the in-memory model and the binary
+format, the obvious way to make reading super fast is to make the file format be
+basically just an array of atoms. The reader just mmaps in the file, looks at
+the header to see how many atoms there are, and instantiates that many atom
+objects with the atom attribute information coming from that array. The trick
+is designing this in a way that can be extended as the Atom model evolves and
+new attributes are added.
+
+The native object file format starts with a header that lists how many "chunks"
+are in the file. A chunk is an array of "ivar data". The native file reader
+instantiates an array of Atom objects (with one large malloc call). Each atom
+contains just a pointer to its vtable and a pointer to its ivar data. All
+methods on lld::Atom are virtual, so all the method implementations return
+values based on the ivar data to which it has a pointer. If a new linking
+feature is added which requires a change to the lld::Atom model, a new native
+reader class (e.g. version 2) is defined which knows how to read the new feature
+information from the new ivar data. The old reader class (e.g. version 1) is
+updated to do its best to model (the lack of the new feature) given the old ivar
+data in existing native object files.
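The vtable-plus-ivar-pointer scheme can be sketched like this. The ivar layout, field names, and the sample `isThumb` feature are all hypothetical; they only illustrate how a version-1 reader models the lack of a feature that a later ivar layout would carry:

```cpp
#include <cstdint>

// Hypothetical version-1 ivar layout, as it might be mmapped straight
// from a chunk of the native file.
struct IvarDataV1 {
  uint32_t nameOffset;  // offset into a string chunk
  uint8_t  scope;
};

// Abstract Atom: all methods are virtual, so each reader version can
// supply its own decoding of the raw ivar bytes.
struct Atom {
  virtual ~Atom() = default;
  virtual uint8_t scope() const = 0;
  virtual bool isThumb() const = 0;  // feature added after version 1
};

// Version-1 atoms hold just a vtable pointer (implicit) and a pointer
// to their ivar data; accessors return values decoded from that data.
struct AtomV1 : Atom {
  const IvarDataV1 *ivars;
  explicit AtomV1(const IvarDataV1 *d) : ivars(d) {}
  uint8_t scope() const override { return ivars->scope; }
  // The v1 layout has no thumb bit, so the old reader models the
  // lack of the feature with a fixed default.
  bool isThumb() const override { return false; }
};
```

A hypothetical `AtomV2` would point at a larger ivar struct and decode the new field, while `AtomV1` keeps working against old files.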
+
+With this model for the native file format, files can be read and turned
+into the in-memory graph of lld::Atoms with just a few memory allocations.
+And the format can easily adapt over time to new features.
+
+
+Textual representations in YAML
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In designing a textual format we want something easy for humans to read and easy
+for the linker to parse. Since an atom has lots of attributes, most of which are
+usually just the default, we should define default values for every attribute so
+that those can be omitted from the text representation. Here are the atoms for a
+simple hello world program expressed in YAML::
+
+ target-triple: x86_64-apple-darwin11
+
+ atoms:
+ - name: _main
+ scope: global
+ type: code
+ content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
+ 00, 00, 31, c0, 5d, c3 ]
+ fixups:
+ - offset: 07
+ kind: pcrel32
+ target: 2
+ - offset: 0E
+ kind: call32
+ target: _fprintf
+
+ - type: c-string
+ content: [ 73, 5A, 00 ]
+
+ ...
+
+The biggest use for the textual format will be writing test cases. Writing test
+cases in C is problematic because the compiler may vary its output over time for
+its own optimization reasons, which may inadvertently disable or break the
+linker feature being tested. By writing test cases in the linker's own textual
+format, we can exactly specify every attribute of every atom and thus target
+specific linker logic.
+
+Testing
+~~~~~~~
+
+The lld project contains a test suite which is being built up as new code is
+added to lld. All new lld functionality should have tests added to the test
+suite. The test suite is `lit <http://llvm.org/cmds/lit.html/>`_ driven. Each
+test is a text file with comments telling lit how to run the test and check the
+result. To facilitate testing, the lld project builds a tool called lld-core.
+This tool reads a YAML file (default from stdin), parses it into one or more
+lld::File objects in memory, and then feeds those lld::File objects to the
+resolver phase. The output of the resolver is written as a native object file.
+It is then read back in using the native object file reader and then passed to
+the YAML writer. This round-about path means that all three representations
+(in-memory, binary, and text) are exercised, and any new feature has to work in
+all the representations to pass the test.
+
+
+Resolver testing
+~~~~~~~~~~~~~~~~
+
+Basic testing covers "core linking", the resolving phase, where the linker
+merges object files. All test cases are written in YAML. One feature of YAML is
+that it allows multiple "documents" to be encoded in one YAML stream. That
+means one text file can appear to the linker as multiple .o files - the normal
+case for the linker.
+
+Here is a simple example of a core linking test case. It checks that an
+undefined atom from one file will be replaced by a definition from another
+file::
+
+ # RUN: lld-core %s | FileCheck %s
+
+ #
+ # Test that undefined atoms are replaced with defined atoms.
+ #
+
+ ---
+ atoms:
+ - name: foo
+ definition: undefined
+ ---
+ atoms:
+ - name: foo
+ scope: global
+ type: code
+ ...
+
+ # CHECK: name: foo
+ # CHECK: scope: global
+ # CHECK: type: code
+ # CHECK-NOT: name: foo
+ # CHECK: ...
+
+
+Passes testing
+~~~~~~~~~~~~~~
+
+Since Passes just operate on an lld::File object, the lld-core tool has the
+option to run a particular pass (after resolving). Thus, you can write a YAML
+test case with carefully crafted input to exercise areas of a Pass and then
+check the resulting lld::File object as represented in YAML.
+
+
+Design Issues
+-------------
+
+There are a number of open issues in the design of lld. The plan is to wait and
+make these design decisions when we need to.
+
+
+Debug Info
+~~~~~~~~~~
+
+Currently, the lld model says nothing about debug info. But the most popular
+debug format is DWARF, and there is some impedance mismatch between the lld
+model and DWARF. In lld there are just Atoms, and only Atoms that need to be in
+a special section at runtime have an associated section. Also, Atoms do not
+have addresses. The way DWARF is specified, different parts of DWARF are
+supposed to go into specially named sections, and DWARF references function
+code by address.
+
+CPU and OS specific functionality
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, lld has an abstract "Platform" that deals with any CPU or OS specific
+differences in linking. We just keep adding virtual methods to the base
+Platform class as we find linking areas that might need customization. At some
+point we'll need to structure this better.
+
+
+File Attributes
+~~~~~~~~~~~~~~~
+
+Currently, lld::File just has a path and a way to iterate its atoms. We will
+need to add more attributes on a File. For example, some equivalent to the
+target triple. There are also a number of cached or computed attributes that
+could make various Passes more efficient. For instance, on Darwin there are a
+number of Objective-C optimizations that can be done by a Pass. But it would
+improve the plain C case if the Objective-C optimization Pass did not have to
+scan all atoms looking for any Objective-C data structures. This could be done
+if the lld::File object had an attribute that said if the file had any
+Objective-C data in it. The Resolving phase would then be required to "merge"
+that attribute as object files are added.
+
+
+Command Line Processing
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Eventually, we may want this linker to be a drop-in replacement for existing
+linker tools. That means being able to handle command line
+arguments for different platforms (e.g. darwin or linux). Currently, there is
+no command line processing code in lld. If clang winds up incorporating the lld
+libraries into the clang binary, lld may be able to punt this work because clang
+will be responsible for setting up the state for lld.
+
+
+
Added: lld/trunk/docs/hello.png
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/hello.png?rev=154207&view=auto
==============================================================================
Binary files lld/trunk/docs/hello.png (added) and lld/trunk/docs/hello.png Fri Apr 6 16:02:24 2012 differ
Modified: lld/trunk/docs/intro.rst
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/intro.rst?rev=154207&r1=154206&r2=154207&view=diff
==============================================================================
--- lld/trunk/docs/intro.rst (original)
+++ lld/trunk/docs/intro.rst Fri Apr 6 16:02:24 2012
@@ -3,4 +3,62 @@
Introduction
============
-*lld* is the LLVM linker.
+lld is a new set of modular code for creating linker tools.
+
+ * End-User Features:
+
+ * Compatible with existing linker options
+
+ * Reads standard Object Files (e.g. ELF, mach-o, PE/COFF)
+
+ * Writes standard Executable Files (e.g. ELF, mach-o, PE)
+
+ * Fast link times
+
+ * Minimal memory use
+
+ * Remove clang's reliance on "the system linker"
+
+ * Uses the LLVM 'BSD' License
+
+ * Applications:
+
+ * Modular design
+
+ * Support cross linking
+
+ * Easy to add new CPU support
+
+ * Can be built as static tool or library
+
+ * Design and Implementation:
+
+ * Extensive unit tests
+
+ * Internal linker model can be dumped/read to textual format
+
+ * Internal linker model can be dumped/read to new native format
+
+ * Native format designed to be fast to read and write
+
+ * Additional linking features can be plugged in as "passes"
+
+ * OS specific and CPU specific code factored out
+
+
+Why a new linker?
+-----------------
+
+The fact that clang relies on whatever linker tool you happen to have installed
+means that clang has been very conservative in adopting features which require a
+recent linker.
+
+In the same way that the MC layer of LLVM has removed clang's reliance on the
+system assembler tool, the lld project will remove clang's reliance on the
+system linker tool.
+
+
+Current Status
+--------------
+
+lld is in its very early stages of development.