[llvm-commits] [lld] r152591 - in /lld/trunk/www: content.css hello.png index.html linker_design.html menu.css

Mon Mar 12 17:10:27 PDT 2012

Author: kledzik
Date: Mon Mar 12 19:10:27 2012
New Revision: 152591

URL: http://llvm.org/viewvc/llvm-project?rev=152591&view=rev
Log:
Added first round of documentation

Added:
    lld/trunk/www/content.css
    lld/trunk/www/hello.png   (with props)
    lld/trunk/www/index.html
    lld/trunk/www/linker_design.html
    lld/trunk/www/menu.css

Added: lld/trunk/www/content.css
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/www/content.css?rev=152591&view=auto
==============================================================================

--- lld/trunk/www/content.css (added)
+++ lld/trunk/www/content.css Mon Mar 12 19:10:27 2012
@@ -0,0 +1,27 @@
+html { margin: 0px; } body { margin: 8px; }
+
+html, body {
+  padding:0px;
+  font-size:small; font-family:"Lucida Grande", "Lucida Sans Unicode", Arial, Verdana, Helvetica, sans-serif; background-color: #fff; color: #222;
+  line-height:1.5;
+}
+
+h1, h2, h3, tt { color: #000 }
+
+h1 { padding-top:0px; margin-top:0px;}
+h2 { color:#333333; padding-top:0.5em; }
+h3 { padding-top: 0.5em; margin-bottom: -0.25em; color:#2d58b7}
+li { padding-bottom: 0.5em; }
+ul { padding-left:1.5em; }
+
+/* Slides */
+IMG.img_slide {
+    display: block;
+    margin-left: auto;
+    margin-right: auto
+}
+
+.itemTitle { color:#2d58b7 }
+
+/* Tables */
+tr { vertical-align:top }

Added: lld/trunk/www/hello.png
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/www/hello.png?rev=152591&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lld/trunk/www/hello.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: lld/trunk/www/index.html
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/www/index.html?rev=152591&view=auto
==============================================================================
--- lld/trunk/www/index.html (added)
+++ lld/trunk/www/index.html Mon Mar 12 19:10:27 2012
@@ -0,0 +1,99 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+          "http://www.w3.org/TR/html4/strict.dtd">
+<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
+<html>
+<head>
+  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
+  <title>lld: a linker for LLVM</title>
+  <link type="text/css" rel="stylesheet" href="menu.css">
+  <link type="text/css" rel="stylesheet" href="content.css">
+</head>
+
+<body>
+<div id="menu">
+  <div>
+    <a href="http://llvm.org/">LLVM Home</a>
+  </div>
+
+  <div class="submenu">
+    <label>lld Info</label>
+    <a href="./index.html">About</a>
+    <a href="linker_design.html">Design</a>
+  </div>
+
+  <div class="submenu">
+    <label>Quick Links</label>
+    <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/">commits</a>
+    <a href="http://llvm.org/bugs/">Bug Reports</a>
+    <a href="http://llvm.org/svn/llvm-project/lld/trunk/">Browse SVN</a>
+    <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/">Browse ViewVC</a>
+  </div>
+</div>
+
+<div id="content">
+  <!--*********************************************************************-->
+  <h1>lld: a linker for LLVM</h1>
+  <!--*********************************************************************-->
+
+  <p>lld is a new set of modular code for creating linker tools.</p>
+
+  <!--=====================================================================-->
+  <h2 id="goals">Features and Goals</h2>
+  <!--=====================================================================-->
+
+	<p><b>End-User Features:</b></p>
+    <ul>
+		<li>Compatible with existing linker options</li>
+        <li>Fast link times</li>
+        <li>Minimal memory use</li>
+        <li>Remove clang's reliance on "the system linker"</li>
+    </ul>
+
+	<p><b>Applications:</b></p>
+    <ul>
+        <li>Use the LLVM 'BSD' License</li>
+        <li>Modular design</li>
+        <li>Support cross linking</li>
+        <li>Easy to add new CPU support</li>
+        <li>Can be built as static tool or library</li>
+    </ul>
+	
+	<p><b>Design and Implementation:</b></p>
+    <ul>
+        <li>Extensive unit tests</li>
+        <li>Internal linker model can be dumped/read to textual format</li>
+		<li>Internal linker model can be dumped/read to new native format</li>
+        <li>Additional linking features can be plugged in as "passes"</li>
+        <li>OS specific and CPU specific code factored out</li>
+    </ul>
+
+  <!--=====================================================================-->
+  <h2 id="why">Why a new linker?</h2>
+  <!--=====================================================================-->
+
+  <p>The fact that clang relies on whatever linker tool you happen to have
+	installed means that clang has been very conservative adopting features
+	which require a recent linker.</p>
+	
+ <p>In the same way that the MC layer of LLVM has removed clang's reliance
+	on the system assembler tool, the lld project will remove clang's reliance
+	on the system linker tool.</p>
+
+
+  <!--=====================================================================-->
+  <h2 id="status">Current Status</h2>
+  <!--=====================================================================-->
+
+   <p>lld is in its very early stages of development.</p>
+
+ <!--=====================================================================-->
+  <h2>Design Documents</h2>
+  <!--=====================================================================-->
+
+<ul>
+   <li><a href="linker_design.html">Design of lld</a></li>
+</ul>
+ 
+</div>
+</body>
+</html>

Added: lld/trunk/www/linker_design.html
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/www/linker_design.html?rev=152591&view=auto
==============================================================================
--- lld/trunk/www/linker_design.html (added)
+++ lld/trunk/www/linker_design.html Mon Mar 12 19:10:27 2012
@@ -0,0 +1,421 @@
+<html>
+<head>
+  <title>Design of lld</title>
+</head>
+<body>
+
+
+<h1>
+  Design of lld 
+</h1>
+
+<h2>
+  <a name="introduction">Introduction</a>
+</h2>
+
+<p>lld is a new generation of linker.  It is not "section" based
+like traditional linkers which mostly just interlace sections from multiple
+object files into the output file.  Instead, lld is based on "Atoms".
+Traditional section based linking works well for simple linking, but its model
+makes advanced linking features difficult to implement.  Features like dead code 
+stripping, reordering functions for locality, and C++ coalescing require the
+linker to work at a finer grain.
+</p>
+
+<p>An atom is an indivisible chunk of code or data.  An atom has a set of
+attributes, such as: name, scope, content-type, alignment, etc.  An atom also
+has a list of References.  A Reference contains: a kind, an optional offset, 
+an optional addend, and an optional target atom.</p>
+
+<p>The Atom model allows the linker to use standard graph theory models for 
+linking data structures.  Each atom is a node, and each Reference is an edge. 
+Dead code stripping is implemented by following edges to mark
+all live atoms, and then deleting the non-live atoms.</p>
+<br>
+<h2>
+  <a name="Atom model">Atom model</a>
+</h2>
+
+<p>An atom is an indivisible chunk of code or data.  Typically each user
+written function or global variable is an atom.  In addition, the compiler may
+emit other atoms, such as for literal c-strings or floating point constants, or
+for runtime data structures like dwarf unwind info or pointers to initializers.
+</p>
+
+<p>A simple "hello world" object file would be modeled like this:</p>
+<img src="hello.png" alt="hello world graphic"/>
+<p>There are three atoms: main, a proxy for printf, and an anonymous atom 
+containing the c-string literal "hello world".  The Atom "main" has two  
+references. One is the call site for the call to printf, and the other is 
+a reference for the instruction that loads the address of the c-string literal. 
+</p>
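
The three-atom graph described above can be sketched in C++ as follows. This is a minimal sketch and not lld's actual class layout: the namespace `atom_sketch`, the plain structs, and the string values used for scope and content type are all assumptions made for illustration. The reference kinds and offsets are borrowed from the YAML example later in this document.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

namespace atom_sketch {  // hypothetical namespace, not part of lld

struct Atom;

// An edge in the graph: a fixup inside one atom that points at another atom.
struct Reference {
  std::string kind;      // e.g. "call32" or "pcrel32"
  std::uint64_t offset;  // position of the fixup within the owning atom
  std::int64_t addend;
  const Atom *target;    // null when the reference has no target
};

// A node in the graph. Only a few attributes are modeled here.
struct Atom {
  std::string name;         // empty for anonymous atoms
  std::string scope;        // illustrative string values
  std::string contentType;  // illustrative string values
  std::vector<Reference> references;
};

// Owns every atom so that Reference::target pointers stay valid.
struct Graph {
  std::vector<std::unique_ptr<Atom>> atoms;

  Atom *add(const std::string &name, const std::string &scope,
            const std::string &contentType) {
    atoms.push_back(
        std::unique_ptr<Atom>(new Atom{name, scope, contentType, {}}));
    return atoms.back().get();
  }
};

// Builds the "hello world" graph: main, a proxy for printf, and an
// anonymous c-string atom, with two references leaving main.
inline Graph makeHelloWorld() {
  Graph graph;
  Atom *mainAtom = graph.add("main", "global", "code");
  Atom *printfProxy = graph.add("printf", "global", "shared-library");
  Atom *literal = graph.add("", "translation-unit", "c-string");
  mainAtom->references.push_back({"pcrel32", 0x07, 0, literal});
  mainAtom->references.push_back({"call32", 0x0E, 0, printfProxy});
  return graph;
}

}  // namespace atom_sketch
```

Keeping the atoms behind owning pointers means a reference can hold a raw pointer to its target without worrying about the vector reallocating.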
+
+<br>
+<h2>
+  <a name="File model">File model</a>
+</h2>
+
+<p>The linker views the input files as basically containers of Atoms and 
+References, with just a few attributes of their own.  The linker works with three 
+kinds of files: object files, static libraries, and dynamic shared libraries.   
+Each kind of file has a reader object which presents the file in the model 
+expected by the linker.</p>
+<h4> <a>Object File</a> 
+</h4>
+An object file is just a container of atoms.  When linking 
+an object file, a reader is instantiated which parses the object file and 
+instantiates a set of atoms representing all content in the .o file.  The
+linker adds all those atoms to a master graph.
+
+<h4> <a>Static Library (Archive)</a> 
+</h4>
+This is the traditional unix static archive which is just a collection of
+object files with a "table of contents". When linking with a static library,
+by default nothing is added to the master graph of atoms. Instead, if any
+"undefined" atoms remain in the master graph after all atoms from object files
+have been merged into it, the linker reads the 
+table of contents for each static library to see if any have the needed 
+definitions. If so, the set of atoms from the specified object file in the 
+static library is added to the master graph of atoms. 
+
+<h4> <a>Dynamic Library (Shared Object)</a> 
+</h4>
+<p>Dynamic libraries are different from object files and static libraries in that 
+they don't directly add any content.
+Their purpose is to check at build time that the remaining undefined references 
+can be resolved at runtime, and provide a list of dynamic libraries (DT_NEEDED in ELF) 
+that will be needed at runtime. 
+The way this is modeled in the linker is that a dynamic library contributes
+no atoms to the initial graph of atoms.  Instead, (like static libraries) if
+there are "undefined" atoms in the master graph of all atoms, then each 
+dynamic library is checked to see if it exports the required symbol. If so,
+a "shared library" atom is instantiated by the reader which the linker
+uses to replace the "undefined" atom.</p>
+
+<br>
+<h2>
+  <a name="Linking Steps">Linking Steps</a>
+</h2>
+<p>Through the use of abstract Atoms, the core of linking is architecture 
+independent and file format independent.  All command line parsing is factored
+out into a separate "options" abstraction which enables the linker to be driven
+with different command line sets.</p>
+<p>The overall steps in linking are:</p>
+<ol>
+  <li>Command line processing</li>
+  <li>Parsing input files</li>
+  <li>Resolving</li>
+  <li>Passes/Optimizations</li>
+  <li>Generate output file</li>
+</ol>
+
+<p>The Resolving and Passes steps are done purely on the master graph of atoms, 
+so they have no notion of file formats such as mach-o or ELF.</p>
+
+<h4> <a>Resolving</a> 
+</h4>
+<p>The resolving step takes all the atom graphs from each object file and 
+combines them into one master object graph.  Unfortunately, it is not as simple
+as appending the atom list from each file into one big list.  There are many
+cases where atoms need to be coalesced.  That is, two or more atoms need to 
+be coalesced into one atom.  This is necessary to support: C language
+ "tentative definitions", C++ weak symbols for templates and inlines defined
+in headers, replacing undefined atoms with actual definition atoms, and 
+for merging copies of constants like c-strings and floating point constants.</p>
+
+<p>The linker supports coalescing by-name and by-content. By-name is used for
+tentative definitions and weak symbols.  By-content is used for constant data
+that can be merged. </p>
+
+<p>The resolving process maintains some global linking "state", including
+a "symbol table" which is a map from llvm::StringRef to lld::Atom*.
+With these data structures, the linker iterates over all atoms in all input files.
+For each atom, it checks if the atom is named and has a global or hidden scope.
+If so, the atom is added to the symbol table map.  If there already is
+a matching atom in that table, that means the current atom needs to be 
+coalesced with the found atom, or it is a multiple definition error.
+</p>
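
The by-name step above can be sketched as a small symbol table. This is a minimal sketch under stated assumptions, not lld's resolver: the namespace `resolve_sketch`, the `Scope` and `Definition` enums, and the precedence rule (a regular definition beats a weak one, which beats an undefined atom, and two regular definitions are an error) are illustrative choices. It uses `std::string` keys in place of `llvm::StringRef` so that it stands alone, and it leaves out tentative definitions.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

namespace resolve_sketch {  // hypothetical namespace, not part of lld

enum class Scope { TranslationUnit, Hidden, Global };
enum class Definition { Undefined, Weak, Regular };  // ordered weakest first

struct Atom {
  std::string name;  // empty for anonymous atoms
  Scope scope;
  Definition definition;
};

class SymbolTable {
public:
  // Adds an atom by name, coalescing it with any atom already in the table.
  // The table stores pointers, so the caller must keep the atoms alive.
  void add(const Atom &atom) {
    // Only named atoms with global or hidden scope take part.
    if (atom.name.empty() || atom.scope == Scope::TranslationUnit)
      return;
    auto result = table_.insert(std::make_pair(atom.name, &atom));
    if (result.second)
      return;  // first atom seen with this name
    const Atom *&existing = result.first->second;
    if (existing->definition == Definition::Regular &&
        atom.definition == Definition::Regular)
      throw std::runtime_error("multiple definition of " + atom.name);
    // Keep the stronger atom; on a tie the earlier one stays.
    if (atom.definition > existing->definition)
      existing = &atom;
  }

  // Returns the atom currently bound to the name, or null if there is none.
  const Atom *find(const std::string &name) const {
    auto it = table_.find(name);
    return it == table_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, const Atom *> table_;
};

}  // namespace resolve_sketch
```

Adding an undefined `foo` and then a regular `foo` leaves the regular atom in the table, which is the replacement behavior the resolver test case later in this document checks for.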
+
+<p>When all initial input file atoms have been processed by the resolver,  
+a scan is made to see if there are any undefined atoms in the graph.  If there
+are, the linker scans all libraries (both static and dynamic) looking for  
+definitions to replace the undefined atoms.  It is an error if any undefined  
+atoms are left remaining.
+</p>
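
For the static library case, the scan described above amounts to repeatedly looking up undefined names in the archive's table of contents and loading the member that defines them. The sketch below is an assumption-laden illustration rather than lld code: `archive_sketch`, `ObjectFile`, `Archive`, and `LinkState` are hypothetical types, symbols are reduced to plain names, and only a single archive is considered.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

namespace archive_sketch {  // hypothetical namespace, not part of lld

struct ObjectFile {
  std::vector<std::string> defined;    // names this file defines
  std::vector<std::string> undefined;  // names this file still needs
};

struct Archive {
  std::vector<ObjectFile> members;
  std::map<std::string, std::size_t> tableOfContents;  // name -> member index
};

struct LinkState {
  std::set<std::string> defined;
  std::set<std::string> undefined;
  std::set<std::size_t> loadedMembers;
};

// Merges one object file into the link state.
inline void addObject(LinkState &state, const ObjectFile &file) {
  for (const std::string &name : file.defined) {
    state.defined.insert(name);
    state.undefined.erase(name);
  }
  for (const std::string &name : file.undefined)
    if (!state.defined.count(name))
      state.undefined.insert(name);
}

// Loads archive members until no undefined name can be satisfied anymore.
// Returns true when nothing is left undefined.
inline bool resolveFromArchive(LinkState &state, const Archive &archive) {
  bool progress = true;
  while (progress) {
    progress = false;
    for (const std::string &name : state.undefined) {
      auto entry = archive.tableOfContents.find(name);
      if (entry == archive.tableOfContents.end())
        continue;
      if (!state.loadedMembers.insert(entry->second).second)
        continue;  // this member was already loaded
      addObject(state, archive.members[entry->second]);
      progress = true;
      break;  // state.undefined changed, so restart the scan
    }
  }
  return state.undefined.empty();
}

}  // namespace archive_sketch
```

A member that is loaded may bring in new undefined names of its own, which is why the scan restarts after every load instead of making a single pass.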
+
+<p>Dead code stripping (if requested) is done at the end of resolving.  The
+linker does a simple mark-and-sweep. It starts with "root" atoms (like "main"
+in a main executable) and follows each reference and marks each Atom that
+it visits as "live".  When done, all atoms not marked "live" are removed.
+</p>
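
The mark-and-sweep above can be sketched with a worklist. This is an illustrative sketch only: the namespace `deadstrip_sketch` and the index-based `Atom` struct are assumptions, and the sweep reports survivors by index instead of erasing atoms so that the indices stored in `targets` stay valid.

```cpp
#include <cstddef>
#include <vector>

namespace deadstrip_sketch {  // hypothetical namespace, not part of lld

struct Atom {
  std::vector<std::size_t> targets;  // indices of the atoms this atom references
  bool live;
};

// Mark phase: follow references from the roots and flag every atom reached.
inline void markLive(std::vector<Atom> &atoms,
                     const std::vector<std::size_t> &roots) {
  for (Atom &atom : atoms)
    atom.live = false;
  std::vector<std::size_t> worklist(roots);
  while (!worklist.empty()) {
    std::size_t index = worklist.back();
    worklist.pop_back();
    if (atoms[index].live)
      continue;  // already visited, which also breaks cycles
    atoms[index].live = true;
    for (std::size_t target : atoms[index].targets)
      worklist.push_back(target);
  }
}

// Sweep phase: return the indices of the atoms that survive.
inline std::vector<std::size_t> deadStrip(std::vector<Atom> &atoms,
                                          const std::vector<std::size_t> &roots) {
  markLive(atoms, roots);
  std::vector<std::size_t> survivors;
  for (std::size_t i = 0; i < atoms.size(); ++i)
    if (atoms[i].live)
      survivors.push_back(i);
  return survivors;
}

}  // namespace deadstrip_sketch
```

An explicit worklist is used instead of recursion so that long reference chains cannot overflow the call stack.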
+
+<p>The result of the Resolving phase is the creation of an lld::File object.  
+The goal is that the lld::File model is <b>the</b> internal representation throughout
+the linker. The file readers parse (mach-o, ELF, COFF) into an lld::File.
+The file writers (mach-o, ELF, COFF) take an lld::File and produce their
+file kind, and every Pass only operates on an lld::File.  This is not only
+a simpler, consistent model, but it enables the state of the linker to be 
+dumped at any point in the link for testing purposes.
+</p>
+
+
+<h4> <a>Passes</a> 
+</h4>
+<p>The Passes step
+is an open ended set of routines that each get a chance to modify or enhance
+the current lld::File object. Some example Passes are:</p>
+<ul>
+  <li>stub (PLT) generation</li>
+  <li>GOT instantiation</li>
+  <li>order_file optimization</li>
+  <li>branch island generation</li>
+  <li>branch shim generation</li>
+  <li>Objective-C optimizations (Darwin specific)</li>
+  <li>TLV instantiation (Darwin specific)</li>
+  <li>dtrace probe processing (Darwin specific)</li>
+  <li>compact unwind encoding (Darwin specific)</li>
+</ul>
+<p>Some of these passes are specific to Darwin's runtime environments.  But many
+of the passes are applicable to any OS (such as generating branch islands for 
+out of range branch instructions).</p>
+
+<p>The general structure of a pass is to iterate through the atoms in the
+current lld::File object, inspecting each atom and doing something.  
+For instance, the stub pass looks for call sites to shared library atoms
+(e.g. call to printf).  It then
+instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for each 
+proxy atom needed, and these new atoms are added to the current lld::File 
+object.  Next, all the noted call sites to shared library atoms have their
+References altered to point to the stub atom instead of the shared library
+atom.</p>  
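
The stub pass described above might look roughly like this. It is a hedged sketch, not lld's Pass interface: the namespace `stub_sketch`, the `Kind` enum, the `isCall` flag, and the `$stub` and `$lazy_ptr` name suffixes are invented for illustration. It also folds the "note the call sites, then retarget them" steps into one loop, and it assumes that the stub refers to the lazy pointer and the lazy pointer refers to the shared library atom.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace stub_sketch {  // hypothetical namespace, not part of lld

enum class Kind { Regular, SharedLibrary, Stub, LazyPointer };

struct Atom;

struct Reference {
  bool isCall;   // true for call sites, false for other references
  Atom *target;
};

struct Atom {
  std::string name;
  Kind kind;
  std::vector<Reference> references;
};

struct File {
  std::vector<std::unique_ptr<Atom>> atoms;

  Atom *add(const std::string &name, Kind kind) {
    atoms.push_back(std::unique_ptr<Atom>(new Atom{name, kind, {}}));
    return atoms.back().get();
  }
};

// Routes every call to a shared library atom through a stub atom.
inline void runStubPass(File &file) {
  std::map<Atom *, Atom *> stubFor;  // shared library atom -> its stub
  File added;                        // new atoms, appended after the scan
  for (auto &atom : file.atoms) {
    for (Reference &ref : atom->references) {
      if (!ref.isCall || !ref.target ||
          ref.target->kind != Kind::SharedLibrary)
        continue;
      Atom *&stub = stubFor[ref.target];
      if (!stub) {
        // One stub and one lazy pointer per shared library atom.
        Atom *lazy =
            added.add(ref.target->name + "$lazy_ptr", Kind::LazyPointer);
        lazy->references.push_back({false, ref.target});
        stub = added.add(ref.target->name + "$stub", Kind::Stub);
        stub->references.push_back({false, lazy});
      }
      ref.target = stub;  // retarget the call site
    }
  }
  for (auto &atom : added.atoms)
    file.atoms.push_back(std::move(atom));
}

}  // namespace stub_sketch
```

New atoms are collected separately and appended at the end so that the atom list is not modified while it is being iterated.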
+
+<h4><a>Generate Output File</a> 
+</h4>
+<p>Once the passes are done, the output file writer is given the current lld::File 
+object.  The writer's job is to create the executable content file wrapper 
+and place the content of the atoms into it. 
+</p>
+
+
+<h2>
+  <a name="lld::File representations">lld::File representations</a>
+</h2>
+<p>Just as LLVM has three representations of its IR model, lld has three
+representations of its File/Atom/Reference model: 
+<ul>
+	<li>In memory, abstract C++ classes 
+	     (lld::Atom, lld::Reference, and lld::File)</li>
+	<li>textual (in YAML)</li>
+	<li>binary format ("native")</li>
+</ul>
+</p>
+
+<h4><a>Binary File Format</a> 
+</h4>
+<p>In theory, lld::File objects could be written to disk in an existing 
+Object File format standard (e.g. ELF).  Instead we choose to define a new
+binary file format. There are two main reasons for this: fidelity and 
+performance.</p>
+<p>In order for lld to work as a linker on all platforms, its internal model
+must be rich enough to model all CPU and OS linking features.  But if we choose 
+an existing Object File format as the lld binary format, that means an
+ongoing need to retrofit each platform specific feature needed from alternate
+platforms into the existing Object File format.  Having our own "native"
+binary format side steps that issue.  We still need to be able to binary
+encode all the features, but once the in-memory model can represent the feature,
+it is straightforward to binary encode it.</p>
+
+<p>The reason to use a binary file format at all, instead of a textual file 
+format, is speed.  You want the binary format to be as fast as possible to read 
+into the in-memory model. Given that we control the in-memory model and the 
+binary format, the obvious way to make reading super fast is to make the file
+format be basically just an array of atoms.  The reader just mmaps in the file, 
+looks at the header to see how many atoms there are, and instantiates that  
+many atom objects with the atom attribute information coming from that array.   
+The trick is designing this in a way that can be extended as the Atom model 
+evolves and new attributes are added.
+</p>
+
+<p>The native object file format starts with a header that lists how many 
+"chunks" are in the file.  A chunk is an array of "ivar data".  The native
+file reader instantiates an array of Atom objects (with one large malloc call). 
+Each atom contains just a pointer to its vtable and a pointer to its ivar data. 
+All methods on lld::Atom are virtual, so all the method implementations return
+values based on the ivar data to which the atom has a pointer.
+If a new linking feature is added which requires a change to the lld::Atom
+model, a new native reader class (e.g. version 2) is defined which knows
+how to read the new feature information from the new ivar data.  The old
+reader class (e.g. version 1) is updated to do its best to model (the lack
+of the new feature) given the old ivar data in existing native object files.
+</p>
+
+<p>With this model for the native file format, files can be read and turned
+into the in-memory graph of lld::Atoms with just a few memory allocations.  
+And the format can easily adapt over time to new features.</p>
+
+
+<h4><a>Textual representations in YAML</a> 
+</h4>
+<p>
+In designing a textual format we want something easy for humans to read and
+easy for the linker to parse.  Since an atom has lots of attributes most of
+which are usually just the default, we should define default values for 
+every attribute so that those can be omitted from the text representation.
+Here are the atoms for a simple hello world program expressed in YAML.
+</p>
+<pre>
+---
+target-triple:   x86_64-apple-darwin11
+
+atoms:
+    - name:    _main
+      scope:   global
+      type:    code
+      content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
+                 00, 00, 31, c0, 5d, c3 ]
+      fixups:
+      - offset: 07
+        kind:   pcrel32
+        target: 2
+      - offset: 0E
+        kind:   call32
+        target: _fprintf
+
+    - type:    c-string
+      content: [ 73, 5A, 00 ]
+
+...
+</pre>
+
+<p>The biggest use for the textual format will be writing test cases. 
+Writing test cases in C is problematic because the compiler 
+may vary its output over time for its own optimization reasons, which may 
+inadvertently disable or break the linker feature being tested. By 
+writing test cases in the linker's own textual format, we can exactly specify 
+every attribute of every atom and thus target specific linker logic.
+</p>
+
+
+
+
+<h2>
+  <a name="Testing">Testing</a>
+</h2>
+<p>The lld project contains a test suite which is being built up as new code
+is added to lld.  All new lld functionality should have tests added to the 
+test suite.</p>
+<p>The test suite is <a href="http://llvm.org/cmds/lit.html">lit</a> driven.
+Each test is a text file with comments telling lit how to run the test and
+check the result.</p>
+<p>To facilitate testing, the lld project builds a tool called lld-core.
+This tool reads a YAML file (default from stdin), parses it into one or
+more lld::File objects in memory and then feeds those lld::File objects 
+to the resolver phase.  The output of the resolver is written as a native
+object file.  It is then read back in using the native object file reader
+and then passed to the YAML writer.  This round-about path means that all three
+representations (in-memory, binary, and text) are exercised, and any new
+feature has to work in all the representations to pass the test.
+</p>
+
+<h4><a>Resolver testing</a> 
+</h4>
+<p>Basic testing covers the "core linking" or resolving phase.  That
+is where the linker merges object files.  All test cases are written in YAML.
+One feature of YAML is that it allows multiple "documents" to be encoded in
+one YAML stream.  That means one text file can appear to the linker as 
+multiple .o files - the normal case for the linker.  
+</p>
+<p>Here is a simple example of a core linking test case. It checks that
+an undefined atom from one file will be replaced by a definition from
+another file.</p>
+<pre>
+# RUN: lld-core %s | FileCheck %s
+
+#
+# Test that undefined atoms are replaced with defined atoms.
+#
+
+---
+atoms:
+    - name:              foo
+      definition:        undefined
+---
+atoms:
+    - name:              foo
+      scope:             global
+      type:              code
+...
+
+# CHECK:       name:       foo
+# CHECK:       scope:      global
+# CHECK:       type:       code
+# CHECK-NOT:   name:       foo
+# CHECK:       ...
+</pre>
+
+<h4><a>Passes testing</a> 
+</h4>
+<p>Since Passes just operate on an lld::File object, the lld-core tool has
+the option to run a particular pass (after resolving).  Thus, you can write
+a YAML test case with carefully crafted input to exercise areas of a Pass
+and then check the resulting lld::File object as represented in YAML.
+</p>
+
+
+<h2>
+  <a name="Design Issues">Design Issues</a>
+</h2>
+
+<p>There are a number of open issues in the design of lld.  The plan is to 
+wait and make these design decisions when we need to.</p>
+
+
+<h4><a>Debug Info</a> 
+</h4>
+<p>Currently, the lld model says nothing about debug info.  But the most 
+popular debug format is DWARF and there is some impedance mismatch between the 
+lld model and DWARF.  In lld there are just Atoms, and only Atoms that need
+to be in a special section at runtime have an associated section.  Also, 
+Atoms do not have addresses.  The way DWARF is specified, different parts of 
+DWARF are supposed to go into specially named sections, and DWARF references
+function code by address.</p>
+
+<h4><a>CPU and OS specific functionality</a> 
+</h4>
+<p>Currently, lld has an abstract "Platform" that deals with any CPU or OS
+specific differences in linking.  We just keep adding virtual methods to
+the base Platform class as we find linking areas that might need customization.
+At some point we'll need to structure this better.
+</p>
+
+<h4><a>File Attributes</a> 
+</h4>
+<p>Currently, lld::File just has a path and a way to iterate its atoms. We 
+will need to add more attributes on a File.  For example, some equivalent
+to the target triple.  There are also a number of cached or computed attributes
+that could make various Passes more efficient.  For instance, on Darwin
+there are a number of Objective-C optimizations that can be done by a Pass.
+But it would improve the plain C case if the Objective-C optimization Pass did
+not have to scan all atoms looking for any Objective-C data structures.  This
+could be done if the lld::File object had an attribute that said if the file
+had any Objective-C data in it. The Resolving phase would then be required
+to "merge" that attribute as object files are added.
+</p>
+
+<h4><a>Command Line Processing</a> 
+</h4>
+<p>Eventually, we may want this linker to be a drop-in replacement 
+for existing linker tools.  That means being able to handle command
+line arguments for different platforms (e.g. darwin or linux).  Currently,
+there is no command line processing code in lld. If clang winds up
+incorporating the lld libraries into the clang binary, lld may be able to punt
+this work because clang will be responsible for setting up the state for lld.
+</p>
+
+
+
+
+
+</body>
+</html>
+

Added: lld/trunk/www/menu.css
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/www/menu.css?rev=152591&view=auto
==============================================================================
--- lld/trunk/www/menu.css (added)
+++ lld/trunk/www/menu.css Mon Mar 12 19:10:27 2012
@@ -0,0 +1,39 @@
+/***************/
+/* page layout */
+/***************/
+
+[id=menu] {
+	position:fixed;
+	width:25ex;
+}
+[id=content] {
+	/* *****  EDIT THIS VALUE IF CONTENT OVERLAPS MENU ***** */
+	position:absolute;
+  left:29ex;
+	padding-right:4ex;
+}
+
+/**************/
+/* menu style */
+/**************/
+
+#menu .submenu {
+	padding-top:1em;
+	display:block;
+}
+
+#menu label {
+	display:block;
+	font-weight: bold;
+	text-align: center;
+	background-color: rgb(192,192,192);
+}
+#menu a {
+	padding:0 .2em;
+	display:block;
+	text-align: center;
+	background-color: rgb(235,235,235);
+}
+#menu a:visited {
+	color:rgb(100,50,100);
+}