[llvm-commits] [lld] r158374 - in /lld/trunk/docs: Readers.rst development.rst

Fri Jun 15 17:57:39 PDT 2012

On Jun 12, 2012, at 11:12 PM, Sean Silva wrote:

> I had some really major concerns about the general organization and wording of the beginning. I've attached a patch that reworks it to be a lot more upfront about the organization of the components and exactly what is involved. Let me know what you think.
> 
> I think I may have gone a bit overboard with the sphinx markup. That can easily be dialed back to be more "plaintext"-y (although you should build it with sphinx to see how it turns out, and decide whether it may be desirable to use some of Sphinx's features). Needless to say, I don't expect the patch to be committed as-is.
I do like the readability of the html, with the nicely formatted C++ snippets, even though the .rst file is a little uglier.  I've tweaked and submitted your patch.

> 
> For the middle--end, the patch has some little formatting fixups that I always compulsively do as I read. You can ignore them, but I think there might be a couple grammatical or wording fixes worth applying.
> 
> I think the Making Atoms discussion needs to be made more concrete. "Call this function". "Use this constructor", etc.
> 
> Similarly for the testing stuff. "Put a file here". "make this build target to run the tests". etc.
We don't have any Reader test cases yet, so the details are not documentable yet. I'll update this doc once we have some Reader test cases.

> Also, you should link Readers.rst into a toctree; sphinx is warning about it not being linked in (you do link to it but not through the toctree; I would just remove the other link).
It was not obvious to me how to use toctree without also creating a table-of-contents chunk in the html…  Patches welcome :-)

-Nick

> 
> --Sean Silva
> 
> Having moved this information higher up, you should leave this out.
> On Tue, Jun 12, 2012 at 3:43 PM, Nick Kledzik <kledzik at apple.com> wrote:
> Author: kledzik
> Date: Tue Jun 12 17:43:35 2012
> New Revision: 158374
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=158374&view=rev
> Log:
> Wrote initial doc on how to create a Reader
> 
> Added:
>    lld/trunk/docs/Readers.rst
> Modified:
>    lld/trunk/docs/development.rst
> 
> Added: lld/trunk/docs/Readers.rst
> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/Readers.rst?rev=158374&view=auto
> ==============================================================================
> --- lld/trunk/docs/Readers.rst (added)
> +++ lld/trunk/docs/Readers.rst Tue Jun 12 17:43:35 2012
> @@ -0,0 +1,163 @@
> +.. _Readers:
> +
> +Developing lld Readers
> +======================
> +
> +Introduction
> +------------
> +
> +One goal of lld is to be file format independent.  This is done
> +through a plug-in model for reading object files. The lld::Reader is the base
> +class for all object file readers.  A Reader follows the factory method pattern.
> +A Reader instantiates an lld::File object (which is a graph of Atoms) from a
> +given object file (on disk or in-memory).
> +
> +Every Reader subclass defines its own "options" class (for instance the mach-o
> +Reader defines the class ReaderOptionsMachO).  This options class is the
> +one-and-only way to control how the Reader operates when parsing an input file
> +into an Atom graph.  For instance, you may want the Reader to only accept
> +certain architectures.  The options class can be instantiated from command
> +line options, or it can be subclassed and the ivars programmatically set.
> +
> +
> +Where to start
> +--------------
> +
> +The lld project already has a skeleton of source code for Readers of ELF, COFF,
> +mach-o, and the lld native object file format.  If your file format is a
> +variant of one of those, you should modify the existing Reader to support
> +your variant.  This is done by adding new ivar(s) to the Options class for that
> +Reader which specify which file format variant to expect, and then modifying
> +the Reader to check those ivars and parse the object file accordingly.
> +
> +If your object file format is not a variant of any existing Reader, you'll need
> +to create a new Reader subclass. If your file format is called "Foo", you'll
> +need to create these files::
> +
> +    ./include/lld/ReaderWriter/ReaderFoo.h
> +    ./lib/ReaderWriter/Foo/ReaderFoo.cpp
> +
> +The public interface for your reader is just the ReaderOptions subclass
> +(e.g.  ReaderOptionsFoo) and the function to create a Reader given the options::
> +
> +    Reader* createReaderFoo(const ReaderOptionsFoo &options);
> +
> +In the implementation, you can define a ReaderFoo class, but that class is
> +private to your ReaderWriter directory.
> +
> +
> +Readers are factories
> +---------------------
> +
> +The linker will usually only instantiate your Reader once.  That one Reader will
> +have its parseFile() method called many times with different input files.
> +To support multithreaded linking, the Reader may be parsing multiple input
> +files in parallel. Therefore, there should be no parsing state in your Reader
> +object.  Any parsing state should be in ivars of your File subclass or in
> +some temporary object.
> +
> +The key method to implement in a reader is::
> +
> +  virtual error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
> +                               std::vector<std::unique_ptr<File>> &result);
> +
> +It takes a memory buffer (which contains the contents of the object file
> +being read) and returns, through result, one or more instantiated
> +lld::File objects, each a collection of Atoms. The result is a vector of File
> +pointers (instead of simply a File pointer) because some file formats allow
> +multiple object "files" to be encoded in one file system file.
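For illustration, here is a minimal sketch of the "no parsing state in the Reader" rule described above. Everything in it is a hypothetical stand-in: Buffer and File are not the real MemoryBuffer and lld::File, std::error_code stands in for the error_code in the signature, and the toy "format" simply treats each line of the buffer as one atom name. The point is only that parseFile() is const, keeps its state in locals and in the File being built, and hands the buffer to the File on success.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real MemoryBuffer or lld::File.
struct Buffer { std::string bytes; };

struct File {
  std::unique_ptr<Buffer> buffer;      // kept alive for the File's lifetime
  std::vector<std::string> atomNames;  // placeholder for a real Atom graph
};

// Hypothetical Reader with no mutable members at all.
class ReaderFoo {
public:
  // const: parsing never mutates the Reader, so one instance can be
  // used from several threads on different input files.
  std::error_code parseFile(std::unique_ptr<Buffer> mb,
                            std::vector<std::unique_ptr<File>> &result) const {
    if (!mb || mb->bytes.empty())
      return std::make_error_code(std::errc::invalid_argument);
    // On the failure path above, mb is destroyed when it goes out of scope.

    std::unique_ptr<File> file(new File());
    std::string current;  // parse state lives in locals, not in the Reader
    for (char c : mb->bytes) {
      if (c != '\n') { current += c; continue; }
      if (!current.empty()) file->atomNames.push_back(current);
      current.clear();
    }
    if (!current.empty()) file->atomNames.push_back(current);

    file->buffer = std::move(mb);  // the File now owns the buffer
    assert(mb == nullptr);         // ownership has left this function
    result.push_back(std::move(file));
    return std::error_code();      // default-constructed means success
  }
};
```

Because the only per-parse state is on the stack or inside the new File, two threads can call parseFile() on the same ReaderFoo without any locking.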
> +
> +
> +Memory Ownership
> +----------------
> +
> +If parseFile() is successful, it either passes ownership of the MemoryBuffer
> +to the File object, or it deletes the MemoryBuffer.  The former is done if the
> +Atoms contain pointers into the MemoryBuffer (e.g. StringRefs for symbols
> +or ArrayRefs for section content).  If parseFile() fails, the MemoryBuffer
> +must be deleted by the Reader.
> +
> +Atom objects are always owned by their File object.  During core linking
> +when Atoms are coalesced or dead stripped away, core linking does not delete
> +those Atoms. Core linking just removes those unused Atoms from its internal
> +list. The destructor of a File object is responsible for deleting all Atoms
> +it owns, and if ownership of the MemoryBuffer was passed to it, the File
> +destructor needs to delete that too.
> +
> +
> +Making Atoms
> +------------
> +
> +The internal model of lld is purely Atom based.  But most object files do not
> +have an explicit concept of Atoms; instead, most have "sections".  The way
> +to think of this is that a section is just a list of Atoms with common
> +attributes.
> +
> +The first step in parsing section based object files is to cleave each
> +section into a list of Atoms.  The technique may vary by section type.  For
> +code sections (e.g. .text), there are usually symbols at the start of each
> +function. Those symbol addresses are the points at which the section is cleaved
> +into discrete Atoms.  Some file formats (like ELF) also include the
> +length of each symbol in the symbol table.  Otherwise, the length of each
> +Atom is calculated to run to the start of the next symbol or the end of the
> +section.
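The fallback rule in the paragraph above (each Atom runs to the next symbol or to the end of the section) can be sketched as follows. This is an illustrative sketch, not lld code: AtomRange and cleaveSection are hypothetical names, symbols are given as plain section offsets, and any bytes before the first symbol are simply left uncovered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one cleaved Atom within a section.
struct AtomRange {
  uint64_t offset;  // start of the Atom, relative to the section start
  uint64_t size;    // number of bytes the Atom covers
};

// symbolOffsets may be unsorted and may contain duplicates (aliases).
std::vector<AtomRange> cleaveSection(std::vector<uint64_t> symbolOffsets,
                                     uint64_t sectionSize) {
  std::sort(symbolOffsets.begin(), symbolOffsets.end());
  symbolOffsets.erase(
      std::unique(symbolOffsets.begin(), symbolOffsets.end()),
      symbolOffsets.end());

  std::vector<AtomRange> atoms;
  for (std::size_t i = 0; i < symbolOffsets.size(); ++i) {
    uint64_t start = symbolOffsets[i];
    if (start >= sectionSize)
      break;  // ignore symbols at or past the end of the section
    // The Atom ends at the next symbol, or at the end of the section.
    uint64_t end = (i + 1 < symbolOffsets.size())
                       ? std::min(symbolOffsets[i + 1], sectionSize)
                       : sectionSize;
    assert(end > start);
    atoms.push_back(AtomRange{start, end - start});
  }
  return atoms;
}
```

A format that records symbol lengths (as the text notes ELF does) would use those lengths instead of the distance to the next symbol.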
> +
> +Other section types can be implicitly cleaved.  For instance c-string literals
> +or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
> +the content of the section.  It is important to cleave sections into Atoms
> +to remove false dependencies.  For instance the .eh_frame section often
> +has no symbols, but contains "pointers" to the functions for which it
> +has unwind info.  If the .eh_frame section was not cleaved (but left as one
> +big Atom), there would always be a reference (from the eh_frame Atom) to
> +each function.  So the linker would be unable to coalesce or dead strip
> +away the function atoms.
> +
> +The lld Atom model also requires that a reference to an undefined symbol be
> +modeled as a Reference to an UndefinedAtom.  So the Reader also needs to
> +create an UndefinedAtom for each undefined symbol in the object file.
> +
> +Once all Atoms have been created, the second step is to create References
> +(recall that Atoms are "nodes" and References are "edges").  Most References
> +are created by looking at the "relocation records" in the object file.  If
> +a function contains a call to "malloc", there is usually a relocation record
> +specifying the address in the section and the symbol table index.  Your
> +Reader will need to convert the address to an Atom and offset and the symbol
> +table index into a target Atom.  If "malloc" is not defined in the object file,
> +the target Atom of the Reference will be an UndefinedAtom.
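As a rough sketch of the relocation-to-Reference step described above: the address is mapped to a containing Atom plus an offset with a binary search, and the symbol table index is mapped to a target Atom through a lookup table. All types and names here (Atom, Reloc, Reference, makeReference, symbolToAtom) are hypothetical simplifications, and the sketch assumes the defined Atoms come first in the table, sorted by offset, with the UndefinedAtoms after them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified records; not the real lld classes.
struct Atom { std::string name; uint64_t offset; uint64_t size; bool defined; };
struct Reloc { uint64_t address; uint32_t symbolIndex; };
struct Reference { std::size_t source; uint64_t offsetInSource; std::size_t target; };

// atoms[0, definedCount) are defined Atoms sorted by offset; the remaining
// entries are undefined Atoms. symbolToAtom maps a symbol table index to
// an index into atoms. Returns false if the relocation cannot be resolved.
bool makeReference(const std::vector<Atom> &atoms, std::size_t definedCount,
                   const std::vector<std::size_t> &symbolToAtom,
                   const Reloc &reloc, Reference &out) {
  assert(definedCount <= atoms.size());
  if (reloc.symbolIndex >= symbolToAtom.size())
    return false;

  auto first = atoms.begin();
  auto last = first + static_cast<std::ptrdiff_t>(definedCount);
  // Find the first defined Atom that starts after the address, then step back.
  auto it = std::upper_bound(
      first, last, reloc.address,
      [](uint64_t addr, const Atom &a) { return addr < a.offset; });
  if (it == first)
    return false;  // address lies before every Atom
  --it;
  if (reloc.address - it->offset >= it->size)
    return false;  // address falls in a gap or past the last Atom

  out = Reference{static_cast<std::size_t>(it - first),
                  reloc.address - it->offset,
                  symbolToAtom[reloc.symbolIndex]};
  return true;
}
```

With this layout, a call to "malloc" from an object file that does not define it resolves to a Reference whose target index points at an entry with defined == false.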
> +
> +
> +Performance
> +-----------
> +Once you have the above working to parse an object file into Atoms and
> +References, you'll want to look at performance.  Some techniques that can
> +help performance are:
> +
> +* Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
> +  just have each atom point to its subrange of References in that vector.
> +  This can be faster than allocating each Reference as a separate object.
> +* Pre-scan the symbol table and determine how many atoms are in each section
> +  then allocate space for all the Atom objects at once.
> +* Don't copy symbol names or section content to each Atom, instead use
> +  StringRef and ArrayRef in each Atom to point to its name and content in the
> +  MemoryBuffer.
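The first two bullets above can be sketched together: pre-scan to count References per Atom, allocate one vector for all of them, and give each Atom an index range into that vector. This is a sketch under assumed, simplified types (Reference, Atom, RawRef, File and buildFile are all hypothetical names), not the layout lld actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified records.
struct Reference { uint32_t offsetInAtom; uint32_t target; };
struct Atom { uint32_t firstRef; uint32_t refCount; };  // range into File::references
struct RawRef { std::size_t atom; Reference ref; };     // a Reference tagged with its owner

struct File {
  std::vector<Reference> references;  // one allocation for every Reference
  std::vector<Atom> atoms;

  const Reference *refsBegin(std::size_t i) const {
    return references.data() + atoms[i].firstRef;
  }
  const Reference *refsEnd(std::size_t i) const {
    return refsBegin(i) + atoms[i].refCount;
  }
};

File buildFile(std::size_t atomCount, const std::vector<RawRef> &raw) {
  File f;
  f.atoms.assign(atomCount, Atom{0, 0});

  // Pass 1: count how many References each Atom owns.
  for (const RawRef &r : raw) {
    assert(r.atom < atomCount);
    ++f.atoms[r.atom].refCount;
  }

  // Turn the counts into starting indices, then reset the counts so that
  // pass 2 can reuse them as fill cursors.
  uint32_t next = 0;
  for (Atom &a : f.atoms) {
    a.firstRef = next;
    next += a.refCount;
    a.refCount = 0;
  }

  // Pass 2: place each Reference in its Atom's subrange.
  f.references.resize(raw.size());
  for (const RawRef &r : raw) {
    Atom &a = f.atoms[r.atom];
    f.references[a.firstRef + a.refCount++] = r.ref;
  }
  return f;
}
```

Each Atom then costs two integers for its References rather than a separately allocated container, and iterating an Atom's References walks contiguous memory.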
> +
> +
> +Testing
> +-------
> +
> +We are still working on infrastructure to test Readers.  The issue is that
> +you don't want to check in binary files to the test suite. And the tools
> +for creating your object file from assembly source may not be available on
> +every OS.
> +
> +We are investigating a way to use yaml to describe the sections, symbols,
> +and content of a file, and then have some code which will write out an object
> +file from that yaml description.
> +
> +Once that is in place, you can write test cases that contain section/symbols
> +yaml and are run through the linker to produce Atom/References based yaml which
> +is then run through FileCheck to verify the Atoms and References are as
> +expected.
> +
> +
> +
> 
> Modified: lld/trunk/docs/development.rst
> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/docs/development.rst?rev=158374&r1=158373&r2=158374&view=diff
> ==============================================================================
> --- lld/trunk/docs/development.rst (original)
> +++ lld/trunk/docs/development.rst Tue Jun 12 17:43:35 2012
> @@ -5,7 +5,15 @@
> 
>  lld is developed as part of the `LLVM <http://llvm.org>`_ project.
> 
> -See the :ref:`getting started <getting_started>` guide.
> +Creating a Reader
> +-----------------
> +
> +See the :ref:`Creating a Reader <Readers>` guide.
> +
> +
> +
> +Documentation
> +-------------
> 
>  The project documentation is written in reStructuredText and generated using the
>  `Sphinx <http://sphinx.pocoo.org/>`_ documentation generator. For more
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> <lld-Reader.patch>
