[llvm-commits] [lld] r158374 - in /lld/trunk/docs: Readers.rst development.rst

Tue Jun 12 23:14:00 PDT 2012

That last line after my signature is a stray that I forgot to delete from
my email before I decided to just rework the beginning.

On Tue, Jun 12, 2012 at 11:12 PM, Sean Silva <silvas at purdue.edu> wrote:

> I had some really major concerns about the general organization and
> wording of the beginning. I've attached a patch that reworks it to be a lot
> more upfront about the organization of the components and exactly what is
> involved. Let me know what you think.
>
> I think I may have gone a bit overboard with the sphinx markup. That can
> easily be dialed back to be more "plaintext"-y (although you should build
> it with sphinx to see how it turns out, and decide whether it may be
> desirable to use some of Sphinx's features). Needless to say, I don't
> expect the patch to be committed as-is.
>
> For the middle--end, the patch has some little formatting fixups that I
> always compulsively do as I read. You can ignore them, but I think there
> might be a couple of grammatical or wording fixes worth applying.
>
> I think the Making Atoms discussion needs to be made more concrete. "Call
> this function". "Use this constructor", etc.
>
> Similarly for the testing stuff. "Put a file here". "make this build
> target to run the tests". etc.
>
> Also, you should link Readers.rst into a toctree; sphinx is warning about
> it not being linked in (you do link to it but not through the toctree; I
> would just remove the other link).
>
> --Sean Silva
>
> Having moved this information higher up, you should leave this out.
>
> On Tue, Jun 12, 2012 at 3:43 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>> Author: kledzik
>> Date: Tue Jun 12 17:43:35 2012
>> New Revision: 158374
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=158374&view=rev
>> Log:
>> Wrote initial doc on how to create a Reader
>>
>> Added:
>>    lld/trunk/docs/Readers.rst
>> Modified:
>>    lld/trunk/docs/development.rst
>>
>> Added: lld/trunk/docs/Readers.rst
>> URL:
>> http://llvm.org/viewvc/llvm-project/lld/trunk/docs/Readers.rst?rev=158374&view=auto
>>
>> ==============================================================================
>> --- lld/trunk/docs/Readers.rst (added)
>> +++ lld/trunk/docs/Readers.rst Tue Jun 12 17:43:35 2012
>> @@ -0,0 +1,163 @@
>> +.. _Readers:
>> +
>> +Developing lld Readers
>> +======================
>> +
>> +Introduction
>> +------------
>> +
>> +One goal of lld is to be file format independent.  This is done
>> +through a plug-in model for reading object files. The lld::Reader is the base
>> +class for all object file readers.  A Reader follows the factory method pattern.
>> +A Reader instantiates an lld::File object (which is a graph of Atoms) from a
>> +given object file (on disk or in memory).
>> +
>> +Every Reader subclass defines its own "options" class (for instance, the mach-o
>> +Reader defines the class ReaderOptionsMachO).  This options class is the
>> +one-and-only way to control how the Reader operates when parsing an input file
>> +into an Atom graph.  For instance, you may want the Reader to only accept
>> +certain architectures.  The options class can be instantiated from command
>> +line options, or it can be subclassed and its ivars set programmatically.
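
A minimal sketch of what such an options class might look like, assuming a hypothetical "Foo" format. ReaderOptionsFoo, _archName, acceptsArch() and X86OnlyOptions are illustrative names, not lld API; the architecture filter is the example from the paragraph above, and the command line is assumed to arrive already split into a vector of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical options class for an imaginary "Foo" Reader. None of these
// names are part of lld; they only illustrate the two ways to configure it.
class ReaderOptionsFoo {
public:
  // Way 1: build the options from command line arguments, e.g. {"-arch", "x86_64"}.
  explicit ReaderOptionsFoo(const std::vector<std::string> &args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i)
      if (args[i] == "-arch")
        _archName = args[i + 1];
  }
  virtual ~ReaderOptionsFoo() = default;

  // Consulted by the Reader while parsing; an empty arch name accepts anything.
  bool acceptsArch(const std::string &arch) const {
    return _archName.empty() || _archName == arch;
  }

protected:
  ReaderOptionsFoo() = default;
  std::string _archName; // ivar that a subclass may set programmatically
};

// Way 2: a client that bypasses the command line subclasses the options
// class and sets the ivars directly.
class X86OnlyOptions : public ReaderOptionsFoo {
public:
  X86OnlyOptions() { _archName = "x86_64"; }
};
```
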
>> +
>> +
>> +Where to start
>> +--------------
>> +
>> +The lld project already has a skeleton of source code for Readers of ELF, COFF,
>> +mach-o, and the lld native object file format.  If your file format is a
>> +variant of one of those, you should modify the existing Reader to support
>> +your variant.  This is done by adding new ivar(s) to the Options class for that
>> +Reader that specify which file format variant to expect, and then modifying
>> +the Reader to check those ivars and parse the object file accordingly.
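
A sketch of the "new ivar plus a check in the Reader" pattern, under stated assumptions: "Foo" stands in for whichever existing Reader you are extending, and FooVariant, _variant, expectedHeaderSize() and the 16/24 byte header sizes are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: the existing Reader's options class gains one new ivar that
// says which variant of the format to expect.
enum class FooVariant { Standard, Extended };

struct ReaderOptionsFoo {
  FooVariant _variant = FooVariant::Standard; // the new ivar
};

// Hypothetical Reader helper: the header size it expects depends on the
// variant chosen in the options (sizes are invented for the example).
std::size_t expectedHeaderSize(const ReaderOptionsFoo &options) {
  switch (options._variant) {
  case FooVariant::Standard:
    return 16;
  case FooVariant::Extended:
    return 24;
  }
  return 0;
}

// The Reader checks the ivar before parsing and rejects a buffer that is
// too small for the header of the expected variant.
bool headerLooksValid(const ReaderOptionsFoo &options, const std::string &buffer) {
  return buffer.size() >= expectedHeaderSize(options);
}
```
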
>> +
>> +If your object file format is not a variant of any existing Reader, you'll need
>> +to create a new Reader subclass. If your file format is called "Foo", you'll
>> +need to create these files::
>> +
>> +    ./include/lld/ReaderWriter/ReaderFoo.h
>> +    ./lib/ReaderWriter/Foo/ReaderFoo.cpp
>> +
>> +The public interface for your Reader is just the ReaderOptions subclass
>> +(e.g. ReaderOptionsFoo) and the function to create a Reader given the options::
>> +
>> +    Reader* createReaderFoo(const ReaderOptionsFoo &options);
>> +
>> +In the implementation, you can define a ReaderFoo class, but that class is
>> +private to your ReaderWriter directory.
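
A sketch of how the header/implementation split could look, with both files collapsed into one block. Reader here is a stand-in base class with a made-up formatName() method, and putting ReaderFoo in an anonymous namespace is one way (an assumption, not a stated lld convention) to keep it out of the public interface.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-ins so the sketch compiles on its own; lld's real Reader differs.
struct ReaderOptionsFoo {
  bool _verbose = false;
};

class Reader {
public:
  virtual ~Reader() = default;
  virtual std::string formatName() const = 0; // hypothetical method
};

// --- ReaderFoo.h: the only public entry point besides the options ---
Reader *createReaderFoo(const ReaderOptionsFoo &options);

// --- ReaderFoo.cpp: the concrete class never appears in a header ---
namespace {
class ReaderFoo : public Reader {
public:
  explicit ReaderFoo(const ReaderOptionsFoo &options) : _options(options) {}
  std::string formatName() const override { return "foo"; }

private:
  ReaderOptionsFoo _options; // copied, so the caller's object may go away
};
} // end anonymous namespace

Reader *createReaderFoo(const ReaderOptionsFoo &options) {
  return new ReaderFoo(options);
}
```
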
>> +
>> +
>> +Readers are factories
>> +---------------------
>> +
>> +The linker will usually only instantiate your Reader once.  That one Reader will
>> +have its parseFile() method called many times with different input files.
>> +To support multithreaded linking, the Reader may be parsing multiple input
>> +files in parallel. Therefore, there should be no parsing state in your Reader
>> +object.  Any parsing state should be in ivars of your File subclass or in
>> +some temporary object.
>> +
>> +The key method to implement in a reader is::
>> +
>> +  virtual error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
>> +                               std::vector<std::unique_ptr<File>> &result);
>> +
>> +It takes a memory buffer (which contains the contents of the object file
>> +being read) and fills in result with instantiated lld::File objects, each of
>> +which is a collection of Atoms. The result is a vector of File pointers (instead
>> +of simply a File pointer) because some file formats allow multiple object
>> +"files" to be encoded in one file system file.
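
A sketch of a stateless parseFile(), using simplified stand-ins: MemoryBuffer and File are toy structs, std::error_code replaces lld's error_code, and the "one atom name per line" format is invented. Declaring parseFile() const on a class with no data members is a design choice that lets the compiler enforce "no parsing state in the Reader", so one instance could be shared by threads parsing different buffers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Simplified stand-ins for lld's MemoryBuffer and File (hypothetical).
struct MemoryBuffer {
  std::string contents;
};

struct File {
  std::unique_ptr<MemoryBuffer> buffer; // owned input buffer
  std::vector<std::string> atomNames;   // all per-file parse results live here
};

class ReaderFoo {
public:
  // const + no members: every bit of parse state is local or in the File.
  std::error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
                            std::vector<std::unique_ptr<File>> &result) const {
    if (mb->contents.empty())
      return std::make_error_code(std::errc::invalid_argument); // mb is freed on return
    auto file = std::make_unique<File>();
    std::string name; // temporary parse state, local to this call
    for (char c : mb->contents) {
      if (c != '\n') {
        name += c;
        continue;
      }
      if (!name.empty())
        file->atomNames.push_back(name);
      name.clear();
    }
    if (!name.empty())
      file->atomNames.push_back(name);
    file->buffer = std::move(mb); // File takes ownership of the buffer
    result.push_back(std::move(file));
    return std::error_code(); // success
  }
};
```
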
>> +
>> +
>> +Memory Ownership
>> +----------------
>> +
>> +If parseFile() is successful, it either passes ownership of the MemoryBuffer
>> +to the File object, or it deletes the MemoryBuffer.  The former is done if the
>> +Atoms contain pointers into the MemoryBuffer (e.g. StringRefs for symbols
>> +or ArrayRefs for section content).  If parseFile() fails, the MemoryBuffer
>> +must be deleted by the Reader.
>> +
>> +Atom objects are always owned by their File object.  When Atoms are coalesced
>> +or dead-stripped during core linking, core linking does not delete those
>> +Atoms; it just removes them from its internal list. The destructor of a File
>> +object is responsible for deleting all Atoms it owns, and if ownership of the
>> +MemoryBuffer was passed to it, the File destructor needs to delete that too.
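
A sketch of the ownership rules with std::unique_ptr, assuming simplified stand-in types: std::string_view plays the role of StringRef, and File, Atom and addAtom() are hypothetical. Because the File owns both the buffer and the Atoms, no hand-written destructor is needed, and anything core linking holds (such as the reference returned by atom()) is non-owning.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct MemoryBuffer {
  std::string contents;
};

// An Atom refers to its name inside the MemoryBuffer instead of copying it.
struct Atom {
  std::string_view name;
};

class File {
public:
  // The File takes ownership of the buffer because its Atoms point into it.
  explicit File(std::unique_ptr<MemoryBuffer> mb) : _buffer(std::move(mb)) {}

  void addAtom(std::size_t offset, std::size_t length) {
    std::string_view all(_buffer->contents);
    _atoms.push_back(std::make_unique<Atom>(Atom{all.substr(offset, length)}));
  }
  // Non-owning access, e.g. for core linking's internal list.
  const Atom &atom(std::size_t i) const { return *_atoms[i]; }
  std::size_t atomCount() const { return _atoms.size(); }

  // No explicit destructor: the unique_ptr members delete every owned Atom
  // first (declared last) and then the MemoryBuffer (declared first).
private:
  std::unique_ptr<MemoryBuffer> _buffer;
  std::vector<std::unique_ptr<Atom>> _atoms;
};
```
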
>> +
>> +
>> +Making Atoms
>> +------------
>> +
>> +The internal model of lld is purely Atom based.  But most object files do not
>> +have an explicit concept of Atoms; instead, most have "sections".  The way
>> +to think of this is that a section is just a list of Atoms with common
>> +attributes.
>> +
>> +The first step in parsing section-based object files is to cleave each
>> +section into a list of Atoms.  The technique may vary by section type.  For
>> +code sections (e.g. .text), there are usually symbols at the start of each
>> +function. Those symbol addresses are the points at which the section is cleaved
>> +into discrete Atoms.  Some file formats (like ELF) also include the
>> +length of each symbol in the symbol table.  Otherwise, the length of each
>> +Atom is calculated to run to the start of the next symbol or the end of the
>> +section.
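
A sketch of cleaving a code section at symbol addresses, using hypothetical Symbol and AtomRange records in place of a real symbol table and real lld Atoms. It uses the symbol's own size when the format provides one (as ELF does) and otherwise runs each Atom to the next symbol or the section end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct Symbol {
  std::string name;
  std::uint64_t address;             // offset within the section
  std::optional<std::uint64_t> size; // present for ELF symbols, absent otherwise
};

struct AtomRange {
  std::string name;
  std::uint64_t offset;
  std::uint64_t size;
};

// Cleave a section of sectionSize bytes at each symbol address.
// Aliases (two symbols at one address) would yield a zero-length Atom here;
// this sketch does not handle them.
std::vector<AtomRange> cleaveSection(std::vector<Symbol> symbols,
                                     std::uint64_t sectionSize) {
  std::sort(symbols.begin(), symbols.end(),
            [](const Symbol &a, const Symbol &b) { return a.address < b.address; });
  std::vector<AtomRange> atoms;
  for (std::size_t i = 0; i < symbols.size(); ++i) {
    const Symbol &sym = symbols[i];
    // Without an explicit size, run to the next symbol or the end of the section.
    std::uint64_t end =
        (i + 1 < symbols.size()) ? symbols[i + 1].address : sectionSize;
    std::uint64_t size = sym.size ? *sym.size : end - sym.address;
    atoms.push_back({sym.name, sym.address, size});
  }
  return atoms;
}
```
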
>> +
>> +Other section types can be implicitly cleaved.  For instance, c-string literals
>> +or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
>> +the content of the section.  It is important to cleave sections into Atoms
>> +to remove false dependencies.  For instance, the .eh_frame section often
>> +has no symbols, but contains "pointers" to the functions for which it
>> +has unwind info.  If the .eh_frame section were not cleaved (but left as one
>> +big Atom), there would always be a reference (from the eh_frame Atom) to
>> +each function, so the linker would be unable to coalesce or dead strip
>> +the function Atoms.
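
A sketch of content-based cleaving for a c-string literal section (for example mach-o's __cstring), assuming the section bytes are available as a std::string_view. cleaveCStrings() is a hypothetical helper; each piece keeps its trailing NUL, and unterminated trailing bytes become a final piece.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split a c-string literal section into one piece per NUL-terminated
// string, using only the section content (no symbols needed).
std::vector<std::string_view> cleaveCStrings(std::string_view section) {
  std::vector<std::string_view> atoms;
  std::size_t start = 0;
  while (start < section.size()) {
    std::size_t nul = section.find('\0', start);
    // Include the terminator; if there is none, take the rest of the section.
    std::size_t end = (nul == std::string_view::npos) ? section.size() : nul + 1;
    atoms.push_back(section.substr(start, end - start));
    start = end;
  }
  return atoms;
}
```
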
>> +
>> +The lld Atom model also requires that a reference to an undefined symbol be
>> +modeled as a Reference to an UndefinedAtom.  So the Reader also needs to
>> +create an UndefinedAtom for each undefined symbol in the object file.
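
A sketch of that step with hypothetical stand-ins (SymbolEntry and a plain UndefinedAtom struct, not lld's real UndefinedAtom class). Keying the new atoms by symbol table index is an assumption made so that relocations, which name a symbol index, can find their target later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct SymbolEntry {
  std::string name;
  bool defined;
};

struct UndefinedAtom {
  std::string name;
};

// Create one UndefinedAtom per undefined symbol, remembered by its symbol
// table index so a relocation naming that index can target it.
std::unordered_map<std::size_t, std::unique_ptr<UndefinedAtom>>
makeUndefinedAtoms(const std::vector<SymbolEntry> &symbolTable) {
  std::unordered_map<std::size_t, std::unique_ptr<UndefinedAtom>> byIndex;
  for (std::size_t i = 0; i < symbolTable.size(); ++i)
    if (!symbolTable[i].defined)
      byIndex[i] = std::make_unique<UndefinedAtom>(UndefinedAtom{symbolTable[i].name});
  return byIndex;
}
```
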
>> +
>> +Once all Atoms have been created, the second step is to create References
>> +(recall that Atoms are "nodes" and References are "edges").  Most References
>> +are created by looking at the "relocation records" in the object file.  If
>> +a function contains a call to "malloc", there is usually a relocation record
>> +specifying the address in the section and the symbol table index.  Your
>> +Reader will need to convert the address into an Atom and offset, and the
>> +symbol table index into a target Atom.  If "malloc" is not defined in the
>> +object file, the target Atom of the Reference will be an UndefinedAtom.
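
A sketch of the address-to-Atom conversion, assuming the Atoms of a section are kept sorted by offset and do not overlap, so a binary search (std::upper_bound) finds the containing Atom. Relocation, Reference and makeReference() are hypothetical, and the target is recorded by name as a stand-in for a pointer to the defined Atom or UndefinedAtom.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct AtomRange {
  std::string name;
  std::uint64_t offset; // start within the section
  std::uint64_t size;
};

struct Relocation {
  std::uint64_t address;   // where in the section the fixup applies
  std::size_t symbolIndex; // which symbol table entry it refers to
};

struct Reference {
  const AtomRange *from = nullptr; // Atom containing the fixup
  std::uint64_t offsetInAtom = 0;
  std::string targetName;          // stand-in for a pointer to the target Atom
};

// atoms: sorted by offset, non-overlapping. symbolNames: indexed by symbol
// table index. Returns false if the address falls outside every Atom.
bool makeReference(const std::vector<AtomRange> &atoms,
                   const std::vector<std::string> &symbolNames,
                   const Relocation &reloc, Reference &out) {
  // First Atom starting after the address; the one before it may contain it.
  auto it = std::upper_bound(
      atoms.begin(), atoms.end(), reloc.address,
      [](std::uint64_t addr, const AtomRange &a) { return addr < a.offset; });
  if (it == atoms.begin())
    return false;
  --it;
  if (reloc.address >= it->offset + it->size)
    return false;
  out = Reference{&*it, reloc.address - it->offset,
                  symbolNames.at(reloc.symbolIndex)};
  return true;
}
```
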
>> +
>> +
>> +Performance
>> +-----------
>> +Once you have the above working to parse an object file into Atoms and
>> +References, you'll want to look at performance.  Some techniques that can
>> +help performance are:
>> +
>> +* Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
>> +  just have each atom point to its subrange of References in that vector.
>> +  This can be faster than allocating each Reference as a separate object.
>> +* Pre-scan the symbol table and determine how many atoms are in each section,
>> +  then allocate space for all the Atom objects at once.
>> +* Don't copy symbol names or section content to each Atom; instead, use
>> +  StringRef and ArrayRef in each Atom to point to its name and content in the
>> +  MemoryBuffer.
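
A sketch of the "one big vector<Reference>" idea using only the standard library (not llvm::BumpPtrAllocator). It assumes relocations arrive in arbitrary order tagged with the index of the Atom they belong to; a counting pass sizes each Atom's subrange, then a single vector is filled. RawRef, AtomRefs, FileRefs and buildRefs() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Reference {
  std::uint64_t offsetInAtom;
  std::size_t targetAtom; // index of the target Atom (stand-in for a pointer)
};

struct RawRef {
  std::size_t fromAtom; // Atom this Reference belongs to
  Reference ref;
};

struct AtomRefs {
  std::size_t begin = 0, end = 0; // [begin, end) into FileRefs::references
};

struct FileRefs {
  std::vector<Reference> references; // one allocation for the whole file
  std::vector<AtomRefs> atoms;
};

FileRefs buildRefs(std::size_t atomCount, const std::vector<RawRef> &raw) {
  FileRefs f;
  f.atoms.assign(atomCount, AtomRefs{});
  // Pass 1: count References per Atom.
  std::vector<std::size_t> counts(atomCount, 0);
  for (const RawRef &r : raw)
    ++counts[r.fromAtom];
  // Give each Atom its subrange of the shared vector.
  std::size_t next = 0;
  for (std::size_t i = 0; i < atomCount; ++i) {
    f.atoms[i].begin = next;
    next += counts[i];
    f.atoms[i].end = next;
  }
  // Pass 2: allocate once, then drop each Reference into its Atom's slot.
  f.references.resize(raw.size());
  std::vector<std::size_t> cursor(atomCount);
  for (std::size_t i = 0; i < atomCount; ++i)
    cursor[i] = f.atoms[i].begin;
  for (const RawRef &r : raw)
    f.references[cursor[r.fromAtom]++] = r.ref;
  return f;
}
```
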
>> +
>> +
>> +Testing
>> +-------
>> +
>> +We are still working on infrastructure to test Readers.  The issue is that
>> +you don't want to check binary files into the test suite, and the tools
>> +for creating your object file from assembly source may not be available on
>> +every OS.
>> +
>> +We are investigating a way to use yaml to describe the sections, symbols,
>> +and content of a file, and then have some code write out an object
>> +file from that yaml description.
>> +
>> +Once that is in place, you can write test cases that contain section/symbol
>> +yaml, which is run through the linker to produce Atom/Reference-based yaml,
>> +which is then run through FileCheck to verify that the Atoms and References
>> +are as expected.
>> +
>> +
>> +
>>
>> Modified: lld/trunk/docs/development.rst
>> URL:
>> http://llvm.org/viewvc/llvm-project/lld/trunk/docs/development.rst?rev=158374&r1=158373&r2=158374&view=diff
>>
>> ==============================================================================
>> --- lld/trunk/docs/development.rst (original)
>> +++ lld/trunk/docs/development.rst Tue Jun 12 17:43:35 2012
>> @@ -5,7 +5,15 @@
>>
>>  lld is developed as part of the `LLVM <http://llvm.org>`_ project.
>>
>> -See the :ref:`getting started <getting_started>` guide.
>> +Creating a Reader
>> +-----------------
>> +
>> +See the :ref:`Creating a Reader <Readers>` guide.
>> +
>> +
>> +
>> +Documentation
>> +-------------
>>
>>  The project documentation is written in reStructuredText and generated using the
>>  `Sphinx <http://sphinx.pocoo.org/>`_ documentation generator. For more
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>