[LLVMdev] [lld] Atom object model refactoring.

Wed Jul 18 15:55:20 PDT 2012

On Jul 18, 2012, at 3:41 PM, Clow, Marshall wrote:
> On Jul 18, 2012, at 2:34 PM, Nick Kledzik wrote:
>> On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
>>> I've run into some issues with the current atom object model that I
>>> would like to fix.
>>> 
>>> The current 4 atoms are not expressive enough. We need to be able to
>>> serialize a larger set of atoms, many of which are format specific.
>>> 
>>> The set of common atoms (shared between all formats) should be the set
>>> that the resolver requires to work. Judging by the source code,
>>> SharedLibrary is not part of that set.
>>> 
>>> The driving use case for this for me is the Import Address Table in
>>> PE/COFF. It is a section created by the writer that specifies external
>>> symbols to import and then acts as the GOT/PLT at runtime. Building
>>> this table requires extra information to be maintained in an efficient
>>> format. It also needs to be an atom so that relocations can point to
>>> it. However it does not have a well defined size or content until the
>>> table is complete.
>> Why is the IAT not just constructed in the PECOFF Writer?  Why does it need
>> to be an Atom?   What relocations need to point to it?  If they are relocations
>> created by the Writer, you are fine.  If you mean that other atoms may
>> have References (in)to the IAT, then that is what SharedLibraryAtoms are
>> for.  They are placeholders that expand to something real in the Writer.
>> 
>> Mach-o has all kinds of crazy data structures that are constructed in the Writer.  
>> This is different than Darwin ld64 where the Writer actually created atoms for 
>> its data structures and fed them back to the resolver.  I wanted to avoid 
>> that insanity in lld.  
>> 
>> The Writer is handed a list of atoms from which to construct the executable.
>> It is free to create more atoms (private to the Writer) or just lay down data
>> structures - whichever is easier.  
>> 
>>> The File interface for atoms should be changed to File::iterator
>>> begin(); File::iterator end(); where File::iterator is some type of
>>> iterator over Atom.
>>> 
>>> As for serialization, each atom can have its own serialize/deserialize
>>> function for both the Native format and YAML.
>> 
>> The four Atom kinds each have very different attributes and are used 
>> differently.  That is why I broke them out into separate lists.  
> 
> [ Just trying to understand here. ]
> 
> So, what I'm hearing is that there are four different kinds of Atoms.
> No more, no less - matching the enum in Atom.h.
> Is that correct?
Stated that way, it makes the "four" seem arbitrary.  It makes more sense once 
you see that the four kinds are:

1) DefinedAtom
     95% of all atoms.  This is a chunk of code or data
2) UndefinedAtom
     This is a placeholder in object files for a reference to some atom outside the translation unit.
     During core linking it is usually replaced by (coalesced into) another Atom.
3) SharedLibraryAtom
     The placeholder Atom used when a required symbol name turns out to be defined in a dynamic
     shared library (and not in some object file).  It is similar to an UndefinedAtom, but it also
     tracks information about the associated shared library.
4) AbsoluteAtom
     This is for embedded support, where some code or data is implemented in ROM at a fixed address.
     This atom has no content.  It is just an address, and the Writer needs to fix up any references
     to point to it.
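
To make the four kinds concrete, here is a minimal C++ sketch of how they might be modeled.
It is not the actual Atom.h: the class layout, the Kind enum spelling, and the accessors
content(), loadName() and value() are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the four atom kinds (not lld's real headers).
class Atom {
public:
  enum class Kind { Defined, Undefined, SharedLibrary, Absolute };
  virtual ~Atom() = default;
  Kind kind() const { return kind_; }
  const std::string &name() const { return name_; }

protected:
  Atom(Kind k, std::string n) : kind_(k), name_(std::move(n)) {}

private:
  Kind kind_;
  std::string name_;
};

// A chunk of code or data; the only kind in this sketch that carries content.
class DefinedAtom : public Atom {
public:
  DefinedAtom(std::string n, std::vector<uint8_t> bytes)
      : Atom(Kind::Defined, std::move(n)), content_(std::move(bytes)) {}
  const std::vector<uint8_t> &content() const { return content_; }

private:
  std::vector<uint8_t> content_;
};

// Placeholder for a reference to something outside the translation unit.
class UndefinedAtom : public Atom {
public:
  explicit UndefinedAtom(std::string n) : Atom(Kind::Undefined, std::move(n)) {}
};

// Like an undefined atom, but it also remembers which shared library defines the symbol.
class SharedLibraryAtom : public Atom {
public:
  SharedLibraryAtom(std::string n, std::string lib)
      : Atom(Kind::SharedLibrary, std::move(n)), loadName_(std::move(lib)) {}
  const std::string &loadName() const { return loadName_; }

private:
  std::string loadName_;
};

// No content, just a fixed address (for example, something in ROM).
class AbsoluteAtom : public Atom {
public:
  AbsoluteAtom(std::string n, uint64_t addr)
      : Atom(Kind::Absolute, std::move(n)), value_(addr) {}
  uint64_t value() const { return value_; }

private:
  uint64_t value_;
};
```

Only DefinedAtom carries bytes here; the other three exist so that references have something
to point at until the resolver or the Writer decides what they really mean.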

> 
> The readers generate a list of atoms from some object format.
> The linker does a bunch of graph stuff on the atoms.
> The writers get a list of (interconnected) atoms, and write an executable from that.
> 
> Am I missing something?
That is the high-level summary.  
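
As a rough illustration of that flow, the sketch below coalesces atoms from several "files"
by name, so that a definition replaces a matching undefined placeholder, and then reports
what is still undefined when the list reaches the writer.  Everything in it (SimpleAtom,
AtomKind, resolve, unresolvedNames) is a hypothetical stand-in and not lld's real resolver,
which handles far more cases than this.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration; not lld's actual classes.
enum class AtomKind { Defined, Undefined };

struct SimpleAtom {
  std::string name;
  AtomKind kind;
};

// "Resolver" stage: merge the atoms produced by the readers, keyed by name.
// A Defined atom replaces an Undefined placeholder with the same name.
std::vector<SimpleAtom> resolve(const std::vector<std::vector<SimpleAtom>> &files) {
  std::map<std::string, SimpleAtom> table;
  for (const auto &file : files) {
    for (const auto &atom : file) {
      auto it = table.find(atom.name);
      if (it == table.end()) {
        table.emplace(atom.name, atom);
      } else if (it->second.kind == AtomKind::Undefined &&
                 atom.kind == AtomKind::Defined) {
        it->second = atom; // coalesce the placeholder into the definition
      }
      // Duplicate definitions are ignored in this sketch; a real linker
      // would apply merge rules or report an error.
    }
  }
  std::vector<SimpleAtom> result;
  for (const auto &entry : table)
    result.push_back(entry.second);
  return result;
}

// What a writer would see as still unresolved after core linking.
std::vector<std::string> unresolvedNames(const std::vector<SimpleAtom> &atoms) {
  std::vector<std::string> names;
  for (const auto &atom : atoms)
    if (atom.kind == AtomKind::Undefined)
      names.push_back(atom.name);
  return names;
}
```

In the model described above, a name left over at this point would either be found in a
shared library (and become a SharedLibraryAtom) or be reported as an undefined symbol.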


-Nick