[LLVMdev] [lld] Atom object model refactoring.

Wed Jul 18 14:34:25 PDT 2012

On Jul 18, 2012, at 12:52 PM, Michael Spencer wrote:
> I've run into some issues with the current atom object model that I
> would like to fix.
> 
> The current 4 atoms are not expressive enough. We need to be able to
> serialize a larger set of atoms, many of which are format specific.
> 
> The set of common atoms (shared between all formats) should be the set
> that the resolver requires to work. SharedLibrary is not included in
> this (by looking at the source code).
> 
> The driving use case for this for me is the Import Address Table in
> PE/COFF. It is a section created by the writer that specifies external
> symbols to import and then acts as the GOT/PLT at runtime. Building
> this table requires extra information to be maintained in an efficient
> format. It also needs to be an atom so that relocations can point to
> it. However it does not have a well defined size or content until the
> table is complete.
Why is the IAT not just constructed in the PECOFF Writer?  Why does it need
to be an Atom?   What relocations need to point to it?  If they are relocations
created by the Writer, you are fine.  If you mean that other atoms may
have References (in)to the IAT, then that is what SharedLibraryAtoms are
for.  They are placeholders that expand to something real in the Writer.

Mach-O has all kinds of crazy data structures that are constructed in the Writer.
This differs from Darwin's ld64, where the Writer actually created atoms for
its data structures and fed them back to the resolver.  I wanted to avoid
that insanity in lld.

The Writer is handed a list of atoms from which to construct the executable.
It is free to create more atoms (private to the Writer) or just lay down data
structures, whichever is easier.
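
A minimal sketch of that division of labor, in C++. Every name here (Atom,
AtomKind, ImportTable, buildImportTable) is a hypothetical stand-in and not
lld's real interface. It assumes the Writer sees the final atom list and
expands each shared-library placeholder into one table slot, so the table
only gets a size once the whole list has been walked.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are not lld's real classes.
enum class AtomKind { Defined, Undefined, SharedLibrary, Absolute };

struct Atom {
  AtomKind kind;
  std::string name;
};

// Writer-private table: one slot per imported symbol. It is built only
// after the full atom list is known, so its size is well defined by then.
struct ImportTable {
  std::vector<std::string> slots;             // slot index -> symbol name
  std::map<std::string, std::size_t> slotOf;  // symbol name -> slot index

  std::size_t sizeInBytes(std::size_t pointerSize) const {
    return slots.size() * pointerSize;
  }
};

// The Writer is handed the resolved atoms and expands each shared-library
// placeholder into a concrete slot. Nothing is fed back to the resolver.
ImportTable buildImportTable(const std::vector<Atom> &atoms) {
  ImportTable table;
  for (const Atom &atom : atoms) {
    if (atom.kind != AtomKind::SharedLibrary)
      continue;
    // Deduplicate: each imported symbol gets exactly one slot.
    if (table.slotOf.emplace(atom.name, table.slots.size()).second)
      table.slots.push_back(atom.name);
  }
  return table;
}
```

In this sketch a reference to a placeholder atom would be fixed up by looking
up slotOf[name] at write time; the table itself never has to exist as an atom.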

> 
> The File interface for atoms should be changed to File::iterator
> begin(); File::iterator end(); where File::iterator is some type of
> iterator over Atom.
> 
> As for serialization. Each atom can have its own serialize/unserialize
> function for both the Native format and YAML.
The four Atom kinds each have very different attributes and are used
differently.  That is why I broke them out into separate lists.
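
A rough C++ illustration of why separate lists are convenient. The struct
names mirror the four kinds, but the fields and the File layout below are
assumptions made for the example and not lld's actual declarations.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified atom kinds; the fields are illustrative only.
struct DefinedAtom       { std::string name; std::vector<std::uint8_t> content; };
struct UndefinedAtom     { std::string name; };
struct SharedLibraryAtom { std::string name; std::string loadName; };
struct AbsoluteAtom      { std::string name; std::uint64_t value; };

// One list per kind: a consumer iterates only the kind it cares about and
// sees only the attributes that make sense for that kind.
struct File {
  std::vector<DefinedAtom> defined;
  std::vector<UndefinedAtom> undefined;
  std::vector<SharedLibraryAtom> sharedLibrary;
  std::vector<AbsoluteAtom> absolute;
};

// Example consumer: only defined atoms carry content bytes, so this never
// has to test an atom's kind or downcast.
std::size_t totalContentSize(const File &file) {
  std::size_t total = 0;
  for (const DefinedAtom &atom : file.defined)
    total += atom.content.size();
  return total;
}
```

With a single iterator over a common Atom base, the same loop would need a
kind check or cast on every element before it could touch content.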

> This would also change ContentType to not contain so many format
> specific values. It would also allow us to get rid of isThumb as a
> DefinedAtom level attribute.
I'm all for getting rid of isThumb(), but that seems orthogonal to your issue.

-Nick