[LLVMdev] Status of YAML IO?

Tue Oct 30 15:34:45 PDT 2012

On Oct 30, 2012, at 11:10 AM, Shankar Easwaran wrote:
>>> 
>>> 3) How are you planning to support Atom specific flags ? Is there a way already ?
>>>    (This would be needed to group similar atoms together)
>> It is still an open question how to support platform specific Atom attributes.  As much as possible we'd like to expand the Atom model to be a superset of all the platform specific flags.  But there are some attributes that are very much tied to one platform.  One idea is to just add a new Reference that has no target but whose kind (and maybe addend) encodes the platform specific attributes.  The Reference kind is already platform specific.
> 
> How about if the atom flags could be overridden? The Atom flag could have a MIN/MAX, and anything above the MAX or below the MIN would be platform specific, like how it's dealt with for section indexes?

I know the ELF file format has some ranges for various values that are specifically reserved for processors or "user" defined functionality.  It serves the needs of ELF well.  It allows processor and software tools teams to use ELF but work independently (and/or in secret) on new functionality without needing to coordinate with a central ELF owner.
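
For readers unfamiliar with the ELF convention being referred to, here is a minimal sketch of how the reserved section-index ranges partition the value space. The numeric bounds are the SHN_LOPROC/SHN_HIPROC and SHN_LOOS/SHN_HIOS values from the ELF gABI; the constant names, the IndexOwner enum, and classifySectionIndex() are hypothetical names made up for this illustration.

```cpp
#include <cstdint>

// Bounds taken from the ELF gABI special section indexes.
// The constant names are local to this sketch (no <elf.h> needed).
constexpr uint16_t kShnLoReserve = 0xff00; // SHN_LORESERVE
constexpr uint16_t kShnHiProc    = 0xff1f; // SHN_HIPROC (SHN_LOPROC == SHN_LORESERVE)
constexpr uint16_t kShnLoOS      = 0xff20; // SHN_LOOS
constexpr uint16_t kShnHiOS      = 0xff3f; // SHN_HIOS

// Hypothetical classification of who "owns" a section index value.
enum class IndexOwner { Ordinary, Processor, OS, Generic };

IndexOwner classifySectionIndex(uint16_t idx) {
  if (idx < kShnLoReserve)
    return IndexOwner::Ordinary;   // not in the reserved range
  if (idx <= kShnHiProc)
    return IndexOwner::Processor;  // processor-specific semantics
  if (idx >= kShnLoOS && idx <= kShnHiOS)
    return IndexOwner::OS;         // operating-system-specific semantics
  return IndexOwner::Generic;      // e.g. SHN_ABS, SHN_COMMON, SHN_XINDEX
}
```

A processor team can assign meaning to any value in the Processor range without consulting anyone, which is exactly the independence described above.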

But lld is different. It is not a file format.  It is an API. If a particular processor needs to express something not captured in the Atom model, we should discuss what that functionality is and see if we can grow the Atom model.  There may well be another processor that needs some similar functionality.  If we added a generic uint32_t DefinedAtom::flags() method, I would be concerned that lld porters would be quick to just use the bits for whatever they need and not check whether the Atom model needs expanding.

An example of something I added (but am not happy with) is DefinedAtom::isThumb().  This is something only applicable to ARM (and only if you care about interop of thumb and arm code).  

Given that the Reference::Kind field is already platform specific, I'm leaning towards saying that the way to add platform specific atom attributes is to add a Reference with no target to the Atom with a Kind field that for that platform means whatever attribute you need.  
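
A minimal sketch of what the target-less Reference idea could look like. The Atom and Reference structs below are simplified, hypothetical stand-ins and not lld's actual classes; kArmAttrThumb and hasAttribute() are likewise invented for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the Atom model (not lld's real API).
struct Atom;

struct Reference {
  uint16_t kind;      // meaning is defined per platform
  const Atom *target; // nullptr: this is an attribute marker, not a fixup
  int64_t addend;     // could carry extra attribute data if needed
};

struct Atom {
  std::vector<Reference> references;
};

// Hypothetical kind value an ARM port might reserve to mean "thumb code".
constexpr uint16_t kArmAttrThumb = 0x100;

// True if the atom carries a target-less Reference of the given kind.
bool hasAttribute(const Atom &atom, uint16_t kind) {
  for (const Reference &ref : atom.references)
    if (ref.target == nullptr && ref.kind == kind)
      return true;
  return false;
}
```

Under this scheme, platform-neutral code only ever sees an ordinary list of References, and only the platform's own passes interpret the target-less ones.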

> 
>> 
>>> 5) are you planning to support dwarf information too ?
>> Debugging information is another big open question.  The dwarf format is very much tied to the section model.  Not only is the debug information put in sections with special names, but the dwarf debug info references code by its address in the .o files (the Atom model does not model addresses).    I'm sure the lldb guys have some ideas on direction of where they would like debug information to go.  It may be that the Atom model has a different representation for debug info.  And when generating a final linked image you can choose the debug format you want.  A Writer could convert the debug info to dwarf if requested.
> Wouldn't it be hard to get the source / line information right if the linker tries to write the debug information?
Just as hard as reading and writing dwarf debug information in general ;-)

Let me also mention why the debug information is not an issue for MacOS/iOS.  Dwarf is designed to work with "dumb" linkers or "smart" linkers.  A dumb linker just copies all the dwarf sections from all input files to the output file, and applies any relocations.  This is simple, but the resulting dwarf is huge with tons of "dead" dwarf in it (because of coalescing by the linker).  A smart linker knows how to parse up dwarf and optimize the combining of sections.  The resulting dwarf is much smaller, but it takes a lot of computation to do the merge.  

When we (Apple/darwin) switched from stabs to dwarf years ago, we decided to take a different approach. We realized a dumb linker would be slow because of all the I/O copying dwarf.  A smart linker would be slow because of all the computation needed.  So, instead the darwin linker just ignores all dwarf in .o files!  Instead it writes "debug notes" to the final linked image that lists the paths to all the .o files used to create the image.  This approach makes linking fast.  Next, if you happen to run the program in the debugger, the debugger would see the debug notes and go read the .o files' dwarf information.  Lastly, if you are making a release build, you run a tool called dsymutil on the final linked image.  dsymutil finds the debug notes, parses the .o files' dwarf information then does all the computation to produce an optimal dwarf output file (we use a .dSYM extension).  Later, if you need to debug a release build, you point the debugger at the .dSYM file.  

Perhaps the initial approach you should take for ELF is to go the dumb linker route.  Have the ELF reader produce one Atom for each dwarf section with all the fixups/References needed.  Then the ELF Writer will just concatenate those sections into the output file, and apply the fixups.
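
The dumb-linker route can be sketched roughly as follows. This is only an illustration under simplifying assumptions: SectionChunk, Fixup, and concatenateAndFix() are hypothetical names, every fixup is a 32-bit little-endian store, and the value to store is assumed to have been resolved already (a real Writer would compute it from the target's final address).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: one chunk per input dwarf section.
struct Fixup {
  uint32_t offset; // where in the chunk to patch
  uint32_t value;  // already-resolved value to store (assumption of this sketch)
};

struct SectionChunk {
  std::vector<uint8_t> bytes;
  std::vector<Fixup> fixups;
};

// Concatenate all chunks in order, then patch each fixup at its
// new position (chunk base in the output + offset within the chunk).
std::vector<uint8_t> concatenateAndFix(const std::vector<SectionChunk> &chunks) {
  std::vector<uint8_t> out;
  for (const SectionChunk &chunk : chunks) {
    const std::size_t base = out.size();
    out.insert(out.end(), chunk.bytes.begin(), chunk.bytes.end());
    for (const Fixup &fix : chunk.fixups) {
      assert(static_cast<std::size_t>(fix.offset) + 4 <= chunk.bytes.size());
      for (int i = 0; i < 4; ++i) // store as 32-bit little-endian
        out[base + fix.offset + i] =
            static_cast<uint8_t>((fix.value >> (8 * i)) & 0xff);
    }
  }
  return out;
}
```

Nothing here parses dwarf at all, which is what makes the approach simple and also why dead debug info for coalesced atoms is carried into the output.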

-Nick