[LLVMdev] Status of YAML IO?

Nick Kledzik kledzik at apple.com
Tue Oct 30 10:38:40 PDT 2012


On Oct 30, 2012, at 7:12 AM, Shankar Easwaran wrote:

> Hi Nick,
> 
> I had a few questions :-
> 
> 1) Is there a way to validate that the input file is in a valid format, as defined by the YAML Reader?
Do you mean something other than whether the YAML reader accepts it? Lots of files will be syntactically valid YAML. It is the semantic-level checking that is hard, and that is what YAML I/O does.
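To make the syntactic/semantic distinction concrete, here is a minimal C++ sketch. It assumes a YAML parser has already produced a flat key/value mapping, so the syntax is known to be valid, and then checks that the keys make sense for an atom. The names `Mapping` and `checkAtomKeys`, the allowed-key set, and the rule that `name` is required are all hypothetical illustrations, not lld's actual rules or YAML I/O's API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A mapping that a YAML parser has already produced; syntax is assumed valid.
using Mapping = std::vector<std::pair<std::string, std::string>>;

// Semantic checks on one atom mapping. Returns error messages (empty if OK).
std::vector<std::string> checkAtomKeys(const Mapping& m) {
  // Illustrative subset of keys seen in the test cases in this thread.
  static const std::set<std::string> allowed = {"name", "scope", "type"};
  std::vector<std::string> errors;
  std::set<std::string> seen;
  for (const auto& kv : m) {
    const std::string& key = kv.first;
    if (allowed.count(key) == 0)
      errors.push_back("unknown key '" + key + "'");
    if (!seen.insert(key).second)
      errors.push_back("duplicate key '" + key + "'");
  }
  // Assumed rule for this sketch: every atom must have a name.
  if (seen.count("name") == 0)
    errors.push_back("missing required key 'name'");
  return errors;
}
```

A document with `nmae: foo` is perfectly good YAML, yet this check reports both an unknown key and a missing `name`; that kind of error is what a purely syntactic check cannot catch.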

> 2) How are you planning to represent section groups in the YAML?
You mean the ELF concept of section groups in YAML-encoded ELF? The YAML encoding of ELF (or COFF or Mach-O) does not know anything deeper about the meaning of the files. It is just the bytes of each section and the entries in the symbol table. If a section group is a section of bytes that is interpreted as an array of symbol/section indexes, then the YAML-encoded ELF just has the raw bytes for that section.
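Since the YAML only carries raw bytes, interpreting a group is left to whoever decodes those bytes. For reference, the ELF gABI defines the content of an `SHT_GROUP` section as an array of 32-bit words: the first word holds flags (`GRP_COMDAT` is 0x1) and the remaining words are section header indexes of the group members. A minimal C++ sketch of that decoding, assuming a little-endian object file; `SectionGroup`, `readLE32`, and `decodeGroup` are illustrative names, not lld APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag bit from the ELF gABI: the group is a COMDAT group.
constexpr uint32_t GRP_COMDAT = 0x1;

struct SectionGroup {
  uint32_t flags = 0;                    // first word of the section
  std::vector<uint32_t> memberSections;  // section header indexes
};

// Reads one little-endian 32-bit word starting at bytes[off].
static uint32_t readLE32(const std::vector<uint8_t>& bytes, std::size_t off) {
  return uint32_t(bytes[off]) | (uint32_t(bytes[off + 1]) << 8) |
         (uint32_t(bytes[off + 2]) << 16) | (uint32_t(bytes[off + 3]) << 24);
}

// Interprets the raw content of an SHT_GROUP section: flags word first,
// then one word per member section. Trailing partial words are ignored.
SectionGroup decodeGroup(const std::vector<uint8_t>& raw) {
  SectionGroup g;
  if (raw.size() < 4) return g;
  g.flags = readLE32(raw, 0);
  for (std::size_t off = 4; off + 4 <= raw.size(); off += 4)
    g.memberSections.push_back(readLE32(raw, off));
  return g;
}
```

The YAML layer never needs this code; it only has to round-trip the bytes faithfully.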

> 3) How are you planning to support Atom-specific flags? Is there already a way?
>     (This would be needed to group similar atoms together.)
It is still an open question how to support platform-specific Atom attributes. As much as possible, we'd like to expand the Atom model to be a superset of all the platform-specific flags. But there are some attributes that are very much tied to one platform. One idea is to just add a new Reference which has no target, but whose kind (and maybe addend) encodes the platform-specific attributes. The Reference kind is already platform specific.
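The "Reference with no target" idea can be sketched in a few lines of C++. Everything here is hypothetical: the simplified `Reference` struct, the reserved kind value `kindPlatformAttribute`, and the helper functions are illustrations of the proposal under the assumption that the attribute bits live in the addend, not lld's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct Atom;  // opaque in this sketch

// Hypothetical simplified Reference; kind values are platform specific.
struct Reference {
  const Atom* target;  // nullptr for an attribute-only reference
  uint32_t kind;
  int64_t addend;
};

// Hypothetical platform-specific kind reserved for carrying attributes.
constexpr uint32_t kindPlatformAttribute = 0xF000;

// Attaches platform attribute bits to an atom as a target-less reference.
void setPlatformAttribute(std::vector<Reference>& refs, int64_t attrBits) {
  refs.push_back(Reference{nullptr, kindPlatformAttribute, attrBits});
}

// Returns the attribute bits if such a reference exists.
std::optional<int64_t> platformAttribute(const std::vector<Reference>& refs) {
  for (const Reference& r : refs)
    if (r.target == nullptr && r.kind == kindPlatformAttribute)
      return r.addend;
  return std::nullopt;
}
```

The appeal of this design is that generic code (including YAML I/O) already knows how to carry References, so no new Atom field is needed for attributes that only one platform cares about.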


> 4) Are you planning to support representing shared libraries too in this model?
Yes, we already support shared library atoms in yaml.


> 5) Are you planning to support DWARF information too?
Debugging information is another big open question. The DWARF format is very much tied to the section model. Not only is the debug information put in sections with special names, but the DWARF debug info references code by its address in the .o files (the Atom model does not model addresses). I'm sure the lldb guys have some ideas on the direction they would like debug information to go. It may be that the Atom model has a different representation for debug info. Then, when generating a final linked image, you could choose the debug format you want, and a Writer could convert the debug info to DWARF if requested.

-Nick

> 
> On 10/29/2012 9:26 PM, Nick Kledzik wrote:
>> Michael,
>> 
>> To validate the refactor of the YAML Reader/Writer using YAML I/O, I updated all the test cases to be compatible with YAML I/O. One gnarly issue was how to handle the test cases with archives. Currently, we have test cases like:
>> 
>> ---
>> atoms:
>>     - name: foo
>>      # more stuff
>> ---
>> archive:
>>    - name: bar.o
>>      atoms:
>>        - name:  bar
>>         # more stuff
>> 
>> 
>> This sort of weak/dynamic typing is hard to use with YAML I/O, which enforces stronger typing so that it can do better error checking. The core of the problem is that when a new document is started, we don't know what kind of file it is going to be, and therefore which keys are legal. I first looked into using tags to specify the document type. For example:
>> 
>> --- !archive
>> members:
>>    - name: bar.o
>>    # more stuff
>> 
>> But after modifying YAMLParser to make the tag available, and then trying to figure out how to make the tag actionable in the trait, I realized that for maps, the tag is just like another key. So, if every client agreed that the first key/value was a particular key name (e.g. tag: type), which YAML I/O already supports, then there would be no need for tags and no need for an additional mechanism in YAML I/O.
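Treating the first key as the type discriminator means a reader only has to peek at one line of each document. The following C++ sketch is naive and line based (it is NOT a YAML parser, and lld would use YAMLParser instead); `documentKind` is a hypothetical helper that returns the value of a leading `kind:` key, or an empty string when the optional key is absent, which here stands for a plain atom file.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Naive, line-based sketch: returns the value of a leading "kind:" key in
// one YAML document, or "" if the first top-level key is something else.
std::string documentKind(const std::string& doc) {
  std::istringstream in(doc);
  std::string line;
  while (std::getline(in, line)) {
    // Skip blank lines, comments, and the "---" document start marker.
    if (line.empty() || line[0] == '#' || line.rfind("---", 0) == 0)
      continue;
    std::size_t colon = line.find(':');
    if (colon == std::string::npos || line.compare(0, colon, "kind") != 0)
      return "";  // first key is not "kind": a regular object file
    std::size_t start = line.find_first_not_of(" \t", colon + 1);
    if (start == std::string::npos) return "";
    std::size_t end = line.find_last_not_of(" \t\r");
    return line.substr(start, end - start + 1);
  }
  return "";
}
```

A document beginning `kind: archive` yields `"archive"`, while one beginning `defined-atoms:` yields `""`, so both shapes in the test case can coexist in one file.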
>> 
>> So, I now have the traits set up to support archives, assuming the first (optional) key of each document type read by lld will be "kind:". The archive-basic.objctxt test case now looks like:
>> 
>> # RUN: lld-core %s | FileCheck %s
>> 
>> #
>> # Tests archives in YAML. Tests that an undefined in a regular file will load
>> # all atoms in select archive members.
>> #
>> 
>> ---
>> defined-atoms:
>>     - name:              foo
>>       type:              code
>> 
>> undefined-atoms:
>>     - name:              bar
>> 
>> ---
>> kind:                   archive
>> members:
>>   - name:               bar.o
>>     content:
>>       defined-atoms:
>>         - name:              bar
>>           scope:             global
>>           type:              code
>> 
>>         - name:              bar2
>>           type:              code
>> 
>>   - name:               baz.o
>>     content: 
>>       defined-atoms:
>>         - name:              baz
>>           scope:             global
>>           type:              code
>> 
>>         - name:              baz2
>>           type:              code
>> ...
>> 
>> # CHECK:       name:       foo
>> # CHECK-NOT:  undefined-atoms:
>> # CHECK:       name:       bar
>> # CHECK:       name:       bar2
>> # CHECK-NOT:   name:       baz
>> # CHECK:       ...
>> 
>> My thinking is that we can extend this to support embedded COFF/ELF/MachO in yaml by using new kind values.  For example:
>> 
>> ---
>> kind:                   object-coff
>> header:
>>    # stuff 
>> sections:
>>    # stuff 
>> symbols:
>>    # stuff 
>> ...
>> 
>> The MappingTrait<const ld::File*> will look at the kind value and switch on it. We just need an external function (per file format) that can be called with the same mapping() parameters, does the io.map*() calls, and has traits for the platform-specific types. That function turns the YAML into an in-memory binary object, then runs the Reader to return a File*. I'll be prototyping this approach for mach-o.
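The "switch on the kind value" step amounts to a table from kind strings to per-format hooks. A minimal C++ sketch, assuming each hook takes the document and returns a file object; `File` is a stand-in for lld's `ld::File`, and `FileBuilder` and `KindDispatcher` are hypothetical names, not lld or LLVM APIs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for ld::File; the real lld class is not modeled here.
struct File {
  std::string format;
};

// Hypothetical per-format hook: turns one YAML document into a File.
using FileBuilder =
    std::function<std::unique_ptr<File>(const std::string& doc)>;

class KindDispatcher {
 public:
  void add(const std::string& kind, FileBuilder builder) {
    builders_[kind] = std::move(builder);
  }
  // An empty kind means the optional "kind:" key was absent.
  std::unique_ptr<File> build(const std::string& kind,
                              const std::string& doc) const {
    auto it = builders_.find(kind);
    if (it == builders_.end())
      throw std::runtime_error("unknown document kind '" + kind + "'");
    return it->second(doc);
  }

 private:
  std::map<std::string, FileBuilder> builders_;
};
```

Registering one hook per format (for example `"archive"`, `"object-coff"`) keeps the top-level mapping code unchanged as new embedded formats are added, and an unrecognized kind becomes a clear error rather than a confusing key mismatch deeper in the document.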
>> 
>> -Nick
>> 
>> 
>> On Oct 25, 2012, at 9:59 AM, Sean Silva wrote:
>>>> To better understand how a client would use YAML I/O, I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better, and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the YAML code instead of two (the reader and the writer).
>>> 
>>> Fantastic!
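The "one place instead of two" benefit comes from writing a single mapping function that serves both directions. The C++ sketch below is a toy model of that idea: `ToyIO`, `AtomDesc`, and `mapAtom` are hypothetical, and although `outputting()` and `mapRequired()` loosely echo the names in LLVM's yaml::IO, this class is not that API and skips real YAML parsing (reader input is a pre-parsed flat map).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Toy model of bidirectional mapping; not LLVM's yaml::IO.
class ToyIO {
 public:
  ToyIO() : outputting_(true) {}  // writer mode
  // Reader mode: input is a pre-parsed flat key/value map.
  explicit ToyIO(std::map<std::string, std::string> in)
      : outputting_(false), in_(std::move(in)) {}

  bool outputting() const { return outputting_; }

  // Writes "key: value" when outputting; otherwise reads value from input.
  void mapRequired(const std::string& key, std::string& value) {
    if (outputting_) {
      out_ << key << ": " << value << '\n';
      return;
    }
    auto it = in_.find(key);
    if (it == in_.end()) {
      errors_ += "missing required key '" + key + "'\n";
      return;
    }
    value = it->second;
  }

  std::string output() const { return out_.str(); }
  const std::string& errors() const { return errors_; }

 private:
  bool outputting_;
  std::map<std::string, std::string> in_;
  std::ostringstream out_;
  std::string errors_;
};

struct AtomDesc {
  std::string name;
  std::string type;
};

// The single place listing an atom's attributes, used for read and write.
void mapAtom(ToyIO& io, AtomDesc& atom) {
  io.mapRequired("name", atom.name);
  io.mapRequired("type", atom.type);
}
```

Adding a new attribute means adding one line to `mapAtom`, and both the reader and the writer pick it up automatically.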
>> 
> 
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation



More information about the llvm-dev mailing list