[LLVMdev] [RFC] YAML I/O

Nick Kledzik kledzik at apple.com
Wed Jul 25 12:43:13 PDT 2012


I've been working on reading and writing yaml encoded documents for the lld project.  Michael Spencer added the YAMLParser.h functionality to llvm/Support to help in parsing yaml documents.  That parser greatly helps at the syntax level, but you still need to hand-write a lot of semantic checking and then convert the various node types into something usable.

I've developed a layer on top of YAMLParser.h I'm calling YAMLIO.h (yaml I/O) which unifies parsing and writing yaml documents and handles most semantic checking, and is very easy to use!  Basically, you define your yaml document schema as a mix of C++ structs and vectors, and YAMLIO does the rest.  Let's look at a quick example first.  Suppose this is your yaml document:

- name:          Tom
  age:           20
- name:          Richard
  age:           27
  speaks-french: true
- name:          Harry
  age:           23

To read or write such yaml data you would define two C++ types: one for the mapping (a struct) and one for the sequence of those mappings (a typedef).  In the struct you add a yamlMapping() method which associates mapping keys with field names and each field's type. (Note: the yamlMapping() method was inspired by the boost serialize() method).

using llvm::yaml::Sequence;
using llvm::yaml::DocumentList;
using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;
using llvm::yaml::YamlMap;

struct Person : public YamlMap {
  StringRef     name;
  uint8_t       age;
  bool          speaks_french;

  void yamlMapping(IO &io) {
    requiredKey(io, name,          "name");
    requiredKey(io, age,           "age");
    optionalKey(io, speaks_french, "speaks-french");
  }
};

typedef Sequence<Person>          PersonList;
typedef DocumentList<PersonList>  PersonDocumentList;

That's it.  The yamlMapping() method is processed by both Input and Output to properly handle key-values in a yaml mapping.  The Sequence and DocumentList templates are subclasses of std::vector<>.
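How one method can serve both directions is easiest to see in a toy model.  The sketch below is not YAMLIO's implementation: ToyIO, its flat string store, ToyPerson and the "nickname" key are all names invented for illustration, and the real IO class works on parsed yaml nodes rather than a std::map.  It only shows the idea of a direction flag deciding whether each requiredKey()/optionalKey() call copies a field out or fills it in.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the IO class (hypothetical, not the YAMLIO type).
struct ToyIO {
  bool outputting;                           // true: struct -> store
  std::map<std::string, std::string> store;  // stands in for one yaml mapping
  std::string error;                         // first error seen, if any
};

// Writing: copy the field into the store.
// Reading: copy it back, or record an error if the key is absent.
void requiredKey(ToyIO &io, std::string &field, const char *key) {
  if (io.outputting) {
    io.store[key] = field;
    return;
  }
  auto it = io.store.find(key);
  if (it == io.store.end()) {
    if (io.error.empty())
      io.error = std::string("missing required key '") + key + "'";
    return;
  }
  field = it->second;
}

// Like requiredKey, except a missing key on input leaves the field untouched.
void optionalKey(ToyIO &io, std::string &field, const char *key) {
  if (io.outputting) {
    io.store[key] = field;
    return;
  }
  auto it = io.store.find(key);
  if (it != io.store.end())
    field = it->second;
}

struct ToyPerson {
  std::string name;
  std::string nickname;

  // One description of the mapping, used for both reading and writing.
  void yamlMapping(ToyIO &io) {
    requiredKey(io, name, "name");
    optionalKey(io, nickname, "nickname");
  }
};

// Writes a person into a store, then reads it back through the same method.
ToyPerson roundTrip(ToyPerson in) {
  ToyIO out{true, {}, ""};
  in.yamlMapping(out);
  ToyIO back{false, out.store, ""};
  ToyPerson result;
  result.yamlMapping(back);
  assert(back.error.empty());  // "name" was just written, so it must be found
  return result;
}
```

Because the schema is written once, the reader and the writer cannot drift apart on key names or on which keys are required.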

The data structures are regular structs and vectors.  An example of creating them:

  // build a person
  Person a;
  a.name = "Tom";
  a.age = 27;
  a.speaks_french = false;
  // build sequence of persons
  PersonList persons;
  persons.push_back(a);

To write a yaml document your code looks like:

void dump(PersonList &persons, raw_ostream &out) {
  Output yout(out);
  yout << persons;
}

To read a yaml document your code looks like:

void readYaml(StringRef filePath) {
  Input yin(filePath);
  DocumentList<PersonList> docList;
  yin >> docList; 
  // if there was an error parsing, message already printed out
  if ( yin.error() ) 
     return;
  
  for(PersonList &pl : docList) {
    for(Person &person : pl) {
      // process each Person
    }
  }
}


YAMLIO also handles semantic error checking for you.  For instance, if your document contained an illegal value for a key like:

- name:          Richard
  age:           27
  speaks-french: oui

You would get an error like:

YAML:6:18: error: invalid boolean
  speaks-french: oui
                 ^~~~

If the document has a key not in your schema like:

- name:          Tom
  pets:          true
  age:           20

You would get an error like:

YAML:3:18: error: unknown key 'pets'
  pets:          true
  ^~~~
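The two checks above (a value of the wrong type and a key outside the schema) can be modeled in a few lines.  This is only a sketch under assumptions: checkPerson() and parseBool() are hypothetical helpers, the mapping is a plain std::map rather than parsed yaml nodes, the set of accepted boolean spellings is a guess, and no line/column information is tracked.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Accepts only "true" and "false" (an assumption; the spellings YAMLIO
// accepts are not listed in this mail).
bool parseBool(const std::string &text, bool &result) {
  if (text == "true") {
    result = true;
    return true;
  }
  if (text == "false") {
    result = false;
    return true;
  }
  return false;
}

// Checks one mapping against the Person schema and returns the error
// messages found, in key order.
std::vector<std::string>
checkPerson(const std::map<std::string, std::string> &mapping) {
  static const std::set<std::string> knownKeys = {"name", "age",
                                                  "speaks-french"};
  std::vector<std::string> errors;
  for (const auto &entry : mapping) {
    if (knownKeys.count(entry.first) == 0) {
      errors.push_back("unknown key '" + entry.first + "'");
      continue;
    }
    bool ignored = false;
    if (entry.first == "speaks-french" && !parseBool(entry.second, ignored))
      errors.push_back("invalid boolean");
  }
  return errors;
}

// Usage: the mapping with the "pets" key from the example above.
void unknownKeyExample() {
  std::map<std::string, std::string> tom;
  tom["name"] = "Tom";
  tom["pets"] = "true";
  tom["age"] = "20";
  std::vector<std::string> errors = checkPerson(tom);
  assert(errors.size() == 1);
  assert(errors[0] == "unknown key 'pets'");
}
```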

As you see, the model of YAMLIO is that you define intermediate data structures which define your yaml schema.  The job of YAMLIO is to convert between those intermediate data structures and yaml documents.  YAMLIO most likely won't be able to convert between your existing native data structures and yaml.  You will probably need to define new intermediate data structures (the schema) and then write code to convert between your native data structures and the intermediate ones.  But that glue code is super simple, mostly just copying fields and iterating lists. All the yaml-specific work (formatting and semantic checking) is done by YAMLIO.
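To give a feel for that glue code, here is a minimal sketch.  Employee is a made-up native type, and PersonRecord is a plain mirror of the Person schema with the YamlMap base and yamlMapping() left out so the example compiles on its own; neither name comes from YAMLIO.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical native data structure the application already has.
struct Employee {
  std::string fullName;
  unsigned ageInYears;
  std::set<std::string> languages;
};

// Plain mirror of the Person schema (YamlMap base and yamlMapping() omitted).
struct PersonRecord {
  std::string name;
  uint8_t age;
  bool speaks_french;
};

// Native -> intermediate: mostly field copies.
PersonRecord toRecord(const Employee &e) {
  assert(e.ageInYears <= 255);  // the schema stores age in a uint8_t
  PersonRecord r;
  r.name = e.fullName;
  r.age = static_cast<uint8_t>(e.ageInYears);
  r.speaks_french = e.languages.count("french") != 0;
  return r;
}

// Intermediate -> native.
Employee fromRecord(const PersonRecord &r) {
  Employee e;
  e.fullName = r.name;
  e.ageInYears = r.age;
  if (r.speaks_french)
    e.languages.insert("french");
  return e;
}

// Lists are converted by iterating and converting each element.
std::vector<PersonRecord> toRecords(const std::vector<Employee> &staff) {
  std::vector<PersonRecord> records;
  for (const Employee &e : staff)
    records.push_back(toRecord(e));
  return records;
}
```

Note that the conversion is lossy by design: any language other than French is dropped, because the schema has no key for it.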


In the example above the scalar types (strings, integers, booleans) were all built-in types.  YAMLIO also has support for enumerations and bit masks.  Here is an example of a simple enumeration (color) and a bit mask set (flags).  Suppose your data structures already define Colors and Flags:

  enum Colors {
     cRed,
     cBlue,
     cGreen
  };
  #define FlagBig     1
  #define FlagLittle  2
  #define FlagRound   4
  #define FlagPointy  8

And you want the yaml documents to use human-readable values for colors and flags, rather than just the integer value used internally.  To handle that, you define conversion tables and hand them to YAMLIO.  For instance:
 
using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;
using llvm::yaml::YamlMap;
using llvm::yaml::UniqueValue;
using llvm::yaml::BitValue;

static const UniqueValue<Colors> colorConversions[] = {
  {cRed,     "red"},
  {cBlue,    "blue"},
  {cGreen,   "green"},
  {cRed,      NULL} // default value for optional keys
};

static const BitValue<uint32_t> flagConversions[] = {
  {FlagBig,     "big"},
  {FlagLittle,  "little"},
  {FlagRound,   "round"},
  {FlagPointy,  "pointy"},
  {0,            NULL}
};

struct Test : public YamlMap {
  StringRef     name;
  Colors        color;
  uint32_t      flags;

  void yamlMapping(IO &io) {
    requiredKey(io, name,  "name");
    optionalKey(io, color, "color", colorConversions);
    requiredKey(io, flags, "flags", flagConversions);
  }
};

The above defines a yaml mapping with three keys: name, color, and flags.  When writing the color value out, the table colorConversions is used to map the in-memory value to a string.  In this case, the color field is marked as optional.  That means when reading the yaml document, if there is no "color:" key, the struct's color field will be filled in with the value of the last table entry (the one with the NULL string pointer), in this case the value red.
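The table lookup described here can be sketched as follows.  ColorEntry is a stand-in for UniqueValue<Colors> whose layout is assumed from the initializers above, and colorName() and colorFromText() are hypothetical helpers, not YAMLIO functions.

```cpp
#include <cassert>
#include <cstring>

enum Colors { cRed, cBlue, cGreen };

// Stand-in for UniqueValue<Colors>; the field layout is an assumption.
struct ColorEntry {
  Colors value;
  const char *name;
};

static const ColorEntry colorTable[] = {
    {cRed, "red"},
    {cBlue, "blue"},
    {cGreen, "green"},
    {cRed, nullptr}  // terminator; its value is the default for a missing key
};

// Writing: in-memory value -> string, or nullptr if the value is not listed.
const char *colorName(Colors c) {
  for (const ColorEntry *e = colorTable; e->name; ++e)
    if (e->value == c)
      return e->name;
  return nullptr;
}

// Reading: text == nullptr models a missing "color:" key.
// Returns false if text is present but names no known color.
bool colorFromText(const char *text, Colors &result) {
  const ColorEntry *e = colorTable;
  for (; e->name; ++e) {
    if (text && std::strcmp(e->name, text) == 0) {
      result = e->value;
      return true;
    }
  }
  if (!text) {
    result = e->value;  // e now points at the terminator entry
    return true;
  }
  return false;
}

// Usage: a missing key yields the default stored in the terminator.
void missingKeyExample() {
  Colors c = cGreen;
  assert(colorFromText(nullptr, c));
  assert(c == cRed);
}
```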

When writing the flags value out, the table flagConversions is used to convert the bits in the flags field to a sequence of flag values.

A valid yaml document for this schema is:

- name:          Tom
  color:         blue
  flags:         [ big ]
- name:          Richard
  color:         red
  flags:         [ little, pointy ]
- name:          Harry
  flags:         [ little, round ]


My initial plan was to add YAMLIO to lld and let it mature there, but I got a request to move this down into llvm for another llvm client to use.  So, I thought I'd see what the llvm community thought of this support.

To see a larger example, attached is a sample mach-o object file (for hello world) encoded in yaml along with the YAMLIO based schema for reading or writing those documents.


-Nick


-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.yaml
Type: application/octet-stream
Size: 2987 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120725/6d72fd0f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ObjectIO.h
Type: application/octet-stream
Size: 6124 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120725/6d72fd0f/attachment-0001.obj>