[llvm-commits] [PATCH] YAML I/O

Sean Silva silvas at purdue.edu
Wed Aug 22 18:30:47 PDT 2012


>   template <>
>   struct llvm::yaml::ScalarTrait<Color> {
>     static void doScalar(IO &io, Color &value) {
>       io.beginEnumScalar();
>       io.enumScalarMatch(value, "red",   cRed);
>       io.enumScalarMatch(value, "blue",  cBlue);
>       io.enumScalarMatch(value, "green", cGreen);
>       io.endEnumScalar();
>     }
>   };

To be honest, I was quite fond of the static table based approach that
the original patch was using. What happened to that? I prefer it
because the current approach ends up emitting a bunch of code, whereas
the static tables are much more compact and simple (IMO).
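To illustrate, the kind of table I have in mind looks roughly like this (a sketch only; `EnumEntry` and `matchColor` are names I'm making up here, not anything from the patch):

```cpp
#include <cassert>
#include <cstring>

enum Color { cRed, cBlue, cGreen };

// One row per enumerator; a null name terminates the table.
struct EnumEntry {
  const char *name;
  Color value;
};

static const EnumEntry colorTable[] = {
  { "red",   cRed   },
  { "blue",  cBlue  },
  { "green", cGreen },
  { nullptr, cRed   }  // sentinel
};

// Reading direction: map a yaml scalar string back to the enum value.
// The writing direction would walk the same table in reverse.
static bool matchColor(const char *str, Color &out) {
  for (const EnumEntry *e = colorTable; e->name; ++e) {
    if (std::strcmp(e->name, str) == 0) {
      out = e->value;
      return true;
    }
  }
  return false;
}
```

One table serves both directions, and no per-enumerator code gets emitted.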

If you do go for this more "code" approach, then I would prefer to get
rid of the explicit begin/end here. Instead, I would have an ancillary
type whose constructor does the "begin" stuff and destructor does the
"end" stuff, and on which you call the methods. something like

Helper h(io);
h.enumScalarMatch(value, "red",   cRed);
h.enumScalarMatch(value, "blue",  cBlue);
h.enumScalarMatch(value, "green", cGreen);
// ~Helper() does the "end" stuff.
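Spelled out, the helper is just an RAII wrapper; here's a rough self-contained sketch of what I mean (the stub IO and all the names are invented for illustration):

```cpp
#include <cassert>
#include <cstring>

enum Color { cRed, cBlue, cGreen };

// Stand-in for the real IO class, just enough to show the shape.
struct IO {
  const char *scalar;    // the scalar being parsed
  bool matched = false;
  void beginEnumScalar() { matched = false; }
  void endEnumScalar()   { /* report "unknown value" if !matched, etc. */ }
};

// RAII wrapper: constructor does the "begin", destructor does the "end".
class EnumScalarHelper {
  IO &io;
public:
  explicit EnumScalarHelper(IO &io) : io(io) { io.beginEnumScalar(); }
  ~EnumScalarHelper() { io.endEnumScalar(); }
  void match(Color &value, const char *name, Color constant) {
    if (!io.matched && std::strcmp(io.scalar, name) == 0) {
      value = constant;
      io.matched = true;
    }
  }
};

static void doScalar(IO &io, Color &value) {
  EnumScalarHelper h(io);
  h.match(value, "red",   cRed);
  h.match(value, "blue",  cBlue);
  h.match(value, "green", cGreen);
}  // ~EnumScalarHelper runs here, so "end" can never be forgotten
```

The begin/end pairing becomes impossible to get wrong, even on early returns.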


> There is no trait for yaml sequences.  Instead, if your data type is a class
> with begin, end, and push_back methods, it is assumed to be a sequence.

This is a really good idea, and really simple!
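For what it's worth, that detection falls out of SFINAE fairly cleanly in C++11; here's a minimal toy version of the idea (my own sketch, not the patch's machinery):

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Does T have a push_back(value_type) member? A rough stand-in for the
// "begin/end/push_back means sequence" test. C++11 SFINAE sketch.
template <typename T>
struct is_yaml_sequence {
  // Viable only if U::value_type exists and U has a matching push_back.
  template <typename U>
  static char test(
      decltype(std::declval<U&>().push_back(
          std::declval<typename U::value_type>())) *);
  template <typename U>
  static long test(...);
  static const bool value = sizeof(test<T>(nullptr)) == sizeof(char);
};

static_assert(is_yaml_sequence<std::vector<int> >::value,
              "vector<int> looks like a sequence");
static_assert(!is_yaml_sequence<int>::value,
              "a plain int does not");
```

A map, which has `begin`/`end` but no `push_back`, correctly fails the test too.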

>     static void mapping(IO &io, const lld::Reference*& ref) {
>       MappingHelper<MyReference, const lld::Reference*> keys(io, ref);
>
>       io.reqKey("kind",           keys->_kind);
>       io.optKey("offset",         keys->_offset);
>       io.optKey("target",         keys->_targetName);
>       io.optKey("addend",         keys->_addend);
>     }

This approach seems insanely convoluted. Why not just use pointers to
member functions? E.g.

template <>
struct MapTraits<const lld::Reference*> {
  static void mapping(IO &io, const lld::Reference *&ref) {
    io.reqKey("kind", &lld::Reference::kind, &lld::Reference::setKind);
    ...
  }
};

However, presumably there needs to be some form of dynamic dispatch
here as well, otherwise how will you serialize/deserialize an
arbitrary lld::Reference (where you don't necessarily know the dynamic
type)?
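To be concrete about the getter/setter idea, reqKey could be taught to take a pointer-to-getter and pointer-to-setter and do the get/set itself; a toy sketch of the shape I mean (the IO and Ref classes here are stand-ins I invented, not lld's):

```cpp
#include <cassert>
#include <cstdint>

// Toy object with accessor-style members, standing in for lld::Reference.
class Ref {
  uint32_t _kind = 0;
public:
  uint32_t kind() const { return _kind; }
  void setKind(uint32_t k) { _kind = k; }
};

// Toy IO that only knows how to hold one pending scalar value.
struct IO {
  bool reading;
  uint32_t pending;  // value read from yaml, when reading

  // reqKey via pointer-to-member-functions: parsing calls the setter,
  // writing calls the getter. No wrapper object needed.
  template <typename Obj, typename V>
  void reqKey(const char *key, Obj &obj,
              V (Obj::*get)() const, void (Obj::*set)(V)) {
    (void)key;  // a real IO would match/emit the yaml key here
    if (reading)
      (obj.*set)(static_cast<V>(pending));
    else
      pending = (obj.*get)();  // would be emitted as yaml here
  }
};
```

This keeps the trait specialization to one line per key, at the cost of requiring a setter for every mapped field.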

--Sean Silva

On Wed, Aug 22, 2012 at 8:49 PM, Nick Kledzik <kledzik at apple.com> wrote:
> Sean,
>
> I've been working on reimplementing YAML I/O to use a traits based approach.  I'm
> using lld's internal object as a test bed.  The File/Atom/Reference objects
> in lld have no public ivars.  Everything is accessed through virtual
> methods.  So, if I can do yaml I/O on those classes just by defining trait
> specializations, then the mechanism should be very adaptable.
>
> I have something working now, but it is all using C++11.  I still need to
> discover what issues will arise when used by C++03.
>
> Here is a flavor of what I have working.  I want to make sure this is the
> right direction:
>
> If you have an enum like:
>
>    enum Color { cRed, cBlue, cGreen };
>
> You can write a trait like this:
>
>   template <>
>   struct llvm::yaml::ScalarTrait<Color> {
>     static void doScalar(IO &io, Color &value) {
>       io.beginEnumScalar();
>       io.enumScalarMatch(value, "red",   cRed);
>       io.enumScalarMatch(value, "blue",  cBlue);
>       io.enumScalarMatch(value, "green", cGreen);
>       io.endEnumScalar();
>     }
>   };
>
> Which describes how to convert the in-memory enum value to a yaml scalar and
> back.  I'm also working on a way that you can do arbitrary conversion of
> scalars.
>
>
> If you have a simple POD struct like this:
>
> struct MyInfo {
>   int    hat_size;
>   int    age;
>   Color  hat_color;
> };
>
> You can write a trait like this:
>
> template <>
> struct llvm::yaml::MapTraits<MyInfo> {
>   static void mapping(IO &io, MyInfo& info) {
>     io.reqKey("hat-size",    info.hat_size);
>     io.optKey("age",         info.age,         21);
>     io.optKey("hat-color",   info.hat_color,   cBlue);
>   }
> };
>
> Which is used to both read and write yaml.   The "age" and "hat-color" keys
> are optional in yaml.  If not specified (in yaml), they default to 21 and
> cBlue. The "hat-size" key is required, and you will get an error if it is not
> present in the yaml.
>
> There is no trait for yaml sequences.  Instead, if your data type is a class
> with begin, end, and push_back methods, it is assumed to be a sequence.
>
>
> Now, the interesting case is the handling of non-POD data types.  The
> reqKey() and optKey() methods need an lvalue so they can be read (when
> creating yaml) and written (when parsing yaml).  It may also be the case that
> your existing data structure is not a container of structs, but a container
> of pointers to structs.  But in both those cases, you want to be able to
> have the same yaml representation.  Lastly, in the parsing yaml case, you
> need to be able to instantiate an internal object, whereas the writing yaml
> case needs to examine an existing object.
>
> Here is an example of the lld Reference type and the trait for converting it
> to and from yaml:
>
>   template <>
>   struct MapTraits<const lld::Reference*> {
>
>     class MyReference : public lld::Reference {
>     public:
>       MyReference()
>         : _target(nullptr), _targetName(), _offset(0), _addend(0), _kind(0) {
>       }
>       MyReference(const lld::Reference* ref)
>         : _target(nullptr),
>         _targetName(ref->target() ? ref->target()->name() : ""),
>         _offset(ref->offsetInAtom()),
>         _addend(ref->addend()),
>         _kind(ref->kind()) {
>       }
>
>       virtual uint64_t         offsetInAtom() const { return _offset; }
>       virtual Kind             kind() const         { return _kind; }
>       virtual const lld::Atom *target() const       { return _target; }
>       virtual Addend           addend() const       { return _addend; }
>       virtual void             setKind(Kind k)      { _kind = k; }
>       virtual void             setAddend(Addend a)  { _addend = a; }
>       virtual void             setTarget(const lld::Atom *a) { _target = a; }
>
>
>
>       const lld::Atom *_target;
>       StringRef        _targetName;
>       uint32_t         _offset;
>       Addend           _addend;
>       Kind             _kind;
>     };
>
>
>     static void mapping(IO &io, const lld::Reference*& ref) {
>       MappingHelper<MyReference, const lld::Reference*> keys(io, ref);
>
>       io.reqKey("kind",           keys->_kind);
>       io.optKey("offset",         keys->_offset);
>       io.optKey("target",         keys->_targetName);
>       io.optKey("addend",         keys->_addend);
>     }
>
>
>
>   };
>
> Some salient points:
> * The trait is on "const lld::Reference*" because only pointers to
> References are passed around inside lld.
> * The lld class Reference is an abstract base class, so a concrete instance
> must be defined (MyReference).
> * There are two constructors for MyReference.  The default  constructor is
> used when parsing yaml to create the initial object which is then overridden
> as key/values are found in yaml.  The other constructor is used when writing
> yaml to create a temporary (stack) instance which contains the fields needed
> for mapping() to access.
> * MappingHelper<> is a utility which detects if you are reading or writing
> and constructs the appropriate object.  It is only needed for non-POD
> structs.
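If I understand the read/write split correctly, MappingHelper reduces to an RAII adapter along these lines; a toy version I wrote to check my understanding (all the types here are simplified stand-ins, not the patch's classes):

```cpp
#include <cassert>
#include <string>

// Toy IO that only records direction.
struct IO {
  bool outputting;  // true when writing yaml, false when parsing
};

// On write: build the normalized copy from the existing object so that
// mapping() has ivars to read. On read: default-construct, let mapping()
// fill it in, then hand the result back in the destructor.
template <typename Norm, typename Obj>
class MappingHelper {
  IO &io;
  Obj &obj;
  Norm norm;
public:
  MappingHelper(IO &io, Obj &obj)
      : io(io), obj(obj), norm(io.outputting ? Norm(obj) : Norm()) {}
  ~MappingHelper() {
    if (!io.outputting)
      obj = norm.denormalize();  // hand the parsed result back
  }
  Norm *operator->() { return &norm; }
};

// Example pair of types to exercise the helper.
struct Person {
  std::string name;
};

struct NormPerson {
  std::string _name;
  NormPerson() {}
  NormPerson(const Person &p) : _name(p.name) {}
  Person denormalize() const { return Person{_name}; }
};
```

So mapping() stays symmetric for both directions, and only the Norm type knows how to convert to and from the native object.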
>
> -Nick
>
> On Aug 8, 2012, at 6:34 PM, Sean Silva wrote:
>
> Your suggestion is to remove the intermediate data structures and instead
> define the schema via external trait templates.   I can see how this would
> seem easier (not having to write glue code to copy to and from the
> intermediate data types).  But that copying also does normalization.  For
> instance, your native object may have two ivars that together make one yaml
> key-value, or one ivar is best represented as a couple of yaml key-values.
> Or your sequence may have a preferred sort order in yaml, but that is not
> the actual list order in memory.
>
>
> I don't get what you're saying here. A traits class can handle all
> those conversions easily.
>
> It would look something like:
>
> template<>
> class YamlMapTraits<Person> {
>  void yamlMapping(IO &io, Person *P) {
>    requiredKey(io, &P->name, "name");
>    optionalKey(io, &P->hatSize, "hat-size");
>  }
> };
>
> Here I was just trying to mimic the one of the examples from your
> documentation so the feel should be similar. However, the door is open
> to specifying the correspondence however you really want in the traits
> class.
>
> I think the hard part of a traits approach is figuring out how clients will
> write the normalization code.  And how to make the difficulty of that code
> scale to how denormalized the native objects are.
>
>
> One possibility I can think of off the top of my head is to have the
> traits class declare a private intermediate struct which it
> deserializes to (similar to the intermediate that the current API
> _forces_ you to have), and then just construct the object from the
> intermediate. It's so much more flexible to do this with a traits
> class.
>
> --Sean Silva
>
> On Wed, Aug 8, 2012 at 5:34 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>
> On Aug 8, 2012, at 12:46 PM, Sean Silva wrote:
>
>
> But EnumValue is not quite right because it can be used with #defines too.
>
>
> Do we really want to encourage people to use #defines? Is there any
>
> set of constants in the LLVM tree which are defined with #defines and
>
> not in an enum?
>
>
>
> I'm not sure what you mean by traits-based in this context.
>
>
> A traits-based design means that you have a class template which
>
> provides a collection of type-specific information which is provided
>
> by specializing the class template for a particular type. For example,
>
> see include/llvm/ADT/GraphTraits.h, which uses GraphTraits<T> to
>
> specify how to adapt T to a common interface that graph algorithms can
>
> use. This is noninvasive (maybe needing a friend declaration at most).
>
> Your current approach using inheritance and virtual functions is
>
> invasive, forces the serializable class to inherit (causing multiple
>
> inheritance in the case that the serializable class already has a
>
> base), and forces the serializable class to suddenly have virtual
>
> functions.
>
>
> Overall, I think a traits-based design would be simpler, more loosely
>
> coupled, and seems to fit the use case more naturally.
>
> As I wrote in the documentation, this was not intended to allow you to go
> directly from existing data structures to yaml and back.  Instead the schema
> "language" is written in terms of new data structure declarations (subclass
> of YamlMap and specialization of Sequence<>).
>
>
> Your suggestion is to remove the intermediate data structures and instead
> define the schema via external trait templates.   I can see how this would
> seem easier (not having to write glue code to copy to and from the
> intermediate data types).  But that copying also does normalization.  For
> instance, your native object may have two ivars that together make one yaml
> key-value, or one ivar is best represented as a couple of yaml key-values.
> Or your sequence may have a preferred sort order in yaml, but that is not
> the actual list order in memory.
>
>
> I think the hard part of a traits approach is figuring out how clients will
> write the normalization code.  And how to make the difficulty of that code
> scale to how denormalized the native objects are.
>
>
> I'll play around with this idea and see what works and what does not.
>
>
> -Nick
>
>
>
>
> On Tue, Aug 7, 2012 at 4:57 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
> On Aug 7, 2012, at 2:07 PM, Sean Silva wrote:
>
> Thanks for writing awesome docs!
>
>
> +Sometime sequences are known to be short and the one entry per line is too
>
> +verbose, so YAML offers an alternate syntax for sequences called a "Flow
>
> +Sequence" in which you put comma separated sequence elements into square
>
> +brackets.  The above example could then be simplified to :
>
>
> It's probably worth mentioning here that the "Flow" syntax is
>
> (exactly?) JSON. Also, noting that JSON is a proper subset of YAML is
>
> in general is probably worth mentioning.
>
>
> +   .. code-block:: none
>
>
> pygments (and hence Sphinx) supports `yaml` highlighting
>
> <http://pygments.org/docs/lexers/>
>
>
> +the following document:
>
> +
>
> +   .. code-block:: none
>
>
> The precedent for code listings is generally that the `..
>
> code-block::` is at the same level of indentation as the paragraph
>
> introducing it.
>
>
> +You can combine mappings and squences by indenting.  For example a sequence
>
> +of mappings in which one of the mapping values is itself a sequence:
>
>
> s/squences/sequences/
>
>
> +of a new document is denoted with "---".  So in order for Input to handle
>
> +multiple documents, it operators on an llvm::yaml::Document<>.
>
>
> s/operators/operates/
>
>
> +can set values in the context in the outer map's yamlMapping() method and
>
> +retrive those values in the inner map's yamlMapping() method.
>
>
> s/retrive/retrieve/
>
>
> +of a new document is denoted with "---".  So in order for Input to handle
>
>
> For clarity, I would put the --- in monospace (e.g. "``---``"), here
>
> and in other places.
>
> Thanks for the Sphinx tips.  I've incorporated them and ran a spell checker
> too ;-)
>
>
>
>
> +UniqueValue
>
> +-----------
>
>
> I think that EnumValue would be more self-documenting than UniqueValue.
>
> I'm happy to give UniqueValue a better name.  But EnumValue is not quite
> right because it can be used with #defines too.  The real constraint is that
> there be a one-to-one mapping of strings to values.    I want it to contrast
> with BitValue which maps a set (sequence) of strings to a set of values
> OR'ed together.
>
>
>
>
> At a design level, what are the pros/cons of this approach compared
>
> with a traits-based approach? What made you choose this design versus
>
> a traits-based approach?
>
>
> I'm not sure what you mean by traits-based in this context.    The back
> story is that for lld I've been writing code to read and write yaml
> documents.  Michael's YAMLParser.h certainly makes reading more robust, but
> there is still tons of (semantic level) error checking you have to hand code.  It
> seemed like most of my code was checking for errors.  Also it was a pain to
> keep the yaml reading code in sync with the yaml writing code.
>
>
> What we really needed was a way to describe the schema of the yaml documents
> and have some tool generate the code to read and write.  There is a tool
> called Kwalify which defines a way to express a yaml schema and can check
> it.  But it has a number of limitations.
>
>
> Last month I wrote up a proposal for defining a yaml schema language and a
> tool that would use that schema to generate C++ code to read/validate and
> write yaml conforming to the schema.  The best feedback I got  (from Daniel
> Dunbar) was that rather than create another language (yaml schema language)
> and tools, I should try to see if I could express the schema in C++ directly,
> using meta-programming or whatever.   I looked at Boost serialization for
> inspiration and came up with this Yaml I/O library.
>
>
> -Nick
>
>
>
>
> On Mon, Aug 6, 2012 at 12:17 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
> Attached is a patch for review which implements the Yaml I/O library I
> proposed on llvm-dev July 25th.
>
>
>
>
>
> The patch includes the implementation, test cases, and documentation.
>
>
> I've included a PDF of the documentation, so you don't have to install the
> patch and run sphinx to read it.
>
>
>
>
> There are probably more aspects of yaml we can support in YAML I/O, but the
> current patch is enough to support my needs for encoding mach-o as yaml for
> lld test cases.
>
>
> I was initially planning on just adding this code to lld, but I've had two
> requests to push it down into llvm.
>
>
> Again, here are examples of the mach-o schema and an example mach-o
> document:
>
>
>
>
>
>
>
> -Nick
>
>
>
>
>
> _______________________________________________
>
> llvm-commits mailing list
>
> llvm-commits at cs.uiuc.edu
>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
>
>


