[llvm-commits] [PATCH] YAML I/O

Sean Silva silvas at purdue.edu
Wed Aug 8 18:34:34 PDT 2012


> Your suggestion is to remove the intermediate data structures and instead define the schema via external trait templates.   I can see how this would seem easier (not having to write glue code to copy to and from the intermediate data types).  But that copying also does normalization.  For instance, your native object may have two ivars that together make one yaml key-value, or one ivar is best represented as a couple of yaml key-values.  Or your sequence may have a preferred sort order in yaml, but that is not the actual list order in memory.

I don't get what you're saying here. A traits class can easily handle
all of those conversions.

It would look something like:

template<>
struct YamlMapTraits<Person> {
  // static and public, so generic code can call it without an instance
  static void yamlMapping(IO &io, Person *P) {
    requiredKey(io, &P->name, "name");
    optionalKey(io, &P->hatSize, "hat-size");
  }
};

Here I was just trying to mimic one of the examples from your
documentation, so the feel should be similar. But the door is open to
specifying the correspondence however you want in the traits class.
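
To make the normalization point concrete, here is a minimal,
self-contained sketch of a traits specialization that folds two ivars
into one yaml key-value. Everything in it is an assumption made for
illustration: the toy IO struct (a flat key/value store with a direction
flag), the first/last fields on Person, and the yamlWrite/yamlRead
helpers are hypothetical and are not the API from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the IO class (an assumption, not the patch's API):
// a flat key -> scalar store plus a direction flag.
struct IO {
  bool reading;
  std::map<std::string, std::string> keys;
};

// Native type; it knows nothing about YAML and has no base class.
struct Person {
  std::string first;
  std::string last;
};

// Primary template is left undefined; each serializable type specializes it.
template <typename T> struct YamlMapTraits;

template <> struct YamlMapTraits<Person> {
  // Two ivars (first, last) are normalized into the single key "name".
  static void yamlMapping(IO &io, Person &p) {
    if (io.reading) {
      assert(io.keys.count("name") == 1 && "required key missing");
      const std::string &full = io.keys.at("name");
      std::string::size_type space = full.find(' ');
      p.first = full.substr(0, space);
      p.last = space == std::string::npos ? "" : full.substr(space + 1);
    } else {
      io.keys["name"] = p.first + " " + p.last;
    }
  }
};

// Generic entry points (hypothetical names) only ever go through the traits.
template <typename T> void yamlWrite(IO &io, T &value) {
  io.reading = false;
  YamlMapTraits<T>::yamlMapping(io, value);
}

template <typename T> void yamlRead(IO &io, T &value) {
  io.reading = true;
  YamlMapTraits<T>::yamlMapping(io, value);
}
```

The native Person stays untouched; the split/join logic lives entirely
in the specialization.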

> I think the hard part of a traits approach is figuring out how clients will write the normalization code.  And how to make the difficulty of that code scale to how denormalized the native objects are.

One possibility I can think of off the top of my head is to have the
traits class declare a private intermediate struct which it
deserializes to (similar to the intermediate that the current API
_forces_ you to have), and then just construct the object from the
intermediate. A traits class is more flexible here, because the
intermediate becomes optional rather than required.
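
A rough sketch of that idea, under stated assumptions: the toy IO struct,
the Rect type, and the read() entry point below are all hypothetical and
only serve to show a traits class keeping its intermediate private. Rect
deliberately has no default constructor, so it has to be built from
complete data.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the IO class (an assumption): flat key -> scalar store.
struct IO {
  std::map<std::string, std::string> keys;
};

// Native type with no default constructor, so it cannot be filled in
// field by field; it must be constructed from complete data.
class Rect {
public:
  Rect(int width, int height) : Width(width), Height(height) {}
  int area() const { return Width * Height; }

private:
  int Width, Height;
};

template <typename T> struct YamlMapTraits;

template <> struct YamlMapTraits<Rect> {
  // Deserialize into the private intermediate, validate it, then
  // construct the native object from it.
  static Rect read(const IO &io) {
    Intermediate tmp;
    tmp.width = std::stoi(io.keys.at("width"));
    tmp.height = std::stoi(io.keys.at("height"));
    assert(tmp.width >= 0 && tmp.height >= 0 && "negative dimension");
    return Rect(tmp.width, tmp.height);
  }

private:
  // Only the traits class sees this; clients never declare it.
  struct Intermediate {
    int width = 0;
    int height = 0;
  };
};
```

A type that needs no normalization can skip the intermediate and map its
fields directly, so the extra code scales with how denormalized the
native object is.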

--Sean Silva

On Wed, Aug 8, 2012 at 5:34 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
> On Aug 8, 2012, at 12:46 PM, Sean Silva wrote:
>
>>> But EnumValue is not quite right because it can be used with #defines too.
>>
>> Do we really want to encourage people to use #defines? Is there any
>> set of constants in the LLVM tree which are defined with #defines and
>> not in an enum?
>>
>>
>>> I'm not sure what you mean by traits-based in this context.
>>
>> A traits-based design means that you have a class template which
>> provides a collection of type-specific information which is provided
>> by specializing the class template for a particular type. For example,
>> see include/llvm/ADT/GraphTraits.h, which uses GraphTraits<T> to
>> specify how to adapt T to a common interface that graph algorithms can
>> use. This is noninvasive (maybe needing a friend declaration at most).
>> Your current approach using inheritance and virtual functions is
>> invasive, forces the serializable class to inherit (causing multiple
>> inheritance in the case that the serializable class already has a
>> base), and forces the serializable class to suddenly have virtual
>> functions.
>>
>> Overall, I think a traits-based design would be simpler, more loosely
>> coupled, and seems to fit the use case more naturally.
> As I wrote in the documentation, this was not intended to allow you to go directly from existing data structures to yaml and back.  Instead the schema "language" is written in terms of new data structure declarations (subclass of YamlMap and specialization of Sequence<>).
>
> Your suggestion is to remove the intermediate data structures and instead define the schema via external trait templates.   I can see how this would seem easier (not having to write glue code to copy to and from the intermediate data types).  But that copying also does normalization.  For instance, your native object may have two ivars that together make one yaml key-value, or one ivar is best represented as a couple of yaml key-values.  Or your sequence may have a preferred sort order in yaml, but that is not the actual list order in memory.
>
> I think the hard part of a traits approach is figuring out how clients will write the normalization code.  And how to make the difficulty of that code scale to how denormalized the native objects are.
>
> I'll play around with this idea and see what works and what does not.
>
> -Nick
>
>
>>
>> On Tue, Aug 7, 2012 at 4:57 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>> On Aug 7, 2012, at 2:07 PM, Sean Silva wrote:
>>>> Thanks for writing awesome docs!
>>>>
>>>> +Sometime sequences are known to be short and the one entry per line is too
>>>> +verbose, so YAML offers an alternate syntax for sequences called a "Flow
>>>> +Sequence" in which you put comma separated sequence elements into square
>>>> +brackets.  The above example could then be simplified to :
>>>>
>>>> It's probably worth mentioning here that the "Flow" syntax is
>>>> (exactly?) JSON. More generally, it is probably also worth noting
>>>> that JSON is a proper subset of YAML.
>>>>
>>>> +   .. code-block:: none
>>>>
>>>> pygments (and hence Sphinx) supports `yaml` highlighting
>>>> <http://pygments.org/docs/lexers/>
>>>>
>>>> +the following document:
>>>> +
>>>> +   .. code-block:: none
>>>>
>>>> The precedent for code listings is generally that the `..
>>>> code-block::` is at the same level of indentation as the paragraph
>>>> introducing it.
>>>>
>>>> +You can combine mappings and squences by indenting.  For example a sequence
>>>> +of mappings in which one of the mapping values is itself a sequence:
>>>>
>>>> s/squences/sequences/
>>>>
>>>> +of a new document is denoted with "---".  So in order for Input to handle
>>>> +multiple documents, it operators on an llvm::yaml::Document<>.
>>>>
>>>> s/operators/operates/
>>>>
>>>> +can set values in the context in the outer map's yamlMapping() method and
>>>> +retrive those values in the inner map's yamlMapping() method.
>>>>
>>>> s/retrive/retrieve/
>>>>
>>>> +of a new document is denoted with "---".  So in order for Input to handle
>>>>
>>>> For clarity, I would put the --- in monospace (e.g. "``---``"), here
>>>> and in other places.
>>> Thanks for the Sphinx tips.  I've incorporated them and ran a spell checker too ;-)
>>>
>>>
>>>>
>>>> +UniqueValue
>>>> +-----------
>>>>
>>>> I think that EnumValue would be more self-documenting than UniqueValue.
>>> I'm happy to give UniqueValue a better name.  But EnumValue is not quite right because it can be used with #defines too.  The real constraint is that there be a one-to-one mapping of strings to values.    I want it to contrast with BitValue which maps a set (sequence) of strings to a set of values OR'ed together.
>>>
>>>
>>>
>>>> At a design level, what are the pros/cons of this approach compared
>>>> with a traits-based approach? What made you choose this design versus
>>>> a traits-based approach?
>>>
>>> I'm not sure what you mean by traits-based in this context.    The back story is that for lld I've been writing code to read and write yaml documents.  Michael's YAMLParser.h certainly makes reading more robust, but there is still tons of (semantic level) error checking you have to hand code.  It seemed like most of my code was checking for errors.  Also it was a pain to keep the yaml reading code in sync with the yaml writing code.
>>>
>>> What we really needed was a way to describe the schema of the yaml documents and have some tool generate the code to read and write.  There is a tool called Kwalify which defines a way to express a yaml schema and can check it.  But it has a number of limitations.
>>>
>>> Last month I wrote up a proposal for defining a yaml schema language and a tool that would use that schema to generate C++ code to read/validate and write yaml conforming to the schema.  The best feedback I got (from Daniel Dunbar) was that rather than create another language (yaml schema language) and tools, I should try to see whether the schema could be expressed in C++ directly, using meta-programming or whatever.   I looked at Boost serialization for inspiration and came up with this Yaml I/O library.
>>>
>>> -Nick
>>>
>>>
>>>>
>>>> On Mon, Aug 6, 2012 at 12:17 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>>>> Attached is a patch for review which implements the Yaml I/O library I proposed on llvm-dev July 25th.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The patch includes the implementation, test cases, and documentation.
>>>>>
>>>>> I've included a PDF of the documentation, so you don't have to install the patch and run sphinx to read it.
>>>>>
>>>>>
>>>>>
>>>>> There are probably more aspects of yaml we can support in YAML I/O, but the current patch is enough to support my needs for encoding mach-o as yaml for lld test cases.
>>>>>
>>>>> I was initially planning on just adding this code to lld, but I've had two requests to push it down into llvm.
>>>>>
>>>>> Again, here are examples of the mach-o schema and an example mach-o document:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -Nick
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>
>
