[llvm-commits] [PATCH] YAML parser.

Thu Feb 16 07:54:13 PST 2012

On Fri, Feb 10, 2012 at 12:40 AM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> Attached is the latest patch. Implements most of your comments, and
> the items discussed on IRC.
>
> - Michael Spencer

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser.h
File include/llvm/Support/YAMLParser.h (right):

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser....
include/llvm/Support/YAMLParser.h:161: /// others) before the SimpleKey's Tok.
On 2012/02/05 05:52:50, Bigcheesegs wrote:
> On 2012/02/02 10:19:22, klimek wrote:
> > In general, this comment doesn't really help me understand the concept. It
> > basically explains some coupling, but that looks pretty random when I'm
> > contextually at this part of the file.
>
> Agreed, I'll try moving this with the rest of the scanner over to another
> header.

My comment was more about the comment here... I'd change the comment to explain
where simple keys come from.

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser....
include/llvm/Support/YAMLParser.h:191: if (Pos >= End)
On 2012/02/05 05:52:50, Bigcheesegs wrote:
> On 2012/02/02 10:19:22, klimek wrote:
> > Why can that happen?
>
> There are some places where it errors at the end of a file. For example it
> expected something that isn't there. So instead of having clients correct this
> before calling, I correct it here.

I'm still not sure I understand why:
- Pos > End is possible
- in the case where Pos == End we don't want to point *after* the last character

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser....
include/llvm/Support/YAMLParser.h:477: // These functions forward to Document
and Scanner.
On 2012/02/05 05:52:50, Bigcheesegs wrote:
> On 2012/02/02 10:19:22, klimek wrote:
> > This makes the Node interface pretty large... Is that really necessary?
>
> I don't know how to simplify this.

Inline the methods?

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser....
include/llvm/Support/YAMLParser.h:493: static inline bool classof(const NullNode
*) { return true; }
On 2012/02/05 05:52:50, Bigcheesegs wrote:
> On 2012/02/02 10:19:22, klimek wrote:
> > Does the 'inline' keyword actually change anything here?
>
> Not sure, this is how the function is defined elsewhere in llvm.

I'd propose to take it out, as no-op code makes people like me wonder whether
there's a reason it's there after all.

http://codereview.appspot.com/5604054/diff/1/include/llvm/Support/YAMLParser....
include/llvm/Support/YAMLParser.h:512: // utf8).
On 2012/02/05 05:52:50, Bigcheesegs wrote:
> On 2012/02/02 10:19:22, klimek wrote:
> > Why not whatever the original encoding was?
>
> Because then the parser would be required to understand these other encodings.
> The idea is to convert everything to UTF-8 before parsing. This also
simplifies
> client code.

My gut feeling is not yet settled on whether it disagrees, but let's try it your
way first :)

http://codereview.appspot.com/5604054/diff/6001/include/llvm/Support/YAMLPars...
File include/llvm/Support/YAMLParser.h (right):

http://codereview.appspot.com/5604054/diff/6001/include/llvm/Support/YAMLPars...
include/llvm/Support/YAMLParser.h:71: std::string escape(StringRef Input);
This is not called anywhere yet(?) Why do we need it? Are we planning to write
yaml or do we only need to parse it?

http://codereview.appspot.com/5604054/diff/6001/include/llvm/Support/YAMLPars...
include/llvm/Support/YAMLParser.h:90: OwningPtr<Scanner> scanner;
I would strongly argue for consistency and rather see a getScanner() method than
being inconsistent in naming here.

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp
File lib/Support/YAMLParser.cpp (right):

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:667: std::string EscapedInput;
Curious: is there a reason not to use a Twine here (which would be my default
choice for string concatenation).

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:693: EscapedInput += "\\x" +
std::string(HexStr.size() - 2, '0') + HexStr;
Why's the middle part there?

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:694: } else if (*i & 0x80) { // utf8
// utf8
is confusing to me here:
I thought we always assume utf8, and 0x80 merely means that we have a multi-byte
utf8 sequence...

So I'd expect a comment like:
// utf8 multi-byte sequence

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:788: if (   *Position == 0x09
Indentation off?

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:795: if (   u8d.second != 0
Somehow I cannot match this up with 5.1.
Especially:
"""On input, a YAML processor must accept all Unicode characters except those
explicitly excluded above."""

And, why do we care about non-printable chars at all?

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:839: if (  (C >= '0' && C <= '9')
Just return.

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:847: if (  C == '-'
Just return.

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:860: && Current + 2 < End
So we're not inc'ing Current by 3 when encountering %NN because NN will be
parsable anyway?

http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
lib/Support/YAMLParser.cpp:1627: switch (UnquotedValue[0]) {
Do we want to support all possible escapes, or can we get away with the subset
we really need?