[llvm-commits] [PATCH] YAML parser.

Manuel Klimek klimek at google.com
Fri Feb 24 16:40:04 PST 2012


Ok, generally looks good for our use case minus style (which I don't
feel comfortable making calls on). I assume you'll need to get
approval from somebody else...

Cheers,
/Manuel

On Tue, Feb 21, 2012 at 11:12 AM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> On Mon, Feb 20, 2012 at 5:04 PM, Manuel Klimek <klimek at google.com> wrote:
>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp
>> File lib/Support/YAMLParser.cpp (right):
>>
>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
>> lib/Support/YAMLParser.cpp:693: EscapedInput += "\\x" +
>> std::string(HexStr.size() - 2, '0') + HexStr;
>> On 2012/02/16 22:32:42, Bigcheesegs wrote:
>>> On 2012/02/16 15:52:35, klimek wrote:
>>> > Why's the middle part there?
>>>
>>> \x requires two characters after it. This adds the 0 if needed.
>>
>> Ok, shouldn't that be 2 - HexStr.size() then?
>
> Oh, you're right. I fixed that later in the code, but not up here. Fixed.
>
>>
>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
>> lib/Support/YAMLParser.cpp:795: if (   u8d.second != 0
>> On 2012/02/16 22:32:42, Bigcheesegs wrote:
>>> On 2012/02/16 15:52:35, klimek wrote:
>>> > Somehow I cannot match this up with 5.1.
>>> > Especially:
>>> > """On input, a YAML processor must accept all Unicode characters except
>> those
>>> > explicitly excluded above."""
>>> >
>>> > And, why do we care about non-printable chars at all?
>>>
>>> This excludes the characters listed in 5.1 in addition to the other stuff
>>> removed from nb-char.
>>>
>>> This is needed to skip entire UTF-8 minimal well formed subsequences at a time
>>> to keep proper column counts.
>>
>> If I'm not completely misreading, this excludes much more:
>> """To ensure readability, YAML streams use only the printable subset of the
>> Unicode character set. The allowed character range explicitly excludes the C0
>> control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are
>> allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is
>> allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."""
>>
>> The surrogate block cannot actually come out of decodeUTF8.
>
> All of the ranges excluded there are excluded by skip_nb_char. Yes,
> the surrogate block check is redundant. It documents the exact ranges
> that will be accepted.
>
>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#ne...
>> lib/Support/YAMLParser.cpp:1627: switch (UnquotedValue[0]) {
>> On 2012/02/16 22:32:42, Bigcheesegs wrote:
>>> On 2012/02/16 15:52:35, klimek wrote:
>>> > Do we want to support all possible escapes, or can we get away with the
>> subset
>>> > we really need?
>>>
>>> It's only a few extra lines with no extra complexity...
>>
>> It is code that needs to be maintained ... shrug.
>>
>> On Thu, Feb 16, 2012 at 2:38 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
>>> http://codereview.appspot.com/5604054/diff/6001/include/llvm/Support/YAMLParser.h
>>> File include/llvm/Support/YAMLParser.h (right):
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/include/llvm/Support/YAMLParser.h#newcode71
>>> include/llvm/Support/YAMLParser.h:71: std::string escape(StringRef
>>> Input);
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> This is not called anywhere yet(?) Why do we need it? Are we planning
>>>
>>> to write
>>>>
>>>> yaml or do we only need to parse it?
>>>
>>>
>>> It's used when writing out YAML files.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp
>>> File lib/Support/YAMLParser.cpp (right):
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode667
>>> lib/Support/YAMLParser.cpp:667: std::string EscapedInput;
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Curious: is there a reason not to use a Twine here (which would be my
>>>
>>> default
>>>>
>>>> choice for string concatenation).
>>>
>>>
>>> Twine cannot be used here as it is only valid until the end of the
>>> statement. See Twine.h.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode693
>>> lib/Support/YAMLParser.cpp:693: EscapedInput += "\\x" +
>>> std::string(HexStr.size() - 2, '0') + HexStr;
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Why's the middle part there?
>>>
>>>
>>> \x requires two characters after it. This adds the 0 if needed.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode694
>>> lib/Support/YAMLParser.cpp:694: } else if (*i & 0x80) { // utf8
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> // utf8
>>>> is confusing to me here:
>>>> I thought we always assume utf8, and 0x80 merely means that we have a
>>>
>>> multi-byte
>>>>
>>>> utf8 sequence...
>>>
>>>
>>>> So I'd expect a comment like:
>>>> // utf8 multi-byte sequence
>>>
>>>
>>> Done.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode788
>>> lib/Support/YAMLParser.cpp:788: if (   *Position == 0x09
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Indentation off?
>>>
>>>
>>> It lines up with the other side of the || below.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode795
>>> lib/Support/YAMLParser.cpp:795: if (   u8d.second != 0
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Somehow I cannot match this up with 5.1.
>>>> Especially:
>>>> """On input, a YAML processor must accept all Unicode characters
>>>
>>> except those
>>>>
>>>> explicitly excluded above."""
>>>
>>>
>>>> And, why do we care about non-printable chars at all?
>>>
>>>
>>> This excludes the characters listed in 5.1 in addition to the other
>>> stuff removed from nb-char.
>>>
>>> This is needed to skip entire UTF-8 minimal well formed subsequences at
>>> a time to keep proper column counts.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode839
>>> lib/Support/YAMLParser.cpp:839: if (  (C >= '0' && C <= '9')
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Just return.
>>>
>>>
>>> Done.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode847
>>> lib/Support/YAMLParser.cpp:847: if (  C == '-'
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Just return.
>>>
>>>
>>> Done.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode860
>>> lib/Support/YAMLParser.cpp:860: && Current + 2 < End
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> So we're not inc'ing Current by 3 when encountering %NN because NN
>>>
>>> will be
>>>>
>>>> parsable anyway?
>>>
>>>
>>> Yes, however, there's no reason it can't skip all 3.
>>>
>>>
>>> http://codereview.appspot.com/5604054/diff/6001/lib/Support/YAMLParser.cpp#newcode1627
>>> lib/Support/YAMLParser.cpp:1627: switch (UnquotedValue[0]) {
>>> On 2012/02/16 15:52:35, klimek wrote:
>>>>
>>>> Do we want to support all possible escapes, or can we get away with
>>>
>>> the subset
>>>>
>>>> we really need?
>>>
>>>
>>> It's only a few extra lines with no extra complexity...
>>>
>>> http://codereview.appspot.com/5604054/
>>>
>>> - Michael Spencer
>
> Attached is a new patch that includes your changes, and the updated
> decodeUTF8 you posted on IRC.
>
> - Michael Spencer




More information about the llvm-commits mailing list