[PATCH] YAML: Implement block scalar parsing

Wed May 6 14:57:24 PDT 2015

2015-05-06 13:27 GMT-07:00 Duncan P. N. Exon Smith <dexonsmith at apple.com>:

>
> > On 2015 May 5, at 15:57, Alex Lorenz <arphaman at gmail.com> wrote:
> >
> > Hi bogner, bob.wilson, dexonsmith,
> >
> > This patch implements the parsing of YAML block scalars. Some code
> existed there before, but it couldn't parse block scalars.
> > This patch is based on the patch 'http://reviews.llvm.org/D9448' that
> enables the YAMLParser tests.
> > I've added a new yaml node type to represent the block scalar values.
> > I've deleted the files 'spec-09-27.test' and 'spec-09-28.test' as they
> are identical to the file 'spec-09-26.test'.
> >
> > REPOSITORY
> >  rL LLVM
> >
> > http://reviews.llvm.org/D9503
> >
> > Files:
> >  include/llvm/Support/YAMLParser.h
> >  lib/Support/YAMLParser.cpp
> >  test/YAMLParser/spec-09-14.test
> >  test/YAMLParser/spec-09-18.test
> >  test/YAMLParser/spec-09-19.test
> >  test/YAMLParser/spec-09-20.test
> >  test/YAMLParser/spec-09-21.test
> >  test/YAMLParser/spec-09-22.test
> >  test/YAMLParser/spec-09-24.test
> >  test/YAMLParser/spec-09-25.test
> >  test/YAMLParser/spec-09-26.test
> >  test/YAMLParser/spec-09-27.test
> >  test/YAMLParser/spec-09-28.test
> >  test/YAMLParser/spec-09-29.test
> >  test/YAMLParser/spec-09-30.test
> >  test/YAMLParser/spec-09-31.test
> >  test/YAMLParser/spec-09-32.test
> >  test/YAMLParser/spec-09-33.test
> >  unittests/Support/YAMLParserTest.cpp
> >  utils/yaml-bench/YAMLBench.cpp
> >
> > EMAIL PREFERENCES
> >  http://reviews.llvm.org/settings/panel/emailpreferences/
> > <D9503.24963.patch>
> > Index: include/llvm/Support/YAMLParser.h
> > ===================================================================
> > --- include/llvm/Support/YAMLParser.h
> > +++ include/llvm/Support/YAMLParser.h
> > @@ -107,6 +107,7 @@
> >    enum NodeKind {
> >      NK_Null,
> >      NK_Scalar,
> > +    NK_BlockScalar,
> >      NK_KeyValue,
> >      NK_Mapping,
> >      NK_Sequence,
> > @@ -222,6 +223,36 @@
> >                                   SmallVectorImpl<char> &Storage) const;
> >  };
> >
> > +/// \brief A block scalar node is an opaque datum that can be presented
> as a
> > +///        series of zero or more Unicode scalar values.
> > +///
> > +/// Example:
> > +///   |
> > +///     Hello
> > +///     World
> > +class BlockScalarNode : public Node {
> > +  void anchor() override;
> > +
> > +public:
> > +  BlockScalarNode(std::unique_ptr<Document> &D, StringRef Anchor,
> StringRef Tag,
> > +                  std::string &Value, StringRef RawVal)
> > +      : Node(NK_BlockScalar, D, Anchor, Tag), Value(std::move(Value)) {
> > +    SMLoc Start = SMLoc::getFromPointer(RawVal.begin());
> > +    SMLoc End = SMLoc::getFromPointer(RawVal.end());
> > +    SourceRange = SMRange(Start, End);
> > +  }
> > +
> > +  /// \brief Gets the value of this node as a StringRef.
> > +  StringRef getValue() const { return Value; }
> > +
> > +  static inline bool classof(const Node *N) {
> > +    return N->getType() == NK_BlockScalar;
> > +  }
> > +
> > +private:
> > +  std::string Value;
> > +};
> > +
> >  /// \brief A key and value pair. While not technically a Node under the
> YAML
> >  ///        representation graph, it is easier to treat them this way.
> >  ///
> > @@ -252,8 +283,10 @@
> >    Node *getValue();
> >
> >    void skip() override {
> > -    getKey()->skip();
> > -    getValue()->skip();
> > +    if (Node *Key = getKey())
> > +      Key->skip();
> > +    if (Node *Val = getValue())
> > +      Val->skip();
>
> I'm probably missing something obvious, but why do you need to
> conditionalize these?  Do they both need to be conditional?  (If not,
> can you change the other to an `assert()`?)
>
> If it's possible to commit this separately along with a testcase (or
> unit test) that exposes the bug, please do.  Otherwise please indicate
> in this commit message which test required the change.
>

This came up in one of my block scalar parsing unit tests. In particular,
this test case failed:

  ExpectParseError("Invalid block scalar header", "test: | failure");

Because the block scalar failed and returned false, the key value node for
the key 'test' didn't
actually have a value, so it crashed when the method skip was called during
unit testing.

I think that you're right, it would be better to create a separate patch
for this issue. I will
create a new patch with a test case that deals with just this problem so
that this patch
won't have to fix this bug.

>
> >    }
> >
> >    static inline bool classof(const Node *N) {
> > Index: lib/Support/YAMLParser.cpp
> > ===================================================================
> > --- lib/Support/YAMLParser.cpp
> > +++ lib/Support/YAMLParser.cpp
> > @@ -101,6 +101,7 @@
> >  void Node::anchor() {}
> >  void NullNode::anchor() {}
> >  void ScalarNode::anchor() {}
> > +void BlockScalarNode::anchor() {}
> >  void KeyValueNode::anchor() {}
> >  void MappingNode::anchor() {}
> >  void SequenceNode::anchor() {}
> > @@ -128,6 +129,7 @@
> >      TK_Key,
> >      TK_Value,
> >      TK_Scalar,
> > +    TK_BlockScalar,
> >      TK_Alias,
> >      TK_Anchor,
> >      TK_Tag
> > @@ -137,6 +139,9 @@
> >    /// of the token in the input.
> >    StringRef Range;
> >
> > +  /// The value of a block scalar node.
> > +  std::string Value;
> > +
> >    Token() : Kind(TK_Error) {}
> >  };
> >  }
> > @@ -348,6 +353,14 @@
> >    ///          b-break.
> >    StringRef::iterator skip_b_break(StringRef::iterator Position);
> >
> > +  /// @brief Skip a single s-space[31] starting at Position.
> > +  ///
> > +  /// An s-space is 0x20
> > +  ///
> > +  /// @returns The code unit after the s-space, or Position if it's not
> a
> > +  ///          s-space.
> > +  StringRef::iterator skip_s_space(StringRef::iterator Position);
> > +
> >    /// @brief Skip a single s-white[33] starting at Position.
> >    ///
> >    /// A s-white is 0x20 | 0x9
> > @@ -417,6 +430,9 @@
> >                   , Token::TokenKind Kind
> >                   , TokenQueueT::iterator InsertPoint);
> >
> > +  /// @brief Skip a single-line comment if it's present on the current
> line.
> > +  void skipComment();
> > +
> >    /// @brief Skip whitespace and comments until the start of the next
> token.
> >    void scanToNextToken();
> >
> > @@ -608,6 +624,9 @@
> >      case Token::TK_Scalar:
> >        OS << "Scalar: ";
> >        break;
> > +    case Token::TK_BlockScalar:
> > +      OS << "Block Scalar: ";
> > +      break;
> >      case Token::TK_Alias:
> >        OS << "Alias: ";
> >        break;
> > @@ -812,6 +831,13 @@
> >    return Position;
> >  }
> >
> > +StringRef::iterator Scanner::skip_s_space(StringRef::iterator Position)
> {
> > +  if (Position == End)
> > +    return Position;
> > +  if (*Position == ' ')
> > +    return Position + 1;
> > +  return Position;
> > +}
> >
> >  StringRef::iterator Scanner::skip_s_white(StringRef::iterator Position)
> {
> >    if (Position == End)
> > @@ -967,24 +993,27 @@
> >    return true;
> >  }
> >
> > +void Scanner::skipComment() {
> > +  if (*Current != '#')
> > +    return;
> > +  while (true) {
> > +    // This may skip more than one byte, thus Column is only incremented
> > +    // for code points.
> > +    StringRef::iterator i = skip_nb_char(Current);
>
> LLVM style would spell this with an uppercase, as in `I`.  I know you
> copy/pasted, but you can fix it in transit ;).
>
> > +    if (i == Current)
> > +      break;
> > +    Current = i;
> > +    ++Column;
> > +  }
> > +}
>
> Looks like you pulled this directly from `scanToNextToken()`.  Please do
> that in a prep-commit.
>
> Also see my note below about whether you pulled the right bits.
>

Yes, I extracted the method 'skipComment' from 'scanToNextToken' as I
wanted to reuse it in 'scanBlockScalar'.
You're right, It would be better to make it a separate NFC commit. I will
update the patch and commit the NFC extraction
of the 'skipComment' method with updates to 'i' to conform to the LLVM
style.

>
> > +
> >  void Scanner::scanToNextToken() {
> >    while (true) {
> >      while (*Current == ' ' || *Current == '\t') {
> >        skip(1);
> >      }
> >
> > -    // Skip comment.
> > -    if (*Current == '#') {
> > -      while (true) {
> > -        // This may skip more than one byte, thus Column is only
> incremented
> > -        // for code points.
> > -        StringRef::iterator i = skip_nb_char(Current);
> > -        if (i == Current)
> > -          break;
> > -        Current = i;
> > -        ++Column;
> > -      }
> > -    }
> > +    skipComment();
> >
> >      // Skip EOL.
> >      StringRef::iterator i = skip_b_break(Current);
> > @@ -1368,37 +1397,196 @@
> >  }
> >
> >  bool Scanner::scanBlockScalar(bool IsLiteral) {
> > -  StringRef::iterator Start = Current;
> > -  skip(1); // Eat | or >
> > -  while(true) {
> > -    StringRef::iterator i = skip_nb_char(Current);
> > -    if (i == Current) {
> > -      if (Column == 0)
> > -        break;
> > -      i = skip_b_break(Current);
> > -      if (i != Current) {
> > -        // We got a line break.
> > -        Column = 0;
> > -        ++Line;
> > -        Current = i;
> > -        continue;
> > -      } else {
> > -        // There was an error, which should already have been printed
> out.
> > -        return false;
> > -      }
> > -    }
> > -    Current = i;
> > -    ++Column;
> > +  std::string Str;
> > +  raw_string_ostream OS(Str);
> > +  auto BlockStart = Current;
> > +  // If this is true, then the scanner has to look for a first
> non-blank line
> > +  // to determine the indenation of the block scalar.
> > +  bool LookingForIndent = true;
> > +  unsigned BlockIndent = 0;
> > +  unsigned BlockExitIndent = Indent < 0 ? 0 : (unsigned)Indent;
> > +  unsigned LineBreaks = 0;
> > +
> > +  unsigned MaxAllSpaceLineCharacters = 0;
> > +  StringRef::iterator LongestAllSpaceLine;
> > +
> > +  bool FoldNextLine = false;
> > +
> > +  skip(1); // Eat '|' or '>'
> > +
> > +  // Block Header ([162] c-b-block-header)
> > +  // Chomping indicator ([164] c-chomping-indicator)
> > +  char ChompingIndicator = ' ';
> > +  if (Current != End && (*Current == '+' || *Current == '-')) {
> > +    ChompingIndicator = *Current;
> > +    skip(1);
> > +  }
> > +  // Indentation indicator ([163] c-indentation-indicator)
> > +  if (Current != End && (*Current >= '1' && *Current <= '9')) {
> > +    BlockIndent = unsigned(*Current - '0');
> > +    LookingForIndent = false;
> > +    skip(1);
> >    }
> > +  // Check for chomping indicator once again.
> > +  if (Current != End && (*Current == '+' || *Current == '-')) {
> > +    ChompingIndicator = *Current;
> > +    skip(1);
> > +  }
> > +  Current = skip_while(&Scanner::skip_s_white, Current);
> > +  skipComment();
>
> These two lines are both part of `scanToNextToken()`.  I wonder if you
> picked the right thing to extract?
>

I'm not sure what you mean by this. 'skipComment' is used here as well
as in 'scanToNextToken', but the rest is different.

>
> >
> > -  if (Start == Current) {
> > -    setError("Got empty block scalar", Start);
> > +  if (Current == End) {
> > +    // End of file after block header, we have an empty scalar.
> > +    Token T;
> > +    T.Kind = Token::TK_BlockScalar;
> > +    T.Range = StringRef(BlockStart, Current - BlockStart);
> > +    TokenQueue.push_back(T);
> > +    return true;
> > +  }
> > +  auto I = skip_b_break(Current);
> > +  if (I == Current) {
> > +    setError("Expected a line break after block scalar header",
> Current);
> >      return false;
> >    }
>
> ...
>
> A few general questions about the rest (which I haven't taken the time
> to read carefully yet):
>
>   - Is it possible to separate out the restructuring you've done into a
>     series of NFC ("no functionality change") commits?  That would make
>     the functional changes easier to review.
>

Yes, I'll commit the NFC change I outlined above and will take out
the change to the 'skip' method in KeyValueNode as well;.

  - How many new features/bugfixes are you rolling into this?  Can any
>     of them be split apart into separate commits?  If not, can you
>     enumerate them on the list (and, of course, in the commit message)?
>

This should be a one feature commit really, as it implements parsing
of block scalars.

>   - You haven't added any new tests (although I see you've modified the
>     old ones).  Do the current tests really cover everything you've
>     changed here?  If not, please add some targeted ones.

I did add a new test in the yaml parser unittest. I also did change the
existing YAMLParser tests,
but since they were never actually ran before I think that it's kind of
equivalent to the
addition of new tests.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150506/6ee2f354/attachment.html>