<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2015-05-06 13:27 GMT-07:00 Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"><br>

> On 2015 May 5, at 15:57, Alex Lorenz <<a href="mailto:arphaman@gmail.com">arphaman@gmail.com</a>> wrote:<br>

><br>

> Hi bogner, bob.wilson, dexonsmith,<br>

><br>

> This patch implements the parsing of YAML block scalars. Some code existed there before, but it couldn't parse block scalars.<br>

> This patch is based on the patch '<a href="http://reviews.llvm.org/D9448" target="_blank">http://reviews.llvm.org/D9448</a>' that enables the YAMLParser tests.<br>

> I've added a new yaml node type to represent the block scalar values.<br>

> I've deleted the files 'spec-09-27.test' and 'spec-09-28.test' as they are identical to the file 'spec-09-26.test'.<br>

><br>

> REPOSITORY<br>

>  rL LLVM<br>

><br>

> <a href="http://reviews.llvm.org/D9503" target="_blank">http://reviews.llvm.org/D9503</a><br>

><br>

> Files:<br>

>  include/llvm/Support/YAMLParser.h<br>

>  lib/Support/YAMLParser.cpp<br>

>  test/YAMLParser/spec-09-14.test<br>

>  test/YAMLParser/spec-09-18.test<br>

>  test/YAMLParser/spec-09-19.test<br>

>  test/YAMLParser/spec-09-20.test<br>

>  test/YAMLParser/spec-09-21.test<br>

>  test/YAMLParser/spec-09-22.test<br>

>  test/YAMLParser/spec-09-24.test<br>

>  test/YAMLParser/spec-09-25.test<br>

>  test/YAMLParser/spec-09-26.test<br>

>  test/YAMLParser/spec-09-27.test<br>

>  test/YAMLParser/spec-09-28.test<br>

>  test/YAMLParser/spec-09-29.test<br>

>  test/YAMLParser/spec-09-30.test<br>

>  test/YAMLParser/spec-09-31.test<br>

>  test/YAMLParser/spec-09-32.test<br>

>  test/YAMLParser/spec-09-33.test<br>

>  unittests/Support/YAMLParserTest.cpp<br>

>  utils/yaml-bench/YAMLBench.cpp<br>

><br>

> EMAIL PREFERENCES<br>

>  <a href="http://reviews.llvm.org/settings/panel/emailpreferences/" target="_blank">http://reviews.llvm.org/settings/panel/emailpreferences/</a><br>

</div></div>> <D9503.24963.patch><br>

> Index: include/llvm/Support/YAMLParser.h<br>

> ===================================================================<br>

> --- include/llvm/Support/YAMLParser.h<br>

> +++ include/llvm/Support/YAMLParser.h<br>

> @@ -107,6 +107,7 @@<br>

>    enum NodeKind {<br>

>      NK_Null,<br>

>      NK_Scalar,<br>

> +    NK_BlockScalar,<br>

>      NK_KeyValue,<br>

>      NK_Mapping,<br>

>      NK_Sequence,<br>

> @@ -222,6 +223,36 @@<br>

>                                   SmallVectorImpl<char> &Storage) const;<br>

>  };<br>

><br>

> +/// \brief A block scalar node is an opaque datum that can be presented as a<br>

> +///        series of zero or more Unicode scalar values.<br>

> +///<br>

> +/// Example:<br>

> +///   |<br>

> +///     Hello<br>

> +///     World<br>

> +class BlockScalarNode : public Node {<br>

> +  void anchor() override;<br>

> +<br>

> +public:<br>

> +  BlockScalarNode(std::unique_ptr<Document> &D, StringRef Anchor, StringRef Tag,<br>

> +                  std::string &Value, StringRef RawVal)<br>

> +      : Node(NK_BlockScalar, D, Anchor, Tag), Value(std::move(Value)) {<br>

> +    SMLoc Start = SMLoc::getFromPointer(RawVal.begin());<br>

> +    SMLoc End = SMLoc::getFromPointer(RawVal.end());<br>

> +    SourceRange = SMRange(Start, End);<br>

> +  }<br>

> +<br>

> +  /// \brief Gets the value of this node as a StringRef.<br>

> +  StringRef getValue() const { return Value; }<br>

> +<br>

> +  static inline bool classof(const Node *N) {<br>

> +    return N->getType() == NK_BlockScalar;<br>

> +  }<br>

> +<br>

> +private:<br>

> +  std::string Value;<br>

> +};<br>

> +<br>

>  /// \brief A key and value pair. While not technically a Node under the YAML<br>

>  ///        representation graph, it is easier to treat them this way.<br>

>  ///<br>

> @@ -252,8 +283,10 @@<br>

>    Node *getValue();<br>

><br>

>    void skip() override {<br>

> -    getKey()->skip();<br>

> -    getValue()->skip();<br>

> +    if (Node *Key = getKey())<br>

> +      Key->skip();<br>

> +    if (Node *Val = getValue())<br>

> +      Val->skip();<br>

<br>

I'm probably missing something obvious, but why do you need to<br>

conditionalize these?  Do they both need to be conditional?  (If not,<br>

can you change the other to an `assert()`?)<br>

<br>

If it's possible to commit this separately along with a testcase (or<br>

unit test) that exposes the bug, please do.  Otherwise please indicate<br>

in this commit message which test required the change.<br></blockquote><div><br></div><div>This came up in one of my block scalar parsing unit tests. In particular, this test case failed:</div><div><br></div><div>  ExpectParseError("Invalid block scalar header", "test: | failure");</div><div><br></div><div>Because the block scalar failed and returned false, the key value node for the key 'test' didn't</div><div>actually have a value, so it crashed when the method skip was called during unit testing.</div><div><br></div><div>I think that you're right, it would be better to create a separate patch for this issue. I will</div><div>create a new patch with a test case that deals with just this problem so that this patch</div><div>won't have to fix this bug.</div>

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

>    }<br>

><br>

>    static inline bool classof(const Node *N) {<br>

> Index: lib/Support/YAMLParser.cpp<br>

> ===================================================================<br>

> --- lib/Support/YAMLParser.cpp<br>

> +++ lib/Support/YAMLParser.cpp<br>

> @@ -101,6 +101,7 @@<br>

>  void Node::anchor() {}<br>

>  void NullNode::anchor() {}<br>

>  void ScalarNode::anchor() {}<br>

> +void BlockScalarNode::anchor() {}<br>

>  void KeyValueNode::anchor() {}<br>

>  void MappingNode::anchor() {}<br>

>  void SequenceNode::anchor() {}<br>

> @@ -128,6 +129,7 @@<br>

>      TK_Key,<br>

>      TK_Value,<br>

>      TK_Scalar,<br>

> +    TK_BlockScalar,<br>

>      TK_Alias,<br>

>      TK_Anchor,<br>

>      TK_Tag<br>

> @@ -137,6 +139,9 @@<br>

>    /// of the token in the input.<br>

>    StringRef Range;<br>

><br>

> +  /// The value of a block scalar node.<br>

> +  std::string Value;<br>

> +<br>

>    Token() : Kind(TK_Error) {}<br>

>  };<br>

>  }<br>

> @@ -348,6 +353,14 @@<br>

>    ///          b-break.<br>

>    StringRef::iterator skip_b_break(StringRef::iterator Position);<br>

><br>

> +  /// @brief Skip a single s-space[31] starting at Position.<br>

> +  ///<br>

> +  /// An s-space is 0x20<br>

> +  ///<br>

> +  /// @returns The code unit after the s-space, or Position if it's not a<br>

> +  ///          s-space.<br>

> +  StringRef::iterator skip_s_space(StringRef::iterator Position);<br>

> +<br>

>    /// @brief Skip a single s-white[33] starting at Position.<br>

>    ///<br>

>    /// A s-white is 0x20 | 0x9<br>

> @@ -417,6 +430,9 @@<br>

>                   , Token::TokenKind Kind<br>

>                   , TokenQueueT::iterator InsertPoint);<br>

><br>

> +  /// @brief Skip a single-line comment if it's present on the current line.<br>

> +  void skipComment();<br>

> +<br>

>    /// @brief Skip whitespace and comments until the start of the next token.<br>

>    void scanToNextToken();<br>

><br>

> @@ -608,6 +624,9 @@<br>

>      case Token::TK_Scalar:<br>

>        OS << "Scalar: ";<br>

>        break;<br>

> +    case Token::TK_BlockScalar:<br>

> +      OS << "Block Scalar: ";<br>

> +      break;<br>

>      case Token::TK_Alias:<br>

>        OS << "Alias: ";<br>

>        break;<br>

> @@ -812,6 +831,13 @@<br>

>    return Position;<br>

>  }<br>

><br>

> +StringRef::iterator Scanner::skip_s_space(StringRef::iterator Position) {<br>

> +  if (Position == End)<br>

> +    return Position;<br>

> +  if (*Position == ' ')<br>

> +    return Position + 1;<br>

> +  return Position;<br>

> +}<br>

><br>

>  StringRef::iterator Scanner::skip_s_white(StringRef::iterator Position) {<br>

>    if (Position == End)<br>

> @@ -967,24 +993,27 @@<br>

>    return true;<br>

>  }<br>

><br>

> +void Scanner::skipComment() {<br>

> +  if (*Current != '#')<br>

> +    return;<br>

> +  while (true) {<br>

> +    // This may skip more than one byte, thus Column is only incremented<br>

> +    // for code points.<br>

> +    StringRef::iterator i = skip_nb_char(Current);<br>

<br>

LLVM style would spell this with an uppercase, as in `I`.  I know you<br>

copy/pasted, but you can fix it in transit ;).<br>

<br>

> +    if (i == Current)<br>

> +      break;<br>

> +    Current = i;<br>

> +    ++Column;<br>

> +  }<br>

> +}<br>

<br>

Looks like you pulled this directly from `scanToNextToken()`.  Please do<br>

that in a prep-commit.<br>

<br>

Also see my note below about whether you pulled the right bits.<br></blockquote><div><br></div><div><br></div><div>Yes, I extracted the method 'skipComment' from 'scanToNextToken' as I wanted to reuse it in 'scanBlockScalar'.</div><div>You're right, It would be better to make it a separate NFC commit. I will update the patch and commit the NFC extraction</div><div>of the 'skipComment' method with updates to 'i' to conform to the LLVM style.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

> +<br>

>  void Scanner::scanToNextToken() {<br>

>    while (true) {<br>

>      while (*Current == ' ' || *Current == '\t') {<br>

>        skip(1);<br>

>      }<br>

><br>

> -    // Skip comment.<br>

> -    if (*Current == '#') {<br>

> -      while (true) {<br>

> -        // This may skip more than one byte, thus Column is only incremented<br>

> -        // for code points.<br>

> -        StringRef::iterator i = skip_nb_char(Current);<br>

> -        if (i == Current)<br>

> -          break;<br>

> -        Current = i;<br>

> -        ++Column;<br>

> -      }<br>

> -    }<br>

> +    skipComment();<br>

><br>

>      // Skip EOL.<br>

>      StringRef::iterator i = skip_b_break(Current);<br>

> @@ -1368,37 +1397,196 @@<br>

>  }<br>

><br>

>  bool Scanner::scanBlockScalar(bool IsLiteral) {<br>

> -  StringRef::iterator Start = Current;<br>

> -  skip(1); // Eat | or ><br>

> -  while(true) {<br>

> -    StringRef::iterator i = skip_nb_char(Current);<br>

> -    if (i == Current) {<br>

> -      if (Column == 0)<br>

> -        break;<br>

> -      i = skip_b_break(Current);<br>

> -      if (i != Current) {<br>

> -        // We got a line break.<br>

> -        Column = 0;<br>

> -        ++Line;<br>

> -        Current = i;<br>

> -        continue;<br>

> -      } else {<br>

> -        // There was an error, which should already have been printed out.<br>

> -        return false;<br>

> -      }<br>

> -    }<br>

> -    Current = i;<br>

> -    ++Column;<br>

> +  std::string Str;<br>

> +  raw_string_ostream OS(Str);<br>

> +  auto BlockStart = Current;<br>

> +  // If this is true, then the scanner has to look for a first non-blank line<br>

> +  // to determine the indenation of the block scalar.<br>

> +  bool LookingForIndent = true;<br>

> +  unsigned BlockIndent = 0;<br>

> +  unsigned BlockExitIndent = Indent < 0 ? 0 : (unsigned)Indent;<br>

> +  unsigned LineBreaks = 0;<br>

> +<br>

> +  unsigned MaxAllSpaceLineCharacters = 0;<br>

> +  StringRef::iterator LongestAllSpaceLine;<br>

> +<br>

> +  bool FoldNextLine = false;<br>

> +<br>

> +  skip(1); // Eat '|' or '>'<br>

> +<br>

> +  // Block Header ([162] c-b-block-header)<br>

> +  // Chomping indicator ([164] c-chomping-indicator)<br>

> +  char ChompingIndicator = ' ';<br>

> +  if (Current != End && (*Current == '+' || *Current == '-')) {<br>

> +    ChompingIndicator = *Current;<br>

> +    skip(1);<br>

> +  }<br>

> +  // Indentation indicator ([163] c-indentation-indicator)<br>

> +  if (Current != End && (*Current >= '1' && *Current <= '9')) {<br>

> +    BlockIndent = unsigned(*Current - '0');<br>

> +    LookingForIndent = false;<br>

> +    skip(1);<br>

>    }<br>

> +  // Check for chomping indicator once again.<br>

> +  if (Current != End && (*Current == '+' || *Current == '-')) {<br>

> +    ChompingIndicator = *Current;<br>

> +    skip(1);<br>

> +  }<br>

> +  Current = skip_while(&Scanner::skip_s_white, Current);<br>

> +  skipComment();<br>

<br>

These two lines are both part of `scanToNextToken()`.  I wonder if you<br>

picked the right thing to extract?<br></blockquote><div><br></div><div>I'm not sure what you mean by this. 'skipComment' is used here as well</div><div>as in 'scanToNextToken', but the rest is different.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

><br>

> -  if (Start == Current) {<br>

> -    setError("Got empty block scalar", Start);<br>

> +  if (Current == End) {<br>

> +    // End of file after block header, we have an empty scalar.<br>

> +    Token T;<br>

> +    T.Kind = Token::TK_BlockScalar;<br>

> +    T.Range = StringRef(BlockStart, Current - BlockStart);<br>

> +    TokenQueue.push_back(T);<br>

> +    return true;<br>

> +  }<br>

> +  auto I = skip_b_break(Current);<br>

> +  if (I == Current) {<br>

> +    setError("Expected a line break after block scalar header", Current);<br>

>      return false;<br>

>    }<br>

<br>

...<br>

<br>

A few general questions about the rest (which I haven't taken the time<br>

to read carefully yet):<br>

<br>

  - Is it possible to separate out the restructuring you've done into a<br>

    series of NFC ("no functionality change") commits?  That would make<br>

    the functional changes easier to review.<br></blockquote><div><br></div><div>Yes, I'll commit the NFC change I outlined above and will take out</div><div>the change to the 'skip' method in KeyValueNode as well;.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  - How many new features/bugfixes are you rolling into this?  Can any<br>

    of them be split apart into separate commits?  If not, can you<br>

    enumerate them on the list (and, of course, in the commit message)?<br></blockquote><div><br></div><div>This should be a one feature commit really, as it implements parsing</div><div>of block scalars.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  - You haven't added any new tests (although I see you've modified the<br>

    old ones).  Do the current tests really cover everything you've<br>

    changed here?  If not, please add some targeted ones. </blockquote><div><br></div><div>I did add a new test in the yaml parser unittest. I also did change the existing YAMLParser tests,</div><div>but since they were never actually ran before I think that it's kind of equivalent to the</div><div>addition of new tests.</div><div><br></div></div><br></div></div>