<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2015-05-06 13:27 GMT-07:00 Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"><br>
> On 2015 May 5, at 15:57, Alex Lorenz <<a href="mailto:arphaman@gmail.com">arphaman@gmail.com</a>> wrote:<br>
><br>
> Hi bogner, bob.wilson, dexonsmith,<br>
><br>
> This patch implements the parsing of YAML block scalars. Some code existed there before, but it couldn't parse block scalars.<br>
> This patch is based on the patch '<a href="http://reviews.llvm.org/D9448" target="_blank">http://reviews.llvm.org/D9448</a>' that enables the YAMLParser tests.<br>
> I've added a new yaml node type to represent the block scalar values.<br>
> I've deleted the files 'spec-09-27.test' and 'spec-09-28.test' as they are identical to the file 'spec-09-26.test'.<br>
><br>
> REPOSITORY<br>
> rL LLVM<br>
><br>
> <a href="http://reviews.llvm.org/D9503" target="_blank">http://reviews.llvm.org/D9503</a><br>
><br>
> Files:<br>
> include/llvm/Support/YAMLParser.h<br>
> lib/Support/YAMLParser.cpp<br>
> test/YAMLParser/spec-09-14.test<br>
> test/YAMLParser/spec-09-18.test<br>
> test/YAMLParser/spec-09-19.test<br>
> test/YAMLParser/spec-09-20.test<br>
> test/YAMLParser/spec-09-21.test<br>
> test/YAMLParser/spec-09-22.test<br>
> test/YAMLParser/spec-09-24.test<br>
> test/YAMLParser/spec-09-25.test<br>
> test/YAMLParser/spec-09-26.test<br>
> test/YAMLParser/spec-09-27.test<br>
> test/YAMLParser/spec-09-28.test<br>
> test/YAMLParser/spec-09-29.test<br>
> test/YAMLParser/spec-09-30.test<br>
> test/YAMLParser/spec-09-31.test<br>
> test/YAMLParser/spec-09-32.test<br>
> test/YAMLParser/spec-09-33.test<br>
> unittests/Support/YAMLParserTest.cpp<br>
> utils/yaml-bench/YAMLBench.cpp<br>
><br>
> EMAIL PREFERENCES<br>
> <a href="http://reviews.llvm.org/settings/panel/emailpreferences/" target="_blank">http://reviews.llvm.org/settings/panel/emailpreferences/</a><br>
</div></div>> <D9503.24963.patch><br>
> Index: include/llvm/Support/YAMLParser.h<br>
> ===================================================================<br>
> --- include/llvm/Support/YAMLParser.h<br>
> +++ include/llvm/Support/YAMLParser.h<br>
> @@ -107,6 +107,7 @@<br>
> enum NodeKind {<br>
> NK_Null,<br>
> NK_Scalar,<br>
> + NK_BlockScalar,<br>
> NK_KeyValue,<br>
> NK_Mapping,<br>
> NK_Sequence,<br>
> @@ -222,6 +223,36 @@<br>
> SmallVectorImpl<char> &Storage) const;<br>
> };<br>
><br>
> +/// \brief A block scalar node is an opaque datum that can be presented as a<br>
> +/// series of zero or more Unicode scalar values.<br>
> +///<br>
> +/// Example:<br>
> +/// |<br>
> +/// Hello<br>
> +/// World<br>
> +class BlockScalarNode : public Node {<br>
> + void anchor() override;<br>
> +<br>
> +public:<br>
> + BlockScalarNode(std::unique_ptr<Document> &D, StringRef Anchor, StringRef Tag,<br>
> + std::string &Value, StringRef RawVal)<br>
> + : Node(NK_BlockScalar, D, Anchor, Tag), Value(std::move(Value)) {<br>
> + SMLoc Start = SMLoc::getFromPointer(RawVal.begin());<br>
> + SMLoc End = SMLoc::getFromPointer(RawVal.end());<br>
> + SourceRange = SMRange(Start, End);<br>
> + }<br>
> +<br>
> + /// \brief Gets the value of this node as a StringRef.<br>
> + StringRef getValue() const { return Value; }<br>
> +<br>
> + static inline bool classof(const Node *N) {<br>
> + return N->getType() == NK_BlockScalar;<br>
> + }<br>
> +<br>
> +private:<br>
> + std::string Value;<br>
> +};<br>
> +<br>
> /// \brief A key and value pair. While not technically a Node under the YAML<br>
> /// representation graph, it is easier to treat them this way.<br>
> ///<br>
> @@ -252,8 +283,10 @@<br>
> Node *getValue();<br>
><br>
> void skip() override {<br>
> - getKey()->skip();<br>
> - getValue()->skip();<br>
> + if (Node *Key = getKey())<br>
> + Key->skip();<br>
> + if (Node *Val = getValue())<br>
> + Val->skip();<br>
<br>
I'm probably missing something obvious, but why do you need to<br>
conditionalize these? Do they both need to be conditional? (If not,<br>
can you change the other to an `assert()`?)<br>
<br>
If it's possible to commit this separately along with a testcase (or<br>
unit test) that exposes the bug, please do. Otherwise please indicate<br>
in this commit message which test required the change.<br></blockquote><div><br></div><div>This came up in one of my block scalar parsing unit tests. In particular, this test case failed:</div><div><br></div><div> ExpectParseError("Invalid block scalar header", "test: | failure");</div><div><br></div><div>Because the block scalar failed and returned false, the key value node for the key 'test' didn't</div><div>actually have a value, so it crashed when the method skip was called during unit testing.</div><div><br></div><div>I think that you're right, it would be better to create a separate patch for this issue. I will</div><div>create a new patch with a test case that deals with just this problem so that this patch</div><div>won't have to fix this bug.</div>
<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
> }<br>
><br>
> static inline bool classof(const Node *N) {<br>
> Index: lib/Support/YAMLParser.cpp<br>
> ===================================================================<br>
> --- lib/Support/YAMLParser.cpp<br>
> +++ lib/Support/YAMLParser.cpp<br>
> @@ -101,6 +101,7 @@<br>
> void Node::anchor() {}<br>
> void NullNode::anchor() {}<br>
> void ScalarNode::anchor() {}<br>
> +void BlockScalarNode::anchor() {}<br>
> void KeyValueNode::anchor() {}<br>
> void MappingNode::anchor() {}<br>
> void SequenceNode::anchor() {}<br>
> @@ -128,6 +129,7 @@<br>
> TK_Key,<br>
> TK_Value,<br>
> TK_Scalar,<br>
> + TK_BlockScalar,<br>
> TK_Alias,<br>
> TK_Anchor,<br>
> TK_Tag<br>
> @@ -137,6 +139,9 @@<br>
> /// of the token in the input.<br>
> StringRef Range;<br>
><br>
> + /// The value of a block scalar node.<br>
> + std::string Value;<br>
> +<br>
> Token() : Kind(TK_Error) {}<br>
> };<br>
> }<br>
> @@ -348,6 +353,14 @@<br>
> /// b-break.<br>
> StringRef::iterator skip_b_break(StringRef::iterator Position);<br>
><br>
> + /// @brief Skip a single s-space[31] starting at Position.<br>
> + ///<br>
> + /// An s-space is 0x20<br>
> + ///<br>
> + /// @returns The code unit after the s-space, or Position if it's not a<br>
> + /// s-space.<br>
> + StringRef::iterator skip_s_space(StringRef::iterator Position);<br>
> +<br>
> /// @brief Skip a single s-white[33] starting at Position.<br>
> ///<br>
> /// A s-white is 0x20 | 0x9<br>
> @@ -417,6 +430,9 @@<br>
> , Token::TokenKind Kind<br>
> , TokenQueueT::iterator InsertPoint);<br>
><br>
> + /// @brief Skip a single-line comment if it's present on the current line.<br>
> + void skipComment();<br>
> +<br>
> /// @brief Skip whitespace and comments until the start of the next token.<br>
> void scanToNextToken();<br>
><br>
> @@ -608,6 +624,9 @@<br>
> case Token::TK_Scalar:<br>
> OS << "Scalar: ";<br>
> break;<br>
> + case Token::TK_BlockScalar:<br>
> + OS << "Block Scalar: ";<br>
> + break;<br>
> case Token::TK_Alias:<br>
> OS << "Alias: ";<br>
> break;<br>
> @@ -812,6 +831,13 @@<br>
> return Position;<br>
> }<br>
><br>
> +StringRef::iterator Scanner::skip_s_space(StringRef::iterator Position) {<br>
> + if (Position == End)<br>
> + return Position;<br>
> + if (*Position == ' ')<br>
> + return Position + 1;<br>
> + return Position;<br>
> +}<br>
><br>
> StringRef::iterator Scanner::skip_s_white(StringRef::iterator Position) {<br>
> if (Position == End)<br>
> @@ -967,24 +993,27 @@<br>
> return true;<br>
> }<br>
><br>
> +void Scanner::skipComment() {<br>
> + if (*Current != '#')<br>
> + return;<br>
> + while (true) {<br>
> + // This may skip more than one byte, thus Column is only incremented<br>
> + // for code points.<br>
> + StringRef::iterator i = skip_nb_char(Current);<br>
<br>
LLVM style would spell this with an uppercase, as in `I`. I know you<br>
copy/pasted, but you can fix it in transit ;).<br>
<br>
> + if (i == Current)<br>
> + break;<br>
> + Current = i;<br>
> + ++Column;<br>
> + }<br>
> +}<br>
<br>
Looks like you pulled this directly from `scanToNextToken()`. Please do<br>
that in a prep-commit.<br>
<br>
Also see my note below about whether you pulled the right bits.<br></blockquote><div><br></div><div><br></div><div>Yes, I extracted the method 'skipComment' from 'scanToNextToken' as I wanted to reuse it in 'scanBlockScalar'.</div><div>You're right, It would be better to make it a separate NFC commit. I will update the patch and commit the NFC extraction</div><div>of the 'skipComment' method with updates to 'i' to conform to the LLVM style.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
> +<br>
> void Scanner::scanToNextToken() {<br>
> while (true) {<br>
> while (*Current == ' ' || *Current == '\t') {<br>
> skip(1);<br>
> }<br>
><br>
> - // Skip comment.<br>
> - if (*Current == '#') {<br>
> - while (true) {<br>
> - // This may skip more than one byte, thus Column is only incremented<br>
> - // for code points.<br>
> - StringRef::iterator i = skip_nb_char(Current);<br>
> - if (i == Current)<br>
> - break;<br>
> - Current = i;<br>
> - ++Column;<br>
> - }<br>
> - }<br>
> + skipComment();<br>
><br>
> // Skip EOL.<br>
> StringRef::iterator i = skip_b_break(Current);<br>
> @@ -1368,37 +1397,196 @@<br>
> }<br>
><br>
> bool Scanner::scanBlockScalar(bool IsLiteral) {<br>
> - StringRef::iterator Start = Current;<br>
> - skip(1); // Eat | or ><br>
> - while(true) {<br>
> - StringRef::iterator i = skip_nb_char(Current);<br>
> - if (i == Current) {<br>
> - if (Column == 0)<br>
> - break;<br>
> - i = skip_b_break(Current);<br>
> - if (i != Current) {<br>
> - // We got a line break.<br>
> - Column = 0;<br>
> - ++Line;<br>
> - Current = i;<br>
> - continue;<br>
> - } else {<br>
> - // There was an error, which should already have been printed out.<br>
> - return false;<br>
> - }<br>
> - }<br>
> - Current = i;<br>
> - ++Column;<br>
> + std::string Str;<br>
> + raw_string_ostream OS(Str);<br>
> + auto BlockStart = Current;<br>
> + // If this is true, then the scanner has to look for a first non-blank line<br>
> + // to determine the indenation of the block scalar.<br>
> + bool LookingForIndent = true;<br>
> + unsigned BlockIndent = 0;<br>
> + unsigned BlockExitIndent = Indent < 0 ? 0 : (unsigned)Indent;<br>
> + unsigned LineBreaks = 0;<br>
> +<br>
> + unsigned MaxAllSpaceLineCharacters = 0;<br>
> + StringRef::iterator LongestAllSpaceLine;<br>
> +<br>
> + bool FoldNextLine = false;<br>
> +<br>
> + skip(1); // Eat '|' or '>'<br>
> +<br>
> + // Block Header ([162] c-b-block-header)<br>
> + // Chomping indicator ([164] c-chomping-indicator)<br>
> + char ChompingIndicator = ' ';<br>
> + if (Current != End && (*Current == '+' || *Current == '-')) {<br>
> + ChompingIndicator = *Current;<br>
> + skip(1);<br>
> + }<br>
> + // Indentation indicator ([163] c-indentation-indicator)<br>
> + if (Current != End && (*Current >= '1' && *Current <= '9')) {<br>
> + BlockIndent = unsigned(*Current - '0');<br>
> + LookingForIndent = false;<br>
> + skip(1);<br>
> }<br>
> + // Check for chomping indicator once again.<br>
> + if (Current != End && (*Current == '+' || *Current == '-')) {<br>
> + ChompingIndicator = *Current;<br>
> + skip(1);<br>
> + }<br>
> + Current = skip_while(&Scanner::skip_s_white, Current);<br>
> + skipComment();<br>
<br>
These two lines are both part of `scanToNextToken()`. I wonder if you<br>
picked the right thing to extract?<br></blockquote><div><br></div><div>I'm not sure what you mean by this. 'skipComment' is used here as well</div><div>as in 'scanToNextToken', but the rest is different.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
><br>
> - if (Start == Current) {<br>
> - setError("Got empty block scalar", Start);<br>
> + if (Current == End) {<br>
> + // End of file after block header, we have an empty scalar.<br>
> + Token T;<br>
> + T.Kind = Token::TK_BlockScalar;<br>
> + T.Range = StringRef(BlockStart, Current - BlockStart);<br>
> + TokenQueue.push_back(T);<br>
> + return true;<br>
> + }<br>
> + auto I = skip_b_break(Current);<br>
> + if (I == Current) {<br>
> + setError("Expected a line break after block scalar header", Current);<br>
> return false;<br>
> }<br>
<br>
...<br>
<br>
A few general questions about the rest (which I haven't taken the time<br>
to read carefully yet):<br>
<br>
- Is it possible to separate out the restructuring you've done into a<br>
series of NFC ("no functionality change") commits? That would make<br>
the functional changes easier to review.<br></blockquote><div><br></div><div>Yes, I'll commit the NFC change I outlined above and will take out</div><div>the change to the 'skip' method in KeyValueNode as well;.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
- How many new features/bugfixes are you rolling into this? Can any<br>
of them be split apart into separate commits? If not, can you<br>
enumerate them on the list (and, of course, in the commit message)?<br></blockquote><div><br></div><div>This should be a one feature commit really, as it implements parsing</div><div>of block scalars.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
- You haven't added any new tests (although I see you've modified the<br>
old ones). Do the current tests really cover everything you've<br>
changed here? If not, please add some targeted ones. </blockquote><div><br></div><div>I did add a new test in the yaml parser unittest. I also did change the existing YAMLParser tests,</div><div>but since they were never actually ran before I think that it's kind of equivalent to the</div><div>addition of new tests.</div><div><br></div></div><br></div></div>