[PATCH] YAML: Implement block scalar parsing
Alex L
arphaman at gmail.com
Wed May 6 14:57:24 PDT 2015
2015-05-06 13:27 GMT-07:00 Duncan P. N. Exon Smith <dexonsmith at apple.com>:
>
> > On 2015 May 5, at 15:57, Alex Lorenz <arphaman at gmail.com> wrote:
> >
> > Hi bogner, bob.wilson, dexonsmith,
> >
> > This patch implements the parsing of YAML block scalars. Some code
> existed there before, but it couldn't parse block scalars.
> > This patch is based on the patch 'http://reviews.llvm.org/D9448' that
> enables the YAMLParser tests.
> > I've added a new yaml node type to represent the block scalar values.
> > I've deleted the files 'spec-09-27.test' and 'spec-09-28.test' as they
> are identical to the file 'spec-09-26.test'.
> >
> > REPOSITORY
> > rL LLVM
> >
> > http://reviews.llvm.org/D9503
> >
> > Files:
> > include/llvm/Support/YAMLParser.h
> > lib/Support/YAMLParser.cpp
> > test/YAMLParser/spec-09-14.test
> > test/YAMLParser/spec-09-18.test
> > test/YAMLParser/spec-09-19.test
> > test/YAMLParser/spec-09-20.test
> > test/YAMLParser/spec-09-21.test
> > test/YAMLParser/spec-09-22.test
> > test/YAMLParser/spec-09-24.test
> > test/YAMLParser/spec-09-25.test
> > test/YAMLParser/spec-09-26.test
> > test/YAMLParser/spec-09-27.test
> > test/YAMLParser/spec-09-28.test
> > test/YAMLParser/spec-09-29.test
> > test/YAMLParser/spec-09-30.test
> > test/YAMLParser/spec-09-31.test
> > test/YAMLParser/spec-09-32.test
> > test/YAMLParser/spec-09-33.test
> > unittests/Support/YAMLParserTest.cpp
> > utils/yaml-bench/YAMLBench.cpp
> >
> > EMAIL PREFERENCES
> > http://reviews.llvm.org/settings/panel/emailpreferences/
> > <D9503.24963.patch>
> > Index: include/llvm/Support/YAMLParser.h
> > ===================================================================
> > --- include/llvm/Support/YAMLParser.h
> > +++ include/llvm/Support/YAMLParser.h
> > @@ -107,6 +107,7 @@
> > enum NodeKind {
> > NK_Null,
> > NK_Scalar,
> > + NK_BlockScalar,
> > NK_KeyValue,
> > NK_Mapping,
> > NK_Sequence,
> > @@ -222,6 +223,36 @@
> > SmallVectorImpl<char> &Storage) const;
> > };
> >
> > +/// \brief A block scalar node is an opaque datum that can be presented
> as a
> > +/// series of zero or more Unicode scalar values.
> > +///
> > +/// Example:
> > +/// |
> > +/// Hello
> > +/// World
> > +class BlockScalarNode : public Node {
> > + void anchor() override;
> > +
> > +public:
> > + BlockScalarNode(std::unique_ptr<Document> &D, StringRef Anchor,
> StringRef Tag,
> > + std::string &Value, StringRef RawVal)
> > + : Node(NK_BlockScalar, D, Anchor, Tag), Value(std::move(Value)) {
> > + SMLoc Start = SMLoc::getFromPointer(RawVal.begin());
> > + SMLoc End = SMLoc::getFromPointer(RawVal.end());
> > + SourceRange = SMRange(Start, End);
> > + }
> > +
> > + /// \brief Gets the value of this node as a StringRef.
> > + StringRef getValue() const { return Value; }
> > +
> > + static inline bool classof(const Node *N) {
> > + return N->getType() == NK_BlockScalar;
> > + }
> > +
> > +private:
> > + std::string Value;
> > +};
> > +
> > /// \brief A key and value pair. While not technically a Node under the
> YAML
> > /// representation graph, it is easier to treat them this way.
> > ///
> > @@ -252,8 +283,10 @@
> > Node *getValue();
> >
> > void skip() override {
> > - getKey()->skip();
> > - getValue()->skip();
> > + if (Node *Key = getKey())
> > + Key->skip();
> > + if (Node *Val = getValue())
> > + Val->skip();
>
> I'm probably missing something obvious, but why do you need to
> conditionalize these? Do they both need to be conditional? (If not,
> can you change the other to an `assert()`?)
>
> If it's possible to commit this separately along with a testcase (or
> unit test) that exposes the bug, please do. Otherwise please indicate
> in this commit message which test required the change.
>
This came up in one of my block scalar parsing unit tests. In particular,
this test case failed:
ExpectParseError("Invalid block scalar header", "test: | failure");
Because the block scalar failed and returned false, the key value node for
the key 'test' didn't
actually have a value, so it crashed when the method skip was called during
unit testing.
I think that you're right, it would be better to create a separate patch
for this issue. I will
create a new patch with a test case that deals with just this problem so
that this patch
won't have to fix this bug.
>
> > }
> >
> > static inline bool classof(const Node *N) {
> > Index: lib/Support/YAMLParser.cpp
> > ===================================================================
> > --- lib/Support/YAMLParser.cpp
> > +++ lib/Support/YAMLParser.cpp
> > @@ -101,6 +101,7 @@
> > void Node::anchor() {}
> > void NullNode::anchor() {}
> > void ScalarNode::anchor() {}
> > +void BlockScalarNode::anchor() {}
> > void KeyValueNode::anchor() {}
> > void MappingNode::anchor() {}
> > void SequenceNode::anchor() {}
> > @@ -128,6 +129,7 @@
> > TK_Key,
> > TK_Value,
> > TK_Scalar,
> > + TK_BlockScalar,
> > TK_Alias,
> > TK_Anchor,
> > TK_Tag
> > @@ -137,6 +139,9 @@
> > /// of the token in the input.
> > StringRef Range;
> >
> > + /// The value of a block scalar node.
> > + std::string Value;
> > +
> > Token() : Kind(TK_Error) {}
> > };
> > }
> > @@ -348,6 +353,14 @@
> > /// b-break.
> > StringRef::iterator skip_b_break(StringRef::iterator Position);
> >
> > + /// @brief Skip a single s-space[31] starting at Position.
> > + ///
> > + /// An s-space is 0x20
> > + ///
> > + /// @returns The code unit after the s-space, or Position if it's not
> a
> > + /// s-space.
> > + StringRef::iterator skip_s_space(StringRef::iterator Position);
> > +
> > /// @brief Skip a single s-white[33] starting at Position.
> > ///
> > /// A s-white is 0x20 | 0x9
> > @@ -417,6 +430,9 @@
> > , Token::TokenKind Kind
> > , TokenQueueT::iterator InsertPoint);
> >
> > + /// @brief Skip a single-line comment if it's present on the current
> line.
> > + void skipComment();
> > +
> > /// @brief Skip whitespace and comments until the start of the next
> token.
> > void scanToNextToken();
> >
> > @@ -608,6 +624,9 @@
> > case Token::TK_Scalar:
> > OS << "Scalar: ";
> > break;
> > + case Token::TK_BlockScalar:
> > + OS << "Block Scalar: ";
> > + break;
> > case Token::TK_Alias:
> > OS << "Alias: ";
> > break;
> > @@ -812,6 +831,13 @@
> > return Position;
> > }
> >
> > +StringRef::iterator Scanner::skip_s_space(StringRef::iterator Position)
> {
> > + if (Position == End)
> > + return Position;
> > + if (*Position == ' ')
> > + return Position + 1;
> > + return Position;
> > +}
> >
> > StringRef::iterator Scanner::skip_s_white(StringRef::iterator Position)
> {
> > if (Position == End)
> > @@ -967,24 +993,27 @@
> > return true;
> > }
> >
> > +void Scanner::skipComment() {
> > + if (*Current != '#')
> > + return;
> > + while (true) {
> > + // This may skip more than one byte, thus Column is only incremented
> > + // for code points.
> > + StringRef::iterator i = skip_nb_char(Current);
>
> LLVM style would spell this with an uppercase, as in `I`. I know you
> copy/pasted, but you can fix it in transit ;).
>
> > + if (i == Current)
> > + break;
> > + Current = i;
> > + ++Column;
> > + }
> > +}
>
> Looks like you pulled this directly from `scanToNextToken()`. Please do
> that in a prep-commit.
>
> Also see my note below about whether you pulled the right bits.
>
Yes, I extracted the method 'skipComment' from 'scanToNextToken' as I
wanted to reuse it in 'scanBlockScalar'.
You're right, It would be better to make it a separate NFC commit. I will
update the patch and commit the NFC extraction
of the 'skipComment' method with updates to 'i' to conform to the LLVM
style.
>
> > +
> > void Scanner::scanToNextToken() {
> > while (true) {
> > while (*Current == ' ' || *Current == '\t') {
> > skip(1);
> > }
> >
> > - // Skip comment.
> > - if (*Current == '#') {
> > - while (true) {
> > - // This may skip more than one byte, thus Column is only
> incremented
> > - // for code points.
> > - StringRef::iterator i = skip_nb_char(Current);
> > - if (i == Current)
> > - break;
> > - Current = i;
> > - ++Column;
> > - }
> > - }
> > + skipComment();
> >
> > // Skip EOL.
> > StringRef::iterator i = skip_b_break(Current);
> > @@ -1368,37 +1397,196 @@
> > }
> >
> > bool Scanner::scanBlockScalar(bool IsLiteral) {
> > - StringRef::iterator Start = Current;
> > - skip(1); // Eat | or >
> > - while(true) {
> > - StringRef::iterator i = skip_nb_char(Current);
> > - if (i == Current) {
> > - if (Column == 0)
> > - break;
> > - i = skip_b_break(Current);
> > - if (i != Current) {
> > - // We got a line break.
> > - Column = 0;
> > - ++Line;
> > - Current = i;
> > - continue;
> > - } else {
> > - // There was an error, which should already have been printed
> out.
> > - return false;
> > - }
> > - }
> > - Current = i;
> > - ++Column;
> > + std::string Str;
> > + raw_string_ostream OS(Str);
> > + auto BlockStart = Current;
> > + // If this is true, then the scanner has to look for a first
> non-blank line
> > + // to determine the indenation of the block scalar.
> > + bool LookingForIndent = true;
> > + unsigned BlockIndent = 0;
> > + unsigned BlockExitIndent = Indent < 0 ? 0 : (unsigned)Indent;
> > + unsigned LineBreaks = 0;
> > +
> > + unsigned MaxAllSpaceLineCharacters = 0;
> > + StringRef::iterator LongestAllSpaceLine;
> > +
> > + bool FoldNextLine = false;
> > +
> > + skip(1); // Eat '|' or '>'
> > +
> > + // Block Header ([162] c-b-block-header)
> > + // Chomping indicator ([164] c-chomping-indicator)
> > + char ChompingIndicator = ' ';
> > + if (Current != End && (*Current == '+' || *Current == '-')) {
> > + ChompingIndicator = *Current;
> > + skip(1);
> > + }
> > + // Indentation indicator ([163] c-indentation-indicator)
> > + if (Current != End && (*Current >= '1' && *Current <= '9')) {
> > + BlockIndent = unsigned(*Current - '0');
> > + LookingForIndent = false;
> > + skip(1);
> > }
> > + // Check for chomping indicator once again.
> > + if (Current != End && (*Current == '+' || *Current == '-')) {
> > + ChompingIndicator = *Current;
> > + skip(1);
> > + }
> > + Current = skip_while(&Scanner::skip_s_white, Current);
> > + skipComment();
>
> These two lines are both part of `scanToNextToken()`. I wonder if you
> picked the right thing to extract?
>
I'm not sure what you mean by this. 'skipComment' is used here as well
as in 'scanToNextToken', but the rest is different.
>
> >
> > - if (Start == Current) {
> > - setError("Got empty block scalar", Start);
> > + if (Current == End) {
> > + // End of file after block header, we have an empty scalar.
> > + Token T;
> > + T.Kind = Token::TK_BlockScalar;
> > + T.Range = StringRef(BlockStart, Current - BlockStart);
> > + TokenQueue.push_back(T);
> > + return true;
> > + }
> > + auto I = skip_b_break(Current);
> > + if (I == Current) {
> > + setError("Expected a line break after block scalar header",
> Current);
> > return false;
> > }
>
> ...
>
> A few general questions about the rest (which I haven't taken the time
> to read carefully yet):
>
> - Is it possible to separate out the restructuring you've done into a
> series of NFC ("no functionality change") commits? That would make
> the functional changes easier to review.
>
Yes, I'll commit the NFC change I outlined above and will take out
the change to the 'skip' method in KeyValueNode as well;.
- How many new features/bugfixes are you rolling into this? Can any
> of them be split apart into separate commits? If not, can you
> enumerate them on the list (and, of course, in the commit message)?
>
This should be a one feature commit really, as it implements parsing
of block scalars.
> - You haven't added any new tests (although I see you've modified the
> old ones). Do the current tests really cover everything you've
> changed here? If not, please add some targeted ones.
I did add a new test in the yaml parser unittest. I also did change the
existing YAMLParser tests,
but since they were never actually ran before I think that it's kind of
equivalent to the
addition of new tests.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150506/6ee2f354/attachment.html>
More information about the llvm-commits
mailing list