[llvm] 32b3f13 - [YAML] Trim trailing whitespace from plain scalars

Rahul Kayaith via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 9 18:57:02 PST 2023


Author: rkayaith
Date: 2023-02-09T21:56:57-05:00
New Revision: 32b3f13337ef0bf747705d058f4772c7fdabd736

URL: https://github.com/llvm/llvm-project/commit/32b3f13337ef0bf747705d058f4772c7fdabd736
DIFF: https://github.com/llvm/llvm-project/commit/32b3f13337ef0bf747705d058f4772c7fdabd736.diff

LOG: [YAML] Trim trailing whitespace from plain scalars

In some cases plain scalars are currently parsed with a trailing
newline. This shows up often when parsing JSON files; note the `\n`
after `456` below:
```
$ cat test.yaml
{
  "foo": 123,
  "bar": 456
}
$ yaml-bench test.yaml -canonical
%YAML 1.2
---
!!map {
  ? !!str "foo"
  : !!str "123",
  ? !!str "bar"
  : !!str "456\n",
}
...
```
The trailing whitespace causes the conversion of the scalar to
int/bool/etc. to fail, which leads to the issue seen here:
https://github.com/llvm/llvm-project/issues/15877
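
To illustrate why a trailing newline breaks the conversion, here is a
minimal standalone sketch. It does not use LLVM's own conversion code;
`parseWholeInt` is a hypothetical helper built on `std::from_chars`, and
it assumes a strict parser that rejects input with any leftover
characters:
```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical strict parser: succeeds only if every character of `text`
// is consumed as part of a decimal integer.
std::optional<int> parseWholeInt(std::string_view text) {
  int value = 0;
  const char *first = text.data();
  const char *last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  // Reject on a parse error or if unparsed characters remain.
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return value;
}

// Small self-check: the clean scalar converts, the one carrying a
// trailing newline does not.
void checkStrictParse() {
  assert(parseWholeInt("456") == 456);
  assert(!parseWholeInt("456\n").has_value());
}
```
Under that assumption, `"456\n"` is rejected even though its digits are
valid, which matches the failure described above.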

From reading the YAML spec (https://yaml.org/spec/1.2.2/#733-plain-style)
it seems like plain scalars should never end with whitespace, so this
change trims all trailing whitespace characters from the
value (specifically `b-line-feed`, `b-carriage-return`, `s-space`, and
`s-tab`).
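
As a rough standalone illustration of the trimming rule, the sketch
below removes the same four characters from the end of a value. It uses
`std::string_view` rather than LLVM's `StringRef`, and `trimPlainScalar`
is a hypothetical name, not part of the patch:
```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drop trailing line feed, carriage return, space,
// and tab characters. Leading and interior whitespace is left untouched.
std::string_view trimPlainScalar(std::string_view value) {
  // b-line-feed, b-carriage-return, s-space, s-tab.
  constexpr std::string_view kWhitespace = "\n\r \t";
  const auto lastKept = value.find_last_not_of(kWhitespace);
  if (lastKept == std::string_view::npos)
    return value.substr(0, 0); // The value was empty or all whitespace.
  return value.substr(0, lastKept + 1);
}

// Small self-check covering the JSON case from the example above.
void checkTrimPlainScalar() {
  assert(trimPlainScalar("456\n") == "456");
  assert(trimPlainScalar("a b \t\r\n") == "a b");
  assert(trimPlainScalar(" \n").empty());
}
```
The actual change does the equivalent with a single `rtrim` call on the
scalar's value, as shown in the diff below.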

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D137118

Added: 
    llvm/test/YAMLParser/json.test

Modified: 
    llvm/lib/Support/YAMLParser.cpp
    llvm/unittests/Support/YAMLIOTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index b85b1eb83ef89..6ac2c6aeeb46a 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2041,8 +2041,11 @@ StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
     }
     return UnquotedValue;
   }
-  // Plain or block.
-  return Value.rtrim(' ');
+  // Plain.
+  // Trim whitespace ('b-char' and 's-white').
+  // NOTE: Alternatively we could change the scanner to not include whitespace
+  //       here in the first place.
+  return Value.rtrim("\x0A\x0D\x20\x09");
 }
 
 StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue

diff  --git a/llvm/test/YAMLParser/json.test b/llvm/test/YAMLParser/json.test
new file mode 100644
index 0000000000000..7d1b24caed987
--- /dev/null
+++ b/llvm/test/YAMLParser/json.test
@@ -0,0 +1,13 @@
+# RUN: yaml-bench -canonical %s | FileCheck %s
+
+# CHECK: !!map {
+# CHECK:   ? !!str "foo"
+# CHECK:   : !!str "123",
+# CHECK:   ? !!str "bar"
+# CHECK:   : !!str "456",
+# CHECK: }
+
+{
+  "foo": 123,
+  "bar": 456
+}

diff  --git a/llvm/unittests/Support/YAMLIOTest.cpp b/llvm/unittests/Support/YAMLIOTest.cpp
index 2ed79cae31edc..f282d23dc500b 100644
--- a/llvm/unittests/Support/YAMLIOTest.cpp
+++ b/llvm/unittests/Support/YAMLIOTest.cpp
@@ -96,6 +96,15 @@ TEST(YAMLIO, TestMapRead) {
     EXPECT_EQ(doc.foo, 3);
     EXPECT_EQ(doc.bar, 5);
   }
+
+  {
+    Input yin("{\"foo\": 3\n, \"bar\": 5}");
+    yin >> doc;
+
+    EXPECT_FALSE(yin.error());
+    EXPECT_EQ(doc.foo, 3);
+    EXPECT_EQ(doc.bar, 5);
+  }
 }
 
 TEST(YAMLIO, TestMalformedMapRead) {
