[PATCH] D137118: [YAML] Trim trailing whitespace from plain scalars

Rahul Kayaith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 31 19:14:53 PDT 2022


rkayaith created this revision.
Herald added a subscriber: hiraditya.
Herald added a project: All.
rkayaith added inline comments.
rkayaith published this revision for review.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.


================
Comment at: llvm/lib/Support/YAMLParser.cpp:2045-2046
   // Plain or block.
-  return Value.rtrim(' ');
+  // Trim whitespace ('b-char' and 's-white').
+  return Value.rtrim("\x0A\x0D\x20\x09");
 }
----------------
An alternative to this might be to change the [[ https://github.com/llvm/llvm-project/blob/cc2cf8b22b35a9cf87a61bf66c259455185fa6e3/llvm/lib/Support/YAMLParser.cpp#L1427 | scanner ]] to match the spec more closely, but I'm not very familiar with the spec or that code so I couldn't tell if that would be straightforward to do.


In some cases plain scalars are currently parsed with a trailing newline. In
particular this shows up often when parsing JSON files, e.g. below note the `\n`
after `456`:

  $ cat test.yaml
  {
    "foo": 123,
    "bar": 456
  }
  $ yaml-bench test.yaml -canonical
  %YAML 1.2
  ---
  !!map {
    ? !!str "foo"
    : !!str "123",
    ? !!str "bar"
    : !!str "456\n",
  }
  ...

The trailing whitespace ends up causing the conversion of the scalar to
int/bool/etc. to fail, and is inconsistent with the YAML spec:
https://yaml.org/spec/1.2.2/#733-plain-style

This change trims any trailing whitespace characters from the scalar value
(specifically `b-line-feed`, `b-carriage-return`, `s-space`, and `s-tab`).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D137118

Files:
  llvm/lib/Support/YAMLParser.cpp
  llvm/test/YAMLParser/json.test
  llvm/unittests/Support/YAMLIOTest.cpp


Index: llvm/unittests/Support/YAMLIOTest.cpp
===================================================================
--- llvm/unittests/Support/YAMLIOTest.cpp
+++ llvm/unittests/Support/YAMLIOTest.cpp
@@ -96,6 +96,15 @@
     EXPECT_EQ(doc.foo, 3);
     EXPECT_EQ(doc.bar, 5);
   }
+
+  {
+    Input yin("{\"foo\": 3\n, \"bar\": 5}");
+    yin >> doc;
+
+    EXPECT_FALSE(yin.error());
+    EXPECT_EQ(doc.foo, 3);
+    EXPECT_EQ(doc.bar, 5);
+  }
 }
 
 TEST(YAMLIO, TestMalformedMapRead) {
Index: llvm/test/YAMLParser/json.test
===================================================================
--- /dev/null
+++ llvm/test/YAMLParser/json.test
@@ -0,0 +1,13 @@
+# RUN: yaml-bench -canonical %s | FileCheck %s
+
+# CHECK: !!map {
+# CHECK:   ? !!str "foo"
+# CHECK:   : !!str "123",
+# CHECK:   ? !!str "bar"
+# CHECK:   : !!str "456",
+# CHECK: }
+
+{
+  "foo": 123,
+  "bar": 456
+}
Index: llvm/lib/Support/YAMLParser.cpp
===================================================================
--- llvm/lib/Support/YAMLParser.cpp
+++ llvm/lib/Support/YAMLParser.cpp
@@ -2042,7 +2042,8 @@
     return UnquotedValue;
   }
   // Plain or block.
-  return Value.rtrim(' ');
+  // Trim whitespace ('b-char' and 's-white').
+  return Value.rtrim("\x0A\x0D\x20\x09");
 }
 
 StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D137118.472148.patch
Type: text/x-patch
Size: 1328 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221101/27756159/attachment.bin>


More information about the llvm-commits mailing list