[llvm] [YAMLParser] Improve plain scalar spec compliance (PR #68946)

Scott Linder via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 17 09:34:01 PDT 2023


================
@@ -1848,13 +1869,15 @@ bool Scanner::fetchMoreTokens() {
   if (*Current == ',')
     return scanFlowEntry();
 
-  if (*Current == '-' && isBlankOrBreak(Current + 1))
+  if (*Current == '-' && (isBlankOrBreak(Current + 1) || Current + 1 == End))
     return scanBlockEntry();
 
-  if (*Current == '?' && (FlowLevel || isBlankOrBreak(Current + 1)))
+  if (*Current == '?' && (Current + 1 == End || isBlankOrBreak(Current + 1)))
     return scanKey();
 
-  if (*Current == ':' && (FlowLevel || isBlankOrBreak(Current + 1)))
+  if (*Current == ':' && ((FlowLevel && (IsAdjacentValueAllowed ||
----------------
slinder1 wrote:

> > Isn't checking `FlowLevel` here redundant assuming `IsAdjacentValueAllowed` is set correctly?
> 
> That's tricky. Consider the input `"a":"b"`. This is not valid YAML -- while the key is JSON-style (half of what is needed to allow adjacent values), it is not in a flow context, so we are not allowed to have an adjacent value. I suppose `IsAdjacentValueAllowed` is misleadingly named -- I have renamed it to `IsAdjacentValueAllowedInFlow`.
> 
> That said, it may not matter that much, since we mis-parse it whether or not we consider `FlowLevel`:
> 
> * If we do not check `FlowLevel`, then we allow `"a":"b"` to parse as `{"a": "b"}`, which is not allowed per the YAML spec (bug in scanner).
> * If we do check `FlowLevel`, then the `:` in `"a":"b"` does not get parsed as a key… but it does get parsed as a plain literal, so we end up parsing the document sequence `["a", ":\"b\""]`. The scanner is actually doing the right thing here, but the parser is not -- two `TK_Scalar`s in a row are not valid, but the parser accepts it anyway.
> 
> Neither of these are great outcomes (both violate the YAML spec), but I guess the first one is less surprising, so I think I will remove the `FlowLevel` check as per your suggestion. If the parser were to be fixed to disallow consecutive bare documents as the spec says it should, then we would want to reintroduce the `FlowLevel` check here in the scanner.
> 
Ah, thank you for the explanation, I think it makes sense to me now. Sorry again for confusion on my part, I always struggle with YAML every time I return to it, there are just a lot of possibilities to consider.
> > I'm also confused about the condition `!isPlainSafeNonBlank`. Shouldn't we always parse a (possibly empty) value following the `:` when an adjacent value is allowed?
> 
> The condition is `||`, not `&&`. We already have an example in the test suite, `llvm/test/YAMLParser/spec-08-13.test`:
> 
Ah, woops, this was just me working a bit too fast I think. Thank you for the explanation below!
> ```yaml
> {
>   ? foo :,
>   ? : bar,
> }
> ```
> 
> Consider the line `? foo :,`, when `Current` points at the `:`. If we had only the condition `isBlankOrBreak(Current + 1) || (FlowLevel && IsAdjacentValueAllowed)`, we would incorrectly return `false` here, since `*(Current + 1) == ','` (`isBlankOrBreak(Current + 1)` false), we are in flow, and `IsAdjacentValueAllowed == false` since `foo` is YAML-style.
> 
> Hence, we do need this condition. That said… I think `!isPlainSafeNonBlank(Current + 1)` subsumes `isBlankOrBreak(Current + 1)`, so we could remove the latter.
> 
> > I also think the order should make the "normal" case of a blank/break following the indicator come first
> 
> Good point, done.
> 
> > all together is the following still correct, or am I missing something?
> > ```
> > if (*Current == ':' && (isBlankOrBreak(Current + 1) || IsAdjacentValueAllowed))
> > ```
> 
> This doesn't work due to the `llvm/test/YAMLParser/spec-08-13.test` failure, but after your feedback, I think it can all be simplified down to
> 
> ```c++
> if (*Current == ':' &&
>     (!isPlainSafeNonBlank(Current + 1) || IsAdjacentValueAllowedInFlow))
> ```

This looks great, I think removing one level of conditions alone is a big win to make it readable at a glance. Thank you for bearing with me!

https://github.com/llvm/llvm-project/pull/68946


More information about the llvm-commits mailing list