[PATCH] D130550: [pseudo] Start rules are `_ := start-symbol EOF`, improve recovery.

Fri Jul 29 01:27:31 PDT 2022

hokein added inline comments.

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/GLR.h:74
     bool GCParity;
+    // Have we already used this node for error recovery? (prevents loops)
+    mutable bool Recovered = false;
----------------
sammccall wrote:
> hokein wrote:
> > haven't look at it deeply -- is this bug related to this eof change? This looks like a different bug in recovery.
> They're "related" in that they both fix repeated-recovery scenarios.
> This change fixes that we can hit an infinite loop when applying recovery repeatedly.
> The eof change fixes that recovery is (erroneously) only applied once at eof.
> 
> I hoped to cover them with the same testcase, which tests repeated recovery at EOF. I can extract this change with a separate test if you like, though it will be very similar to the one I have here.
> This change fixes that we can hit an infinite loop when applying recovery repeatedly.

I'm more worried about this bug, I think this is an important bug, worth a separate patch to fix it, right now it looks like a join-effort in the eof change. 

> The eof change fixes that recovery is (erroneously) only applied once at eof.

Not sure I follow this. I think the eof change is basically to remove a technical debt (avoid the special case and repeated code after main parsing loop).  Am I missing something?

================
Comment at: clang-tools-extra/pseudo/lib/Forest.cpp:191
+  // This is important to drive the final shift/recover/reduce loop.
+  new (&Terminals[Index])
+      ForestNode(ForestNode::Terminal, tokenSymbol(tok::eof),
----------------
sammccall wrote:
> hokein wrote:
> > nit: in the underlying TokenStream implementation, `tokens()` has a trailing eof token, I think we can fold this into the above loop (if we expose a `token_eof()` method in TokenStream). Not sure we should do this. 
> I think this doesn't generalize well... at the moment we're parsing the whole stream, but in future we likely want to parse a subrange (pp-disabled regions?). In such a case we would still want the terminating EOF terminal as a device for parsing, even though there's no corresponding token.
oh, ok, that's fair enough.

================
Comment at: clang-tools-extra/pseudo/test/lr-build-basic.test:16
 # GRAPH-NEXT: State 3
 # GRAPH-NEXT:     id := IDENTIFIER • 

----------------
there should be a State 4 (with a `_ := expr EOF •` item)

================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:622
+  TestLang.Table = LRTable::buildSLR(TestLang.G);
+  llvm::errs() << TestLang.Table.dumpForTests(TestLang.G);
+  TestLang.RecoveryStrategies.try_emplace(
----------------
remove the debugging statement.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130550/new/

https://reviews.llvm.org/D130550