[PATCH] D128486: [pseudo] Add error-recovery framework & brace-based recovery

Tue Jul 5 11:49:12 PDT 2022

sammccall marked 5 inline comments as done.
sammccall added a comment.

Thanks in particular for flagging the issue with duplicate forest nodes, you've found a hole in the model.
That said, I've left a big FIXME and I think we should patch it later.

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRGraph.h:148
+    RecoveryStrategy Strategy; // Heuristic choosing the tokens to match.
+    SymbolID Result;           // The symbol that is produced.
+  };
----------------
hokein wrote:
> nit: mention the `Result` must be a nonterminal.
Hmm, I don't think it has to be?

(To deduce a brace recovery rule we require it, but that doesn't mean it needs to be true in general)

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:178
+    const ForestNode &Placeholder =
+        Params.Forest.createOpaque(Option->Symbol, Option->Position);
+    const GSS::Node *NewHead = Params.GSStack.addNode(
----------------
hokein wrote:
> should we worry about the case where we create a duplicated forest node? e.g. we have two best options and both recover to the same nonterminal.
Oh no, you're right... we also have all the options for the GSS node being either the same/different in that case.

And it gets worse, what if we have one recovery -> X, and another recovery -> Y, and a rule `X := Y`. The following `glrReduce` will produce a duplicate `X`, instead of an `AmbiguousX{ OpaqueX, SequenceX{ OpaqueY } }`.

As much as I hate this, I think we should slap a FIXME on it and move on. I don't think multiple tied recoveries is common, and the solution here is just as likely to break the tie as it is to fix this with fancy algorithms.
However I don't think we have enough data to make a decision on exactly what to do yet.

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:44
+
+void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
+                unsigned &TokenIndex, const TokenStream &Tokens,
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > The `GLR.cpp` file is growing significantly, I think the recovery part is large enough to be lived in a separate file `GLRRecovery.cpp`, but the declaration can still be in the `GLR.h`.
> > This is interesting, recover/shift/reduce/parse are (vertically) self-contained enough that it didn't seem like a big problem yet...
> > 
> > If the concern is file length, maybe we'd thather start with `reduce`; if it's relatedness, `GSS`?
> > 
> > My line count is:
> >  - recover: 156
> >  - shift: 47
> >  - reduce: 319
> >  - parse: 87
> >  - GSS: 66
> Yeah, indeed these pieces can be split, my main concern is not file length -- I would prefer keeping shift/reduce/parse into a single file, as they form a complete GLR algorithm (splitting them would make it harder to read and follow).
> 
> Recovery seems like a different thing, I would image this part of code will grow more in the future
> - the GLR algorithm has the core recovery mechanism framework with some fallback recovery strategies (eof, brackets)
> - we have different recovery strategies implemented (some of them might be cxx-specific, probably be part of pseudoCXX library);
> 
> 
It's hard to tell yet, but I guess I don't really see why it's going to be easier to reason about shift/recover in isolation vs shift/reduce. Let's see how this goes past the initial patch (in particular once we move stuff into extensions)

================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:130
+
+  assert(NewHeads.empty()); // We may repeatedly populate and clear it.
+  llvm::Optional<Token::Range> RecoveryRange;
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > nit: might be clearer to move the assertion to the beginning of the function.
> > The purpose of the assertion is to make it obvious that NewHeads.clear() only discards items added during the loop below.
> > An assertion at the top of the function would be less useful for this, both because it's not on the screen when reading NewHeads.clear() and because it's much less locally obvious that nothing has been pushed onto NewHeads at some prior point in the function.
> Yeah, but I'd treat it as a contract of the API (the `NewHeads` argument passed to `glrRecover` must be empty).
> 
> btw, the empty assertion is missing in the latest version.
Yes, I think this is obsolete - the latest version of glrRecover no longer requires NewHeads to be empty at all, as it doesn't use it as scratch space.

================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:558

+TEST_F(GLRTest, RecoveryEndToEnd) {
+  // Simple example of brace-based recovery showing:
----------------
hokein wrote:
> nit: not sure the intention having the `RecoveryEndToEnd` separated from the above recover-related tests, why not grouping them together?
this tests glrParse, and it's grouped with the other glrParse tests consistent with this file (e.g. GLRReduceOrder is with the glrParse tests, not with the glrReduce tests)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128486/new/

https://reviews.llvm.org/D128486