[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.

Thu Feb 3 03:19:49 PST 2022

sammccall added a comment.

OK, a bunch of random notes but I'm getting a higher level picture.

I'm not sure about the split between LRAutomaton and LRTable. I suppose doing it in two stages simplifies the implementation, but I don't think the first stage particularly needs to be public.
Also I think the names are confusing: Automaton vs Table isn't a clear contrast as one describes behavior and one a data structure. And the "LRTable" seems more automaton-like to me!

I think my suggestion would be:

- call the final structure LRAutomaton
- call the LRAutomaton::build()::Builder structure `LRGraph` or something
- I'm not sure the LRAutomaton structure needs to exist exactly, seems like the current LRTable::build() could operate on `LRGraph` directly

but I'm going to ponder this more.

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:32
+public:
+  Item(RuleID ID, uint8_t RuleLength, uint8_t DotPos)
+      : RID(ID), RuleLength(RuleLength), DotPos(DotPos) {}
----------------
accepting the grammar as a parameter rather than the rule length would let us ensure the length is correct locally rather than relying on the caller.
Essentially all the callers need to do a lookup anyway.

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:45
+  }
+  const pseudo::Rule &rule(const Grammar &G) const { return G.lookupRule(RID); }
+  RuleID ruleID() const { return RID; }
----------------
nit: redundant pseudo::qualifier

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:47
+  RuleID ruleID() const { return RID; }
+  int dotPos() const { return DotPos; }
+  std::string dump(const Grammar &G) const;
----------------
nit: unsigned.

just dot()?
and similarly ruleID()->rule()?
Not sure these would really be unclear

================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:79
+
+// A deterministic finite state automaton for LR parsing.
+//
----------------
Hmm, isn't the point of GLR that it's a nondeterministic automaton?

We've done the LR-style NFA->DFA conversion, but it's not complete (when we hit a conflict we continue rather than giving up)

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:144
+           "augmented symbol rule must be _ := start_symbol.");
+    auto StartState = closure({{{RID, Rule.size(), 0}}}, G);
+    auto Result = Builder.insert(std::move(StartState));
----------------
nit: too many braces - please spell out some of the types
(and Auto should probably be State here?)

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:215
+    for (const Item &I : Automaton.states()[SID]) {
+      // If "_ := start_symbol ." in the State, set Action[State, eof] to
+      // accept.
----------------
These comments are echoing the code - saying what, not why.

"If we've just parsed the start symbol, we can accept the input".

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:220
+        Builder.addAction(SID, tokenSymbol(tok::eof),
+                          LRTable::Action{LRTable::Action::Kind::Accept, {}});
+        continue;
----------------
Seems like we should maybe have the accepted symbol as a payload on the Accept action?

Actually why do we need an Accept action, as opposed to just a reduce action and recognize it because it's the symbol we want and the stack is empty?

================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:224
+      if (!I.hasNext()) {
+        // If a state i has an item "A := x.", set Action[i, a] to reduce
+        // "A := x" for all a in FOLLOW(A).
----------------
"If we've reached the end of a rule A := ..., then we can reduce if the next token is in the follow set of A"

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118196/new/

https://reviews.llvm.org/D118196