[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Feb 3 03:19:49 PST 2022
sammccall added a comment.
OK, a bunch of random notes but I'm getting a higher level picture.
I'm not sure about the split between LRAutomaton and LRTable. I suppose doing it in two stages simplifies the implementation, but I don't think the first stage particularly needs to be public.
Also I think the names are confusing: Automaton vs Table isn't a clear contrast as one describes behavior and one a data structure. And the "LRTable" seems more automaton-like to me!
I think my suggestion would be:
- call the final structure LRAutomaton
- call the LRAutomaton::build()::Builder structure `LRGraph` or something
- I'm not sure the LRAutomaton structure needs to exist exactly, seems like the current LRTable::build() could operate on `LRGraph` directly
but I'm going to ponder this more.
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:32
+public:
+ Item(RuleID ID, uint8_t RuleLength, uint8_t DotPos)
+ : RID(ID), RuleLength(RuleLength), DotPos(DotPos) {}
----------------
accepting the grammar as a parameter rather than the rule length would let us ensure the length is correct locally rather than relying on the caller.
Essentially all the callers need to do a lookup anyway.
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:45
+ }
+ const pseudo::Rule &rule(const Grammar &G) const { return G.lookupRule(RID); }
+ RuleID ruleID() const { return RID; }
----------------
nit: redundant pseudo::qualifier
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:47
+ RuleID ruleID() const { return RID; }
+ int dotPos() const { return DotPos; }
+ std::string dump(const Grammar &G) const;
----------------
nit: unsigned.
just dot()?
and similarly ruleID()->rule()?
Not sure these would really be unclear
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRAutomaton.h:79
+
+// A deterministic finite state automaton for LR parsing.
+//
----------------
Hmm, isn't the point of GLR that it's a nondeterministic automaton?
We've done the LR-style NFA->DFA conversion, but it's not complete (when we hit a conflict we continue rather than giving up)
================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:144
+ "augmented symbol rule must be _ := start_symbol.");
+ auto StartState = closure({{{RID, Rule.size(), 0}}}, G);
+ auto Result = Builder.insert(std::move(StartState));
----------------
nit: too many braces - please spell out some of the types
(and Auto should probably be State here?)
================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:215
+ for (const Item &I : Automaton.states()[SID]) {
+ // If "_ := start_symbol ." in the State, set Action[State, eof] to
+ // accept.
----------------
These comments are echoing the code - saying what, not why.
"If we've just parsed the start symbol, we can accept the input".
================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:220
+ Builder.addAction(SID, tokenSymbol(tok::eof),
+ LRTable::Action{LRTable::Action::Kind::Accept, {}});
+ continue;
----------------
Seems like we should maybe have the accepted symbol as a payload on the Accept action?
Actually why do we need an Accept action, as opposed to just a reduce action and recognize it because it's the symbol we want and the stack is empty?
================
Comment at: clang/lib/Tooling/Syntax/Pseudo/LRBuilder.cpp:224
+ if (!I.hasNext()) {
+ // If a state i has an item "A := x.", set Action[i, a] to reduce
+ // "A := x" for all a in FOLLOW(A).
----------------
"If we've reached the end of a rule A := ..., then we can reduce if the next token is in the follow set of A"
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D118196/new/
https://reviews.llvm.org/D118196
More information about the cfe-commits
mailing list