[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 11 06:27:40 PST 2022


hokein added inline comments.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:84
+
+    Kind K = Kind::Error;
+    // Value
----------------
sammccall wrote:
> This action struct can be squeezed to 16 bits.
> 
> Kind only needs 3 bits (I suspect it can be 2: Error and Accept aren't heavily used).
> RuleID is 12 bits, StateID would fit within 11 and 12 should be safe.
> 
> Combined with the optimization suggested below, this should get the whole table down to  335KB.
squeezed to 16 bits (3 bits for Kind, and 13 bits for the Value).


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:105
+    auto Goto = find({From, NonTerminal});
+    assert(Goto.size() == 1 && Goto.front().isGoTo());
+    return Goto.front().goTo();
----------------
sammccall wrote:
> Why is this assertion valid? Is it a precondition of the function that some item in From can advance over NonTerminal?
This is guaranteed by a DFA (a node can not have two out edges with a same label), the same reason why we don't have shift/shift conflicts. 


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:121
+private:
+  struct IndexKey {
+    StateID From;
----------------
sammccall wrote:
> I think there's an even more compact representation:
> Basically sort the index keys as symbol-major, but then don't store the actual symbol there but rather store the range for the symbol in an external array:
> 
> ```
> // index is lookahead tok::TokenKind. Values are indices into Index: the range for tok is StateIdx[TokenIdx[tok]...TokenIdx[tok+1]]
> vector<unsigned> TokenIdx;
> 
> // index is nonterminal SymbolID. Values are  indices into Index: the range for sym is StateIdx[NontermIdx[sym]...NontermIdx[sym+1]].
> vector<unsigned> NontermIdx;
> 
> // parallel to actions, value is the from state. Only subranges are sorted.
> vector<StateID> StateIdx;
> 
> vector<Action> Actions;
> ```
> 
> This should be 4 * (392 + 245) + (2 + 4) * 83k ~= 500k. And the one extra indirection should save 5-10 steps of binary search.
> 
> This is equivalent to 2 * `vector<unsigned>` + `vector<pair<StateID, Action>>` but keeping the searched index compact is probably a good thing.
This looks like a nice improvement, though the implementation is a little tricky -- we reduced the binary-search space from 2 dimension (State, Symbol) to 1 dimension Symbol.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118196/new/

https://reviews.llvm.org/D118196



More information about the cfe-commits mailing list