[PATCH] D118196: [syntax][pseudo] Implement LR parsing table.
Haojian Wu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Feb 11 06:27:40 PST 2022
hokein added inline comments.
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:84
+
+ Kind K = Kind::Error;
+ // Value
----------------
sammccall wrote:
> This action struct can be squeezed to 16 bits.
>
> Kind only needs 3 bits (I suspect it can be 2: Error and Accept aren't heavily used).
> RuleID is 12 bits, StateID would fit within 11 and 12 should be safe.
>
> Combined with the optimization suggested below, this should get the whole table down to 335KB.
squeezed to 16 bits (3 bits for Kind, and 13 bits for the Value).
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:105
+ auto Goto = find({From, NonTerminal});
+ assert(Goto.size() == 1 && Goto.front().isGoTo());
+ return Goto.front().goTo();
----------------
sammccall wrote:
> Why is this assertion valid? Is it a precondition of the function that some item in From can advance over NonTerminal?
This is guaranteed by a DFA (a node can not have two out edges with a same label), the same reason why we don't have shift/shift conflicts.
================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/LRTable.h:121
+private:
+ struct IndexKey {
+ StateID From;
----------------
sammccall wrote:
> I think there's an even more compact representation:
> Basically sort the index keys as symbol-major, but then don't store the actual symbol there but rather store the range for the symbol in an external array:
>
> ```
> // index is lookahead tok::TokenKind. Values are indices into Index: the range for tok is StateIdx[TokenIdx[tok]...TokenIdx[tok+1]]
> vector<unsigned> TokenIdx;
>
> // index is nonterminal SymbolID. Values are indices into Index: the range for sym is StateIdx[NontermIdx[sym]...NontermIdx[sym+1]].
> vector<unsigned> NontermIdx;
>
> // parallel to actions, value is the from state. Only subranges are sorted.
> vector<StateID> StateIdx;
>
> vector<Action> Actions;
> ```
>
> This should be 4 * (392 + 245) + (2 + 4) * 83k ~= 500k. And the one extra indirection should save 5-10 steps of binary search.
>
> This is equivalent to 2 * `vector<unsigned>` + `vector<pair<StateID, Action>>` but keeping the searched index compact is probably a good thing.
This looks like a nice improvement, though the implementation is a little tricky -- we reduced the binary-search space from 2 dimension (State, Symbol) to 1 dimension Symbol.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D118196/new/
https://reviews.llvm.org/D118196
More information about the cfe-commits
mailing list