[PATCH] D128472: [pseudo] Check follow-sets instead of tying reduce actions to lookahead tokens.
Haojian Wu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Jun 27 06:11:00 PDT 2022
hokein accepted this revision.
hokein added a comment.
This revision is now accepted and ready to land.
This looks a great improvement. A few nits, it looks good to me overall.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:136
+ // check whether a reduction should apply in the current context.
+ llvm::ArrayRef<RuleID> getReduceRules(StateID State) const {
+ return llvm::makeArrayRef(&Reduces[ReduceOffset[State]],
----------------
I think the API is motivated by the LR(0) experimentation, we're now decoupling the reduction look up from the lookahead token, this seems nice (LR(0) only needs `getReduceRules`, other LR(1) variant need `getReduceRules` + `canFollow`).
The API usage for SLR(1) is not quite straight forward at the first glance, would be better to add a small snippet in the comment.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:143
+ assert(isToken(Terminal));
+ assert(!isToken(Nonterminal));
+ return FollowSets.test(tok::NUM_TOKENS * Nonterminal +
----------------
nit: use `isNonterminal(Nonterminal)`? IMO it is clearer.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:209
+
+ // Reduces from state S: Reduces[ [ReduceOffset[S], ReduceOffset[S+1]) ]
+ std::vector<uint32_t> ReduceOffset;
----------------
nit: I think rephasing it as: `a half-open range of Reduces [ReduceOffset[S], ReduceOffset[S+1]).` is clearer -- the current syntax `Reduces[ [ReduceOffset[S], ReduceOffset[S+1]) ]` is a bit weird (My first attempt was to read it as an unmatch-brackets code snippet)
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:212
+ std::vector<RuleID> Reduces;
+ // T in Follow(NT) == FollowSets[NT * NUM_TOKENS + tok::Kind(T)]
+ llvm::BitVector FollowSets;
----------------
The NT is not quite obvious, what is NT? I think it is the symbol ID, then I don't understand the `NT * NUM_TOKENS + tok::Kind(T)` part.
================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:213
+ // T in Follow(NT) == FollowSets[NT * NUM_TOKENS + tok::Kind(T)]
+ llvm::BitVector FollowSets;
};
----------------
Oh, I understand it now after reading the build code...
NT here is the non-terminal (symbol to be reduced), T is the lookahead. And FollowSets is conceptually a flat array of a 2d bool array [Nonterminal, Tokens].
Probably add some more comments.
================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:369
// (Roughly this means there's no local ambiguity, so the LR algorithm works).
bool popAndPushTrivial() {
if (!Sequences.empty() || Heads->size() != NextPopHead + 1)
----------------
nit: add a comment about what does return value indicate.
================
Comment at: clang-tools-extra/pseudo/lib/grammar/LRTableBuild.cpp:116
+ llvm::DenseMap<StateID, llvm::SmallSet<RuleID, 4>> Reduces;
+ std::vector<llvm::DenseSet<SymbolID>> FollowSets;
+
----------------
make these members as private -- having two public members seems to violate the pattern. (FollowSets can be initialized in the constructor, we might need a separate method for `Reduces`. or either change the Builder to a struct).
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:28
+ auto G = Grammar::parseBNF(R"bnf(
+ _ := expr
+ expr := term
----------------
nit: we use the hard-coded ruleID in the following test, I'd suggest add a trailing comment for the rule ID for each rule here, for better code readability.
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:38
+ SymbolID Eof = tokenSymbol(tok::eof);
+ SymbolID Semi = tokenSymbol(tok::semi);
+ SymbolID Int = tokenSymbol(tok::kw_int);
----------------
It would be better to change it to an `IDENTIFIER` to reflect the grammar above.
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:39
+ SymbolID Semi = tokenSymbol(tok::semi);
+ SymbolID Int = tokenSymbol(tok::kw_int);
+ SymbolID Plus = tokenSymbol(tok::plus);
----------------
nit: I'd move this on Line67, as this token serves as an invalid token according to the grammar.
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:42
+
+ // eof semi term reduce
+ // +-------+----+-------+------+-------
----------------
the row is usual for terminals, and the reduce is a separate table, which seems a bit confusing -- I would probably split the reduce table out (adding an empty col between term and reduce)
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:45
+ // |state0 | | s0 | | r0
+ // |state1 | | | g3 | acc
+ // |state2 | | | | r1
----------------
nit: `acc` => `R2 (acc)` (sorry, I didn't update the comment in the removing-accept patch)
================
Comment at: clang-tools-extra/pseudo/unittests/LRTableTest.cpp:53
+ std::vector<LRTable::ReduceEntry> ReduceEntries = {
+ {/* State=*/0, 0},
+ {/* State=*/1, 2},
----------------
nit: add a `/*RuleID=*/` comment annotation for the second argument.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128472/new/
https://reviews.llvm.org/D128472
More information about the cfe-commits
mailing list