[PATCH] D118990: [pseudo] Add first and follow set computation in Grammar.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 4 08:46:58 PST 2022


sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:146
+// derived from X. (Known as FIRST sets in grammar-based parsers).
+std::vector<llvm::DenseSet<SymbolID>> firstSets(const Grammar &);
+// For each nonterminal X, computes the set of terminals that could immediately
----------------
AFAICS, firstSets are iterated over in followSets, and followSets are iterated over when building SLR.

Seems like exposing `vector<vector<SymbolID>>` would be enough. Up to you.
(Or the `Epochs<SymbolID>` data structure from the prototype - there must be a name for that data structure)


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp:132
+    for (const auto &R : G.table().Rules) {
+      // Rule 2: for a rule X := ...YZ, we add all symbols from FIRST(Z) to
+      // FOLLOW(Y).
----------------
nit: `X := ... Y Z`

(no confusion that the ... is associated with Y or that YZ is one symbol)


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp:146
+      }
+      // Rule 3: for a rule X := ...Z, we add all symbols from FOLLOW(X) to
+      // FOLLOW(Z).
----------------
`X := ... Z`


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp:156
+
+SymbolID Grammar::startSymbol() const {
+  // start symbol is named _, binary search it.
----------------
Seems more natural to catch this problem in the constructor?
Then we'd stash the SymbolID in a member variable and just return it here.


================
Comment at: clang/unittests/Tooling/Syntax/Pseudo/GrammarTest.cpp:25
 
 MATCHER_P(TargetID, SID, "") { return arg.Target == SID; }
 template <typename... T> testing::Matcher<const Rule &> Sequence(T... IDs) {
----------------
whoops, I missed these in the previous review, these should really be `targetID` and `sequence` as they're functions.


================
Comment at: clang/unittests/Tooling/Syntax/Pseudo/GrammarTest.cpp:110
+  ASSERT_TRUE(Diags.empty());
+  auto ToPair = [](std::vector<llvm::DenseSet<SymbolID>> Input) {
+    std::vector<std::pair<SymbolID, llvm::DenseSet<SymbolID>>> Sets;
----------------
nit: ToPairs, plural


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118990/new/

https://reviews.llvm.org/D118990



More information about the cfe-commits mailing list