[PATCH] D114790: [syntax][pseudo] Add grammar facilities for the pseudo-parser

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 21 02:58:49 PST 2021


sammccall added inline comments.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:52
+// Terminal IDs correspond to the clang TokenKind enum.
+using SymbolID = uint16_t;
+// SymbolID is only 12 bits wide.
----------------
If we want strong types, we could use `enum class SymbolID : uint16_t {};`


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:74
+using RuleID = uint16_t;
+// RuleID is only 12 bits wide.
+static constexpr unsigned RuleBits = 12;
----------------
Maybe rather "there are at most 2^12 rules"


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:83
+
+  static constexpr unsigned SizeBits = 4;
+  static_assert(SizeBits + SymbolBits <= 16,
----------------
// Sequence can be at most 2^4 tokens long


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:110
+// Grammar that describes a programming language, C++ etc.
+// It parses BNF grammar rules, and is a building brick for constructing a
+// grammar-based parser.
----------------
nit: building block is more idiomatic


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:110
+// Grammar that describes a programming language, C++ etc.
+// It parses BNF grammar rules, and is a building brick for constructing a
+// grammar-based parser.
----------------
sammccall wrote:
> nit: building block is more idiomatic
Nit: it's rather the RuleSpec factories and Grammar::build (which are static) that parses BNF grammar.
The Grammar object itself doesn't have this responsibility.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:133
+    static llvm::Expected<RuleSpec> parseRaw(llvm::StringRef Line);
+    // Exposed for testing.
+    // Inline all _opt symbols.
----------------
I think this should be explicit rather than implicit, and so not just "exposed for testing" but explicitly called


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:144
+
+  // Lookup non-terminal by name.
+  SymbolID lookupSymbol(llvm::StringRef Name) const;
----------------
The implementation also (inefficiently) supports looking up terminals. Intended?


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:154
+  // Dump symbol (terminal or non-terminal) name.
+  llvm::StringRef dumpSymbol(SymbolID) const;
+  std::string dumpRule(RuleID) const;
----------------
dump suggests an arbitrary debugging representation, but we rely on this being exactly the symbol name. `symbolName(SymbolID)`?

Maybe also doc what the names are (for terminals).
e.g. `Terminals have names like "," (kw_comma) or "OPERATOR" (kw_operator)`.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:156
+  std::string dumpRule(RuleID) const;
+  std::string dumpRuleRecursively(RuleID) const;
+  // Dump all rules of the given nonterminal symbol.
----------------
in the prototype the "recursively" versions ended up being much less useful than I thought. You may want to drop the complexity


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:171
+
+// Underlying storage of the Grammar.
+struct Table {
----------------
Are compiled transition tables etc expected to live here, or is this a purely descriptive version of teh grammar?


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp:48
+
+void computeNullability(Table &T) {
+  // A nonterminal is nullable if some rule producing it is nullable.
----------------
I don't think it makes sense to compute and store nullability if our plan is to build a parser that doesn't support nullable symbols.

Rather we would warn in eliminateOptional() or diagnose() if there are nullable rules. (This doesn't require computing the nullability of every rule)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114790/new/

https://reviews.llvm.org/D114790



More information about the cfe-commits mailing list