[PATCH] D114790: [syntax][pseudo] Add grammar facilities for the pseudo-parser

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 2 07:47:28 PST 2022


sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:85
+  static constexpr unsigned SizeBits = 4;
+  static_assert(SizeBits + SymbolBits <= 16,
+                "Must be able to store symbol ID + size efficiently");
----------------
We're being quite fiddly with layout but actually pretty wasteful.

wc says 652 with 1323 total RHS elements on the RHS - average ~2.
So we could aim for 6 bytes per rule = 4KB but we're using 34 bytes per rule = 22KB.

Can't see a nice concrete way to achieve that though.


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:87
+                "Must be able to store symbol ID + size efficiently");
+  static constexpr unsigned MaxElements = 1 << SizeBits;
+
----------------
well, we can set MaxElements to 9 instead of 16, that reduces the rule table from 22kB to 13K...


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:91
+  // SymbolID : 12 | Size : 4
+  struct {
+    SymbolID Target : SymbolBits;
----------------
Why is this wrapped in a struct?


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:123
+  // Return all rules of the given non-terminal symbol.
+  llvm::ArrayRef<Rule> lookupRules(SymbolID SID) const;
+  const Rule &lookupRule(RuleID RID) const;
----------------
maybe rulesFor(SymbolID)?

having lookupRules/lookupRule seems like a trap given SymbolID and RuleID have very different semantics and are typedefs for each other!


================
Comment at: clang/include/clang/Tooling/Syntax/Pseudo/Grammar.h:149
+    std::string Name;
+    // A [start, end) index range for the non-terminal.
+    struct {
----------------
This comment could be more explicit about the relationship between the rules and the symbol!


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/Grammar.cpp:49
+  OS << symbolName(R.target()) << " :=";
+  for (SymbolID SID : R.seq()) {
+    OS << " " << symbolName(SID);
----------------
nit: drop braces, consistent with rest of file


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/GrammarBNF.cpp:1
+//===--- GrammarBNF.cpp - build grammar from BNF files  ----------*- C++-*-===//
+//
----------------
I love the split of bnf parsing into its own file.

Given this file only deals with the parsing, I'm not sure so much has to be encapsulated into the GrammarBuilder class.
I'd probably move things like RuleSpec struct, the ParseLine function etc out to file level, making the (lack of) data/state flow more explicit.
Up to you though.


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/GrammarBNF.cpp:140
+
+    auto ParseLine = [this](llvm::StringRef Line, RuleSpec &Out) {
+      auto Parts = Line.split(":=");
----------------
parseline can be a method (or function)


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/GrammarBNF.cpp:155
+          continue; // skip empty
+        if (Chunk.startswith("#"))
+          break; // comment, skip anything coming after #
----------------
Seems like it'd be nicer to trim the trailing `# comment` off the line before calling ParseLine?


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/GrammarBNF.cpp:187
+              Diagnostics.push_back(llvm::formatv(
+                  "Rule '{0}' emits a nullable symbol", R.toString()));
+              return;
----------------
nit: "Rule '{0}' has nullable RHS"?


================
Comment at: clang/lib/Tooling/Syntax/Pseudo/GrammarBNF.cpp:236
+        Diagnostics.push_back(llvm::formatv(
+            "Nullable symbol: {0}", G.symbolName(T.Rules[RID].target())));
+    }
----------------
`"Rule '{0}' has empty RHS", G.dumpRule(RID)`?

this points us closer to the source of the error


================
Comment at: clang/unittests/Tooling/Syntax/Pseudo/GrammarTests.cpp:1
+//===--- GrammarTests.cpp - grammar tests  ------------------------*-
+//C++-*-===//
----------------
mangled comment


================
Comment at: clang/unittests/Tooling/Syntax/Pseudo/GrammarTests.cpp:44
+        return ID;
+    assert(false);
+  }
----------------
include a message here

or better, ADD_FAILURE("No such symbol: ") << Name first


================
Comment at: clang/unittests/Tooling/Syntax/Pseudo/GrammarTests.cpp:60
+  auto ExpressionID = lookup("expression");
+  // auto RuleID = 0;
+  EXPECT_EQ(G->symbolName(ExpressionID), "expression");
----------------
remove


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114790/new/

https://reviews.llvm.org/D114790



More information about the cfe-commits mailing list