[PATCH] D126536: [pseudo] Add grammar annotations support.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 3 15:36:02 PDT 2022


sammccall added a comment.

Thanks, this looks better.
Sorry about confusion with the names - I think annotation is a great name, and we should only use "extension" for the narrower "opaque thing that an annotation value refers to".
Happy to chat about this offline if you like



================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h:22
 //
+//  The grammar supports extensions, which have the syntax form of
+//  [key=value;key=value]. Extensions are associated with a grammar symbol (
----------------
Hmm, I think we misunderstood each other...

I think annotations is a good name for the syntactic [foo=bar] bit. I think the comment here is now harder to follow, and would prefer the old version.

It's specifically "bar" which indicates some identifier whose semantics will be provided externally, so "bar" is a reference to an extension.

If you want to mention the concept of an extension, then maybe at the end...
`Each unique annotation value creates an extension point. The Override guard is implemented in CXX.cpp by binding the ExtensionID for Override to a C++ function that performs the check.`


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h:86
 
+// An extension uniquely identifies an extension in a grammar.
+// It is the index into a table of extension values.
----------------
"an extension uniquely identifies an extension" is a tautology.

```
An extension is a piece of native code specific to a grammar that modifies the behavior of annotated rules.
One ExtensionID is assigned for each unique attribute value (all attributes share a namespace).
```


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h:88
+// It is the index into a table of extension values.
+// NOTE: value among extensions must be unique even within different keys!
+using ExtensionID = uint16_t;
----------------
I don't think this last sentence is useful as-is, it looks important but it's not clear who it constrains.
Also "attribute" rather than "key", an attribute is a thing that objects have, a key is a thing maps have.


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Grammar.h:213
   std::vector<Nonterminal> Nonterminals;
+  // A table of extensions values, sorted by name.
+  // ExtensionID is the index of the table.
----------------
attribute values, or extension names

(attributes are syntactic and have string values, extensions are semantic and can be referred to by names).


================
Comment at: clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp:105
     }
+    applyExtension(Specs, *T);
+
----------------
doing this outside the loop above means this function has to handle all specs/rules and work out how they correspond, which is fragile.

Instead can we do it inside the loop, and just handle a single attribute at a time?

```
for (const auto &Spec : Specs) {
  // ... create rule
  for (const auto &Attribute : Spec.Attributes)
    applyAttribute(Attribute, T->Rules.back());
}
```


================
Comment at: clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp:184
       llvm::StringRef Symbol; // Name of the symbol
+      // Extensions that are associated to the sequence symbol or rule.
+      std::vector<std::pair<llvm::StringRef/*Key*/, llvm::StringRef/*Value*/>>
----------------
rather attributes :-(


================
Comment at: clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp:243
+    assert(Content.startswith("[") && Content.endswith("]"));
+    for (llvm::StringRef ExtText :
+         llvm::split(Content.drop_front().drop_back(), ';')) {
----------------
For simplicity we could also drop this loop. If we ever need to specify two attributes on the same token, `a := b [foo] [bar]` works fine


================
Comment at: clang-tools-extra/pseudo/lib/grammar/GrammarBNF.cpp:246
+      auto KV = ExtText.split('=');
+      if (KV.first == ExtText) { // no separator in Line
+        Diagnostics.push_back(
----------------
If we drop this line, then [foo] will be equivalent to [foo=], i.e. attribute is present and points at the empty string. This seems reasonable to me, and is a good syntax for boolean attributes (where [foo] denotes set and no attribute denotes unset)


================
Comment at: clang-tools-extra/pseudo/unittests/GrammarTest.cpp:104
+  build(R"bnf(
+    _ := IDENTIFIER [guard=override]
+  )bnf");
----------------
if you like you could add 3 more rules, with no annotation, guard=override, guard=somethingelse
and verify they're zero, equal, nonequal respectively


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126536/new/

https://reviews.llvm.org/D126536



More information about the cfe-commits mailing list