[PATCH] D129222: [pseudo] Implement a guard to determine function declarator.

Thu Jul 7 13:25:45 PDT 2022

hokein added inline comments.

================
Comment at: clang-tools-extra/pseudo/gen/Main.cpp:114
+      const clang::pseudo::Rule &R = G.table().Rules[RID];
+      // lhs$$rhs$rhs$rhs
+      std::string EnumName = symbolName(R.Target, G) + "$";
----------------
sammccall wrote:
> so the dollar signs are a practical problem: at least on my machine the warning is on by default, so we emit thousands of warnings and it's impossible to find the error among them.
> 
> At *minimum* we need to use pragmas or something to suppress the warning.
> 
> But using nonstandard C++ seems likely to cause other problems and limit portability for not-great reasons.
> 
> Some ideas:
>  -`Rule::ptr_declarator__EQUALS__ptr_operator__ptr_declarator` (this violates the internal `__` rule, though that one is widely ignored)
>  - `Rule::ptr_declarator_0ptr_operator_1ptr_declarator`
>  - `Rule::PtrDeclarator_EQ_PtrOperator_PtrDeclarator` (yet another spelling variation, but quite readable. We could even change the spellings in the BNF file...)
> so the dollar signs are a practical problem: at least on my machine the warning is on by default, so we emit thousands of warnings and it's impossible to find the error among them.

I didn't see this warning on my machine, I'm using the `g++`.

> Rule::ptr_declarator__EQUALS__ptr_operator__ptr_declarator (this violates the internal __ rule, though that one is widely ignored)

It is not quite readable to me. 

> Rule::PtrDeclarator_EQ_PtrOperator_PtrDeclarator (yet another spelling variation, but quite readable. We could even change the spellings in the BNF file...)

It is better, but I'm not willing to change the spelling to CamelCase in bnf grammar, I like the current `ptr-declarator` name, and it matches the standard style. We could keep them unchanged in the bnf, and do the conversion during the mangling.

> Rule::ptr_declarator_0ptr_operator_1ptr_declarator

I'm starting to like it now. I find that encoding the index for RHS is useful, especially when we  do some work on the ForestNode based on the rules. So I will vote this one.

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Language.h:29
 // Return true if the guard is satisfied.
 using RuleGuard = llvm::function_ref<bool(
+    RuleID RID, llvm::ArrayRef<const ForestNode *> RHS, const TokenStream &)>;
----------------
sammccall wrote:
> sammccall wrote:
> > This allows/encourages guards that dynamically consider multiple rules, which increases the amount of coupling between rules and grammar.
> > 
> > Can we express this as
> > function-declarator := declarator [guard=FunctionDeclarator]?
> > 
> > This does mangle the grammar a bit to our implementation, and introduces two names for the same concept (guard & nonterminal).
> > If this feels ugly and redundant, another option would be to change the guard to be specified by the rule ID instead of an extension ID:
> > ```
> > function-declarator := declarator [guard]
> > 
> > DenseMap<RuleID, RuleGuard> Guards;
> > ```
> Having played with this idea a bit, I do think we should strongly consider dispatching on the rule ID rather than a named extension. I went to add some more guards (on NUMERIC_CONSTANT etc) and they really are 1:1 with rules.
> This allows/encourages guards that dynamically consider multiple rules, which increases the amount of coupling between rules and grammar.

Yeah, this is the sad bit of this approach, but IMO it is not that bad.

> Can we express this as
> function-declarator := declarator [guard=FunctionDeclarator]?
> This does mangle the grammar a bit to our implementation, and introduces two names for the same concept (guard & nonterminal).

We can do this (we will also need another `non-function-declarator` nonterminal).

I feel uncomfortable of it, this seems a non-trivial&non-local change (we're changing an important nonterminal) -- `Declarator` is the only top-level declarator defined in the standard grammar, and now we add two more, and we need to update all related declaration rules. We're making it more complicated, which might introduce subtle grammar bugs.

There is a 3rd approach -- we could consider make the guard on symbol level rather than on Rule level.  The guard corresponds to one of the grammar symbols on RHS.

================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/Language.h:29
 // Return true if the guard is satisfied.
 using RuleGuard = llvm::function_ref<bool(
+    RuleID RID, llvm::ArrayRef<const ForestNode *> RHS, const TokenStream &)>;
----------------
hokein wrote:
> sammccall wrote:
> > sammccall wrote:
> > > This allows/encourages guards that dynamically consider multiple rules, which increases the amount of coupling between rules and grammar.
> > > 
> > > Can we express this as
> > > function-declarator := declarator [guard=FunctionDeclarator]?
> > > 
> > > This does mangle the grammar a bit to our implementation, and introduces two names for the same concept (guard & nonterminal).
> > > If this feels ugly and redundant, another option would be to change the guard to be specified by the rule ID instead of an extension ID:
> > > ```
> > > function-declarator := declarator [guard]
> > > 
> > > DenseMap<RuleID, RuleGuard> Guards;
> > > ```
> > Having played with this idea a bit, I do think we should strongly consider dispatching on the rule ID rather than a named extension. I went to add some more guards (on NUMERIC_CONSTANT etc) and they really are 1:1 with rules.
> > This allows/encourages guards that dynamically consider multiple rules, which increases the amount of coupling between rules and grammar.
> 
> Yeah, this is the sad bit of this approach, but IMO it is not that bad.
> 
> > Can we express this as
> > function-declarator := declarator [guard=FunctionDeclarator]?
> > This does mangle the grammar a bit to our implementation, and introduces two names for the same concept (guard & nonterminal).
> 
> We can do this (we will also need another `non-function-declarator` nonterminal).
> 
> I feel uncomfortable of it, this seems a non-trivial&non-local change (we're changing an important nonterminal) -- `Declarator` is the only top-level declarator defined in the standard grammar, and now we add two more, and we need to update all related declaration rules. We're making it more complicated, which might introduce subtle grammar bugs.
> 
> There is a 3rd approach -- we could consider make the guard on symbol level rather than on Rule level.  The guard corresponds to one of the grammar symbols on RHS.
> Having played with this idea a bit, I do think we should strongly consider dispatching on the rule ID rather than a named extension. I went to add some more guards (on NUMERIC_CONSTANT etc) and they really are 1:1 with rules.

This is an interesting idea. The annotation syntax `[guard]` in grammar is not required, as we can just use `{Rule::function-declarator$$declarator, guardFunctionDeclarator}` to build the map, this means we loose the connection between the grammar annotation and C++ binding code, which seems like a regression to me :(

It has a hard restriction: every grammar rule will have at most 1 guard. It is probably fine, I can't think of a case where we will use multiple guards for a single rule (even we have, we could solve it by introducing a new rule with guard).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129222/new/

https://reviews.llvm.org/D129222