[PATCH] D63835: [Syntax] Add nodes for most common statements

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Aug 5 04:20:32 PDT 2019


sammccall accepted this revision.
sammccall added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:99
 /// An abstract node for C++ statements, e.g. 'while', 'if', etc.
 class Statement : public Tree {
 public:
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > Do you want to expose the statement-ending-semicolon here?
> > 
> > (Not all statements have it, but common enough you may want it in the base class instead of all children)
> Yes, only "leaf" (i.e. the ones not having any statement children) have it.
> I was thinking about:
>   - having a separate class for non-composite statements and providing an accessor there,
>   - providing an accessor in each of the leaf statements (would mean some duplication, but, arguably, better discoverability).
> 
> But, from an offline conversation, we seem to disagree that inheritance is a proper way to model this.
> Would it be ok to do this in a follow-up? I'll add a FIXME for now.
First: yes, let's not do this now.

Second: I'm wary of using standard is-a inheritance to model more than alternation in the grammar. That is, ForStatement is-a Statement is fine, ForStatement is-a LoopyStatement is suspect. This is to do with the fact that LoopyStatement is-a Statement seems obvious, and we may end up with diamond-shaped inheritance and some conceptual confusion.

This goes for all traits that aren't natural tree-shaped inheritance: HasTrailingSemicolon, LoopyStatement, ...

I think there are two concerns here:
 - we want to be able to get the trailing-semicolon if it exists
 - we want to be able to check if the trailing-semicolon is *expected* including via its static type

One way to do this (not the only one...):

```
// generic helper, or callers could even write this directly
Optional<Leaf> trailingSemi(Tree *Node) {
  return firstElement(Node->Children<Leaf>(NodeRole::TrailingSemi));
}

// mixin for trailing semi support. Note: doesn't inherit Statement!
// maybe need/want some CRTP magic
class TrailingSemicolon {
  Optional<Leaf> trailingSemi() const { return trailingSemi((const Node*)this; }
}

// Gets the trailingSemi() accessor.
ExprStmt : public Statement, TrailingSemicolon { ... }

```


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:193
+  syntax::Statement *thenStatement();
+  syntax::Leaf *elseKeyword();
+  syntax::Statement *elseStatement();
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > I think throughout it's important to mark which of these are:
> >  - nullable in correct code
> >  - nullable in code generated by recovery
> I would suggest to only mark the nodes that are nullable in the correct code. For recovery, I would assume the following rule (please tell me if I'm wrong):
> 
> On a construct whose parsing involved recovery:
> - if the node has an introducing token (`if`, `try`, etc.), the corresponding child cannot be null.
> - any other child can be null.
Agree with this strategy, and the fact that it doesn't need to be documented on every node/occurrence.

But it should definitely be documented somewhere at a high level! (With clang AST, this sort of thing feels like tribal knowledge)


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:24
 
 /// A kind of a syntax node, used for implementing casts.
 enum class NodeKind : uint16_t {
----------------
Can you add a comment here saying the ordering/blocks must correspond to the Node inheritance hierarchy? This is *kind of* common knowledge, but I think this is normally handled by tablegen.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:79
+  ExpressionStatement_expression,
+  CompoundStatement_statement
 };
----------------
As discussed offline, there's some options about how abstract/concrete these roles should be.

e.g. for a list of function args, this could be FunctionOpenParen/FunctionArgExpr/FunctionArgComma/FunctionCloseParam (specific) <-> OpenParen/Arg/Comma/CloseParen <-> Open/Item/Separator/Close.

The more specific ones are somewhat redundant with the parent/child type (but easy to assign systematically), and the more generic ones are more orthogonal (but require more design and may by hard to always make consistent).

The concrete advantage of the generic roles is being able to write code like `getTrailingSemicolon(Tree*)` or `findLoopBreak(Stmt*)` or `removeListItem(Tree*, int)` in a fairly generic way, without resorting to adding a `Loop` base class or handling each case with separate code.

This is up to you, though.


================
Comment at: clang/include/clang/Tooling/Syntax/Nodes.h:265
+  syntax::Leaf *returnKeyword();
+  syntax::Expression *value();
+};
----------------
nullable, marked somehow

Optional<Expression*> is tempting as a systematic and hard-to-ignore way of documenting that.
And it reflects the fact that there are three conceptual states for children: present, legally missing, brokenly missing.

At the same time, I'm not sure how to feel about the fact that in practice this can't be present but null, and the fact that *other* non-optional pointers can be null.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:99
+  /// semicolon when needed.
   llvm::ArrayRef<syntax::Token> getRange(const Stmt *S) const {
+    auto Tokens = getRange(S->getBeginLoc(), S->getEndLoc());
----------------
since Expr : Stmt, we need to be a bit wary of overloading based on static type.

It's tempting to say it's correct here: if we statically know E is an Expr, then maybe it's never correct to consume the semicolon. But is the converse true? e.g. if we're traversing using RAV and call getRange() in visitstmt...

(The alternatives seem to be a) removing the expr version of the function, and having the stmt version take a `bool ConsumeSemi` or b) change the stmt version to have (dynamic) expr behave like the expr overload, and handle it specially when forming exprstmt. More verbose, genuinely conflicted here)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:270
+
+  bool TraverseCXXForRangeStmt(CXXForRangeStmt *S) {
+    // We override to traverse range initializer as VarDecl.
----------------
maybe group with corresponding `WalkUpFromCXXForRangeStmt`?
(Could also group all `Traverse*` together if you prefer. Current ordering seems a little random)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:272
+    // We override to traverse range initializer as VarDecl.
+    // RAT traverses it as a statement, we produce invalid node kinds in that
+    // case.
----------------
nit: RAV?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63835/new/

https://reviews.llvm.org/D63835





More information about the cfe-commits mailing list