[cfe-dev] RFC: Easier AST Matching by Default

Tue Jun 23 06:35:27 PDT 2020

True, ParenExpr is the one oddball case: all syntax, no semantics, the exact opposite of most of the nodes that get in people’s way.  

But I think Type::getAs<T>() is again instructive: one can call getAs<T>() on a ParenType to get its underlying type, and on an ElaboratedType etc. — nodes that are purely syntactic.  So just allow the single getAs to also get a ParenExpr as its underlying expression.  I believe it is the only such oddball case.

What we would probably also want is an analog or two of Type’s getCanonicalType, except for getting the underlying syntax:

Stmt {
	…
	T *getAs() {
		if (isa<T>(this))
			return cast<T>(this);
		if (this->isImplicit() || isa<ParenExpr>(this)) {
			assert(children.size() == 1 && "Expected implicit nodes to have exactly one child");
			return (*children.begin())->getAs<T>();
		}
		return nullptr;
	}
	Expr *getAsWritten() {
		if (this->isImplicit())
			return (*children.begin())->getAsWritten());
		return this;
	}
	Expr *getAsWrittenIgnoreParens() {
		if (isa<ParenExpr>(this))
			return this;
		return getAsWritten();
	}
};

Decls too could at least in some cases benefit from the same interface:

struct A {
	union { int i; float f; }
};

I believe the union becomes an implicit FieldDecl, whose type is the anonymous union’s CXXRecordDecl; a user may well want to call 

  if (auto AnonUnion = someFieldDecl->getAs<CXXRecordDecl>())
	//...

to handle this case.

A getAs<T> in every AST node would provide a uniform way of handling implicit nodes across the AST, a definite benefit to beginners without any apparent cost to long time users.

Dave

> On Jun 23, 2020, at 2:30 AM, Richard Smith <richard at metafoo.co.uk> wrote:
> 
> On Mon, 22 Jun 2020 at 11:44, David Rector via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
> > I would like to hear the opinions of others on these questions. I think
> > we've both described our perspectives and made our cases.
> > 
> > 
> > Just expanding on my position from earlier in this thread slightly: to
> > directly address the "where to from here?" question, I would like us to get
> > back to the state where, *in our default mode, the matcher for AST node X
> > is always able to match all instances of AST node X*. I think there are
> > various options for how we get there that seem acceptable, and that don't
> > sacrifice your noble goals of making AST matching easier and more
> > intuitive. In particular, I think it's fine (and probably good) if by
> > default we look through non-matching implicit nodes while looking for a
> > matching node (so long as we don't look past a matching node just because
> > it's implicit). And I think it's fine (and probably good) to expose an easy
> > way to explicitly control whether we automatically look through implicit
> > nodes or not. But I think if no-one is prepared to do the work to make our
> > default mode satisfy the above property while still looking through
> > implicit nodes if (and only if) they don't match, then I think we should
> > change the default back to the state we had before. (I don't have much of
> > an opinion on whether to keep or remove 'traverse' and its various modes,
> > other than that we've already caused a substantial amount of churn with the
> > changes thus far, and removing them again would cause further churn.)
> > 
> 
> 1. It would be helpful to rehash the examples one final time — give the code & the dump, a proposed matcher, then what option A would match, B, etc.  Others should try to consider at each as would a beginner.
> 
> 2. Perhaps this an opportunity to create a uniform interface to deal with implicit nodes in the AST.  I believe any implicit node only ever has at most one child node.  If so, consider this:
>         (a) Always have "Implicit*" versions of any implicit-able nodes (e.g. ImplicitCXXConstructExpr, in addition to CXXConstructExpr);
>         (c) Implement getAs<T>() for Decls and Stmts, the same we do for Types: e.g. E->getAs<CXXConstructExpr>() would skip over the ExprWCleanups, ImplicitCXXConstructExpr, etc.  But you could still call E->getAs<ImplicitCXXConstructExpr>() too if desired.
> 
> Then, in the case of ASTMatchers, the matchers would always match the implicit nodes as before, but Results.Nodes.getNodeAs<…> would call getAs<…> on each actual result.  I believe this aligns with the behavior you suggested Richard.  But I have not fully thought this through, so I defer to others.
> 
> I think that's a pretty interesting approach, that definitely has some appeal. I think we should also be thinking about how to serve the needs of clients that want a syntactic view or a semantic one -- some clients want to skip over implicit AST nodes (eg, nodes that describe semantics but no syntax), and others want to skip over what we call "parentheses" (eg, nodes that describe syntax but whose sole subexpression has the same semantic effect). A single getAs<...> might have a hard time serving both classes of user.
>  
> My perspective on ASTMatchers, FWIW: I never got into them, because I had begun learning the AST first and it seemed redundant since you could do the same stuff with RecursiveASTVisitor, and there seemed to be a lot of extra infrastructure you had to deal with.  I now see that the matchers can be considerably tighter and clearer -- but that is the only advantage over doing it the full way with RecursiveASTVisitor, so whatever the solution is, it must keep or increase that clarity advantage while retaining as much of the flexibility of RecursiveASTVisitor as possible.  E.g. it sounded like Billy, the guy who had issues with Stmt traversal order (owing to the queueing of child statements, instead of stack processing them as most would expect), decided to just use RecursiveASTVisitor instead of deal with the hassle — that is the outcome to avoid.
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200623/33323c94/attachment-0001.html>