[cfe-dev] RFC: Easier AST Matching by Default

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 24 00:15:48 PDT 2020


On Tue, 23 Jun 2020, 06:35 David Rector via cfe-dev, <cfe-dev at lists.llvm.org>
wrote:

> True, ParenExpr is the one oddball case: all syntax, no semantics, the
> exact opposite of most of the nodes that get in people’s way.
>
> But I think Type::getAs<T>() is again instructive: one can call getAs<T>()
> on a ParenType to get its underlying type, and on an ElaboratedType etc. —
> nodes that are purely syntactic.  So just allow the single getAs to also
> get a ParenExpr as its underlying expression.  I believe it is the only
> such oddball case.
>

It's not, there's a few types of node that we consider to be "parens", some
of them only conditionally. These include things like _Generic and
__builtin_choose_expr.

What we would probably also want is an analog or two of Type’s
> getCanonicalType, except for getting the underlying syntax:
>

We already have quite a few different flavours of that -- look at
Stmt::IgnoreImplicit and friends. There doesn't seem to be a good
one-size-fits-all solution here, given the number of different variants of
this we've invented. But maybe there's something good enough for most
matchers at least.

Stmt {
>> T *getAs() {
> if (isa<T>(this))
> return cast<T>(this);
> if (this->isImplicit() || isa<ParenExpr>(this)) {
> assert(children.size() == 1 && "Expected implicit nodes to have exactly
> one child");
> return (*children.begin())->getAs<T>();
> }
> return nullptr;
> }
>

Note that the analogy to Type::getAs has already broken down here; getAs
always returns a type with the same semantics but a possibly different
meaning, whereas this can return an expression with different semantics by
skipping implicit nodes.

Expr *getAsWritten() {
> if (this->isImplicit())
> return (*children.begin())->getAsWritten());
> return this;
> }
> Expr *getAsWrittenIgnoreParens() {
> if (isa<ParenExpr>(this))
> return this;
> return getAsWritten();
> }
> };
>
> Decls too could at least in some cases benefit from the same interface:
>
> struct A {
> union { int i; float f; }
> };
>
> I believe the union becomes an implicit FieldDecl, whose type is the
> anonymous union’s CXXRecordDecl; a user may well want to call
>
>   if (auto AnonUnion = someFieldDecl->getAs<CXXRecordDecl>())
> //...
>
> to handle this case.
>

Taking it this far seems questionable to me. The field and the union are
very different entities here, so I don't think the analogy to type sugar
and parentheses really holds.

A getAs<T> in every AST node would provide a uniform way of handling
> implicit nodes across the AST, a definite benefit to beginners without any
> apparent cost to long time users.
>
> Dave
>
>
> On Jun 23, 2020, at 2:30 AM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, 22 Jun 2020 at 11:44, David Rector via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> > I would like to hear the opinions of others on these questions. I think
>> > we've both described our perspectives and made our cases.
>> >
>> >
>> > Just expanding on my position from earlier in this thread slightly: to
>> > directly address the "where to from here?" question, I would like us to
>> get
>> > back to the state where, *in our default mode, the matcher for AST node
>> X
>> > is always able to match all instances of AST node X*. I think there are
>> > various options for how we get there that seem acceptable, and that
>> don't
>> > sacrifice your noble goals of making AST matching easier and more
>> > intuitive. In particular, I think it's fine (and probably good) if by
>> > default we look through non-matching implicit nodes while looking for a
>> > matching node (so long as we don't look past a matching node just
>> because
>> > it's implicit). And I think it's fine (and probably good) to expose an
>> easy
>> > way to explicitly control whether we automatically look through implicit
>> > nodes or not. But I think if no-one is prepared to do the work to make
>> our
>> > default mode satisfy the above property while still looking through
>> > implicit nodes if (and only if) they don't match, then I think we should
>> > change the default back to the state we had before. (I don't have much
>> of
>> > an opinion on whether to keep or remove 'traverse' and its various
>> modes,
>> > other than that we've already caused a substantial amount of churn with
>> the
>> > changes thus far, and removing them again would cause further churn.)
>> >
>>
>> 1. It would be helpful to rehash the examples one final time — give the
>> code & the dump, a proposed matcher, then what option A would match, B,
>> etc.  Others should try to consider at each as would a beginner.
>>
>> 2. Perhaps this an opportunity to create a uniform interface to deal with
>> implicit nodes in the AST.  I believe any implicit node only ever has at
>> most one child node.  If so, consider this:
>>         (a) Always have "Implicit*" versions of any implicit-able nodes
>> (e.g. ImplicitCXXConstructExpr, in addition to CXXConstructExpr);
>>         (c) Implement getAs<T>() for Decls and Stmts, the same we do for
>> Types: e.g. E->getAs<CXXConstructExpr>() would skip over the ExprWCleanups,
>> ImplicitCXXConstructExpr, etc.  But you could still call
>> E->getAs<ImplicitCXXConstructExpr>() too if desired.
>>
>> Then, in the case of ASTMatchers, the matchers would always match the
>> implicit nodes as before, but Results.Nodes.getNodeAs<…> would call
>> getAs<…> on each actual result.  I believe this aligns with the behavior
>> you suggested Richard.  But I have not fully thought this through, so I
>> defer to others.
>>
>
> I think that's a pretty interesting approach, that definitely has some
> appeal. I think we should also be thinking about how to serve the needs of
> clients that want a syntactic view or a semantic one -- some clients want
> to skip over implicit AST nodes (eg, nodes that describe semantics but no
> syntax), and others want to skip over what we call "parentheses" (eg, nodes
> that describe syntax but whose sole subexpression has the same semantic
> effect). A single getAs<...> might have a hard time serving both classes of
> user.
>
>
>> My perspective on ASTMatchers, FWIW: I never got into them, because I had
>> begun learning the AST first and it seemed redundant since you could do the
>> same stuff with RecursiveASTVisitor, and there seemed to be a lot of extra
>> infrastructure you had to deal with.  I now see that the matchers can be
>> considerably tighter and clearer -- but that is the only advantage over
>> doing it the full way with RecursiveASTVisitor, so whatever the solution
>> is, it must keep or increase that clarity advantage while retaining as much
>> of the flexibility of RecursiveASTVisitor as possible.  E.g. it sounded
>> like Billy, the guy who had issues with Stmt traversal order (owing to the
>> queueing of child statements, instead of stack processing them as most
>> would expect), decided to just use RecursiveASTVisitor instead of deal with
>> the hassle — that is the outcome to avoid.
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200624/b03709b0/attachment-0001.html>


More information about the cfe-dev mailing list