[cfe-dev] RFC: Easier AST Matching by Default

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Sun Jun 21 11:59:13 PDT 2020


On Sat, 20 Jun 2020 at 09:17, Stephen Kelly <steveire at gmail.com> wrote:

> On 18/06/2020 18:38, Richard Smith wrote:
>
> On Wed, 17 Jun 2020 at 15:50, Stephen Kelly via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On 17/06/2020 22:47, Richard Smith wrote:
>>
>> Consider the example from Yitzhak Mandelbaum -- the issue there is
>> actually not whether we have a template or not; rather, it's a difference
>> between one- and two-argument constructors:
>>
>> I hope you can agree that this change in behavior, for an example such as
>> this, is a regression for the ability for beginners to understand how
>> matchers work and effectively write them!
>>
>>
>> Yes, I see your point. The current state doesn't account well for this
>> case.
>>
>> Would it resolve the problem if the CXXConstructExpr was not excluded in
>> this case (in the initializer of a VarDecl), assuming that can be
>> implemented?
>>
>> I think it would remain better for newcomers not to see (and match on)
>> the CXXConstructExprs in https://godbolt.org/z/TvUTgy .
>>
> I think if someone writes a cxxConstructExpr matcher, they would expect it
> to match all cases that invoke a constructor. This case does, so it should
> match. I think it would be reasonable if, by default, both
> returnStmt(hasReturnValue(integerLiteral())) and
> returnStmt(hasReturnValue(cxxConstructExpr())) match this case. The former
> matches when viewed syntactically, and the latter matches when viewed
> semantically.
>
>
> This is the intention of the change. The RFC makes several references to
> the differences between syntax and semantics:
> https://docs.google.com/document/d/17Z6gAwwc3HoRXvsy0OdwU0X5MFQEuiGeSu3i6ICOB90/edit#heading=h.l8zml1r0ls7u
>
> Perhaps it would be better if the name of the two modes were different. ie
> rename:
>
> traverse(AsIs, ...)
>
> to
>
> traverse(Semantic, ...)
>
> and rename
>
> traverse(IgnoreUnlessSpelledInSource, ...)
>
> to
>
> traverse(Syntactic, ...)
>
> Thoughts on that?
>
I think if we want to expose a syntactic mode, we should do that with a set
of syntactic matchers (eg, a matcher that matches parenthesized
initialization). Suppose someone wants to match all direct initialization.
Right now, they need to do lots of checks: for a non-list, non-implicit
cxxConstructExpr, for a varDecl whose initialization kind is direct init,
for a cxxTemporaryObjectExpr, for a functionalCastExpr, and there'll likely
be other kinds that I forgot and more added in the future.

I don't think it's intuitive to call IgnoreUnlessSpelledInSource mode
Syntactic, because it isn't really that -- it doesn't let you match syntax,
it lets you match semantics-with-associated-tokens (and even that is only
an approximate description). I think it would be really interesting to add
a way to actually match syntax in a semantics-free way, but it seems like a
big project.

Also the mode you're calling Semantic is also not a semantic match, because
it also matches syntax-only nodes. (It doesn't implicitly IgnoreParens or
anything like that.)

That said, I think neither Semantic nor Syntactic is the right default. By
default we should be assuming that matchers want to match a combination of
syntax and semantics -- usually a matcher will want to key off some
semantic effect obtained in a particular syntactic context.

> I think the best refinement for now would be to restore the
>> CXXConstructExpr in the case of a varDecl initializer, if that is possible
>> (may not be).
>>
> I think that is addressing a symptom rather than the cause. I think the
> root cause is that a matcher that is explicitly asking to match a certain
> implicit (or sometimes-implicit) AST node does not match.
>
>
> That is the purpose of AsIs/Semantic mode if I understood you correctly.
>
> The intention of the IgnoreUnlessSpelledInSource/Syntactic node is to
> *not* match certain implicit (or sometimes-implicit) AST nodes.
>
>
> For example, I would expect that implicitCastExpr() *never* matches under
> the new default behavior.
>
>
> That is correct.
>
>
> And I think that users, and especially beginner users, will see that as
> being simply broken -- if someone tries to write an implicitCastExpr()
> matcher, it's obvious that they want to match implicit casts, and we are
> not doing the user any favors by making that matcher not match by default.
>
>
> Hmm, if someone is the kind of newcomer that they've never encountered a
> CallExpr, FunctionDecl or any other AST node in their life before, I don't
> see why implicitCastExpr() would be the first, or one of the first, things
> they try.
>
Perhaps because they want to match an implicit cast, and they find it in
the documentation. Perhaps because they read one of the guides that says
"look at the AST dump and write matchers to match what you see there".

> There may be different/multiple levels of newcomer that we each have in
> mind here.
>
> If someone is a kind of newcomer who wants to match on an
> implicitCastExpr(), then they're probably also the kind of newcomer who
> knows that's an implicit node and they should use Semantic node instead of
> Syntactic mode. Do you agree?
>
No. I think it's bad API design to have any mode in which implicitCastExpr
compiles but doesn't ever match.

If I understand your suggestion, given
>>
>>
>> ```
>> struct B
>> {
>>   B(int);
>> };
>>
>> B func1() {
>>   return 42;
>> }
>> ```
>>
>> a matcher like `returnStmt(hasReturnValue(fooExpr()))`
>>
>> would match for `fooExpr` as any of
>>
>> * exprWithCleanups()
>> * cxxConstructExpr() (twice)
>> * materializeTemporaryExpr()
>> * implicitCastExpr()
>> * integerLiteral()
>>
>> right?
>>
> Yes.
>
>> and a matcher like
>> `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>>
>> would bind "theReturnVal" to the `exprWithCleanups()` right?
>>
>> Your suggestion may be that it should bind to the integerLiteral() in
>> that case. That may have more scope for confusion though?
>>
> I think that's an interesting question. My inclination would be to bind to
> the ExprWithCleanups. Simple syntax-oriented transformations should be fine
> with this; the source range of the implicit nodes should be the same as
> that of their inner nodes. And semantics-oriented transformations need
> this, in order to be able to inspect the semantics of the node they
> captured. (Fancy syntax-oriented transformations that are going to inspect
> the clang AST node they get back, rather than only applying matchers and
> rewrites, will need to be aware that they might get an implicit Expr* node.
> But once you're looking at the Expr* yourself, you've stepped out of the
> "beginner" area and are going to need to know more about how the Clang AST
> works in general.)
>
>
> This might be an area for further research, and possibly a third mode in
> addition to the Syntactic and Semantic modes.
>
> Thanks,
>
> Stephen.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200621/09c96117/attachment.html>


More information about the cfe-dev mailing list