[cfe-dev] RFC: Easier AST Matching by Default

Stephen Kelly via cfe-dev cfe-dev at lists.llvm.org
Sat Jun 20 09:17:00 PDT 2020


On 18/06/2020 18:38, Richard Smith wrote:
> On Wed, 17 Jun 2020 at 15:50, Stephen Kelly via cfe-dev 
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>
>     On 17/06/2020 22:47, Richard Smith wrote:
>
>>     Consider the example from Yitzhak Mandelbaum -- the issue there
>>     is actually not whether we have a template or not; rather, it's a
>>     difference between one- and two-argument constructors:
>>     I hope you can agree that this change in behavior, for an example
>>     such as this, is a regression for the ability for beginners to
>>     understand how matchers work and effectively write them!
>
>
>     Yes, I see your point. The current state doesn't account well for
>     this case.
>
>     Would it resolve the problem if the CXXConstructExpr was not
>     excluded in this case (in the initializer of a VarDecl), assuming
>     that can be implemented?
>
>     I think it would remain better for newcomers not to see (and match
>     on) the CXXConstructExprs in https://godbolt.org/z/TvUTgy .
>
> I think if someone writes a cxxConstructExpr matcher, they would 
> expect it to match all cases that invoke a constructor. This case 
> does, so it should match. I think it would be reasonable if, by 
> default, both returnStmt(hasReturnValue(integerLiteral())) and 
> returnStmt(hasReturnValue(cxxConstructExpr())) match this case. The 
> former matches when viewed syntactically, and the latter matches when 
> viewed semantically.


This is the intention of the change. The RFC makes several references to 
the differences between syntax and semantics: 
https://docs.google.com/document/d/17Z6gAwwc3HoRXvsy0OdwU0X5MFQEuiGeSu3i6ICOB90/edit#heading=h.l8zml1r0ls7u

Perhaps it would be better if the name of the two modes were different. 
ie rename:

traverse(AsIs, ...)

to

traverse(Semantic, ...)

and rename

traverse(IgnoreUnlessSpelledInSource, ...)

to

traverse(Syntactic, ...)

Thoughts on that?


>     I think the best refinement for now would be to restore the
>     CXXConstructExpr in the case of a varDecl initializer, if that is
>     possible (may not be).
>
> I think that is addressing a symptom rather than the cause. I think 
> the root cause is that a matcher that is explicitly asking to match a 
> certain implicit (or sometimes-implicit) AST node does not match.


That is the purpose of AsIs/Semantic mode if I understood you correctly.

The intention of the IgnoreUnlessSpelledInSource/Syntactic node is to 
*not* match certain implicit (or sometimes-implicit) AST nodes.


> For example, I would expect that implicitCastExpr() *never* matches 
> under the new default behavior.


That is correct.


> And I think that users, and especially beginner users, will see that 
> as being simply broken -- if someone tries to write an 
> implicitCastExpr() matcher, it's obvious that they want to match 
> implicit casts, and we are not doing the user any favors by making 
> that matcher not match by default.


Hmm, if someone is the kind of newcomer that they've never encountered a 
CallExpr, FunctionDecl or any other AST node in their life before, I 
don't see why implicitCastExpr() would be the first, or one of the 
first, things they try.

There may be different/multiple levels of newcomer that we each have in 
mind here.

If someone is a kind of newcomer who wants to match on an 
implicitCastExpr(), then they're probably also the kind of newcomer who 
knows that's an implicit node and they should use Semantic node instead 
of Syntactic mode. Do you agree?


>     If I understand your suggestion, given
>
>
>     ```
>     struct B
>     {
>       B(int);
>     };
>
>     B func1() {
>       return 42;
>     }
>     ```
>
>     a matcher like `returnStmt(hasReturnValue(fooExpr()))`
>
>     would match for `fooExpr` as any of
>
>
>     * exprWithCleanups()
>     * cxxConstructExpr() (twice)
>     * materializeTemporaryExpr()
>     * implicitCastExpr()
>     * integerLiteral()
>
>     right?
>
> Yes.
>
>     and a matcher like
>     `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>
>     would bind "theReturnVal" to the `exprWithCleanups()` right?
>
>     Your suggestion may be that it should bind to the integerLiteral()
>     in that case. That may have more scope for confusion though?
>
> I think that's an interesting question. My inclination would be to 
> bind to the ExprWithCleanups. Simple syntax-oriented transformations 
> should be fine with this; the source range of the implicit nodes 
> should be the same as that of their inner nodes. And 
> semantics-oriented transformations need this, in order to be able to 
> inspect the semantics of the node they captured. (Fancy 
> syntax-oriented transformations that are going to inspect the clang 
> AST node they get back, rather than only applying matchers and 
> rewrites, will need to be aware that they might get an implicit Expr* 
> node. But once you're looking at the Expr* yourself, you've stepped 
> out of the "beginner" area and are going to need to know more about 
> how the Clang AST works in general.)


This might be an area for further research, and possibly a third mode in 
addition to the Syntactic and Semantic modes.

Thanks,

Stephen.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200620/0a025074/attachment-0001.html>


More information about the cfe-dev mailing list