[cfe-dev] RFC: Easier AST Matching by Default

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Thu Jun 18 10:38:37 PDT 2020


On Wed, 17 Jun 2020 at 15:50, Stephen Kelly via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 17/06/2020 22:47, Richard Smith wrote:
>
> Consider the example from Yitzhak Mandelbaum -- the issue there is
> actually not whether we have a template or not; rather, it's a difference
> between one- and two-argument constructors:
>
> I hope you can agree that this change in behavior, for an example such as
> this, is a regression for the ability for beginners to understand how
> matchers work and effectively write them!
>
>
> Yes, I see your point. The current state doesn't account well for this
> case.
>
> Would it resolve the problem if the CXXConstructExpr was not excluded in
> this case (in the initializer of a VarDecl), assuming that can be
> implemented?
>
> I think it would remain better for newcomers not to see (and match on) the
> CXXConstructExprs in https://godbolt.org/z/TvUTgy .
>
I think if someone writes a cxxConstructExpr matcher, they would expect it
to match all cases that invoke a constructor. This case does, so it should
match. I think it would be reasonable if, by default, both
returnStmt(hasReturnValue(integerLiteral())) and
returnStmt(hasReturnValue(cxxConstructExpr())) match this case. The former
matches when viewed syntactically, and the latter matches when viewed
semantically.

> Perhaps as an alternative, we should not skip implicit nodes when trying
>> to match a node that might be implicit (analogous to how Type::getAs steps
>> over sugar nodes except when looking for that type sugar node), and instead
>> of hiding implicit nodes in the AST dump we should dim them out but still
>> show them?
>>
>>
>> Color wouldn't be particularly distinctive, especially in godbolt for
>> example, but it also would mean that the confusing stuff is still *there*
>> in the output. It can be overwhelming when absolutely everything in the
>> output is new and confusing.
>>
>> (I also note that newcomers to Matchers don't need to see the AST at all
>> if the clang-query shows them the matchers they can use instead:
>> http://ce.steveire.com/z/9EF8x0 see my other posts on the topic for more)
>>
> Perhaps we should consider the AST dump to be an expert-level feature and
> not try to optimize it for beginner usage, especially if we have better
> output methods. I'm not sure, though; I think telling beginners to look at
> the AST dump and match what they see there has been somewhat effective.
>
>
> Yes, I think it makes sense for newcomers to be able to get
> non-overwhelming exposure to the AST but not have to be confronted with it
> from the very beginning. That is where making clang-query output matchers
> could be useful as a beginning point. That is not yet upstream yet however.
>
>
> I think we should assume that we can ask the maintainers of compiler
> explorer to render the output nicely, if we find that's a problem. (They
> already apply some postprocessing to it, to remove pointer values and the
> like.)
>
> The "AST Output" on Compiler Explorer filters out pointers, but the
> clang-query tool does not. I have designs to have a way to make clang-query
> omit them anyway (using enable output simple-ast instead of detailed-ast -
> that is not upstream yet, but I think it means we don't need to do that
> kind of filtering in Compiler Explorer).
>
> I think colors in the output might already show up properly, but that
> would still leave the problem of the things being *there*, even if they are
> greyed out or surrounded in square brackets (which I would find even more
> confusing and noisy).
>
> Nevertheless, this change in the direction that I consider easier hasn't
>> been 100% popular among other clang developers (though I wonder how much of
>> that is due to being experts?). At this point, if someone wishes to reverse
>> the change of default I don't think I would attempt to oppose it. It's not
>> clear to me whether my vision for making things easier for newcomers (which
>> I set out in EuroLLVM and which included ignoring these nodes) is shared.
>> It might be better at this point for others to decide.
>>
> I think it's a good goal. But I think the implementation needs some
> refinement. Did you have any thoughts on my suggestion of how to fix this?
> (That is: try matching against implicit nodes, but skip them if they don't
> match.) I don't think you commented on it.
>
>
> I think the best refinement for now would be to restore the
> CXXConstructExpr in the case of a varDecl initializer, if that is possible
> (may not be).
>
I think that is addressing a symptom rather than the cause. I think the
root cause is that a matcher that is explicitly asking to match a certain
implicit (or sometimes-implicit) AST node does not match. For example, I
would expect that implicitCastExpr() *never* matches under the new default
behavior. And I think that users, and especially beginner users, will see
that as being simply broken -- if someone tries to write an
implicitCastExpr() matcher, it's obvious that they want to match implicit
casts, and we are not doing the user any favors by making that matcher not
match by default.

> If I understand your suggestion, given
>
>
> ```
> struct B
> {
>   B(int);
> };
>
> B func1() {
>   return 42;
> }
> ```
>
> a matcher like `returnStmt(hasReturnValue(fooExpr()))`
>
> would match for `fooExpr` as any of
>
> * exprWithCleanups()
> * cxxConstructExpr() (twice)
> * materializeTemporaryExpr()
> * implicitCastExpr()
> * integerLiteral()
>
> right?
>
Yes.

> and a matcher like
> `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>
> would bind "theReturnVal" to the `exprWithCleanups()` right?
>
> Your suggestion may be that it should bind to the integerLiteral() in that
> case. That may have more scope for confusion though?
>
I think that's an interesting question. My inclination would be to bind to
the ExprWithCleanups. Simple syntax-oriented transformations should be fine
with this; the source range of the implicit nodes should be the same as
that of their inner nodes. And semantics-oriented transformations need
this, in order to be able to inspect the semantics of the node they
captured. (Fancy syntax-oriented transformations that are going to inspect
the clang AST node they get back, rather than only applying matchers and
rewrites, will need to be aware that they might get an implicit Expr* node.
But once you're looking at the Expr* yourself, you've stepped out of the
"beginner" area and are going to need to know more about how the Clang AST
works in general.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200618/c5c7cfe1/attachment.html>


More information about the cfe-dev mailing list