[cfe-dev] RFC: Easier AST Matching by Default

Mon Jun 22 06:54:09 PDT 2020

Manuel~

I am having trouble reconstructing the discussion and proposal from just
the email chain here.  Is there a proposal link I can read?

Perhaps I can explain my desires a bit more clearly and it will make better
sense what I am hoping to get out of matchers.

Matchers often trigger on the desugared forms of things that really seem
unusual.

struct Foo : Bar {
  Foo() = default;  // <-- the `t` here will trigger for a bunch of TypeLoc
related things that have to do with what is inside the Bar
};

void foo() {
  int i;
  auto a = [i] () {};  // <-- the lambda here will trigger for a full
struct definition with field decls (simulatenously there are no lambda
matchers to match the i in the capture list.
}

Similarly, things one might expect to match end up not match because if
implicit cast expressions.

Someone in the above thread said:

if by default we look through non-matching implicit nodes while looking for
> a matching node (so long as we don't look past a matching node just because
> it's implicit).

which seems closer to what I want.

Matt

On Mon, Jun 22, 2020 at 6:38 AM Sam McCall <sammccall at google.com> wrote:

> I do wish I'd been more involved in this conversation. Having missed the
> boat on it the first time around, I wasn't sure how appropriate it was to
> throw in opinions (outside of working out how to deal with the immediate
> churn). I also think it's fair to say that I've been a bit put off by the
> tone at times.
>
> > Sam I believe raised a similar idea as has been brought up in this
> thread, that if we want a syntactic matching mode, we should wait for
> SyntaxTrees to land
> This isn't just a question of waiting, someone needs to decide it's a top
> priority and build it out. I think it's a great opportunity, and so are a
> lot of other things.
>
> In a broader sense, this change feels to me more like replacing rather
> than adding something, and (with hindsight) I think it's not the best point
> on the continuum.
> The new "major" modes create a gulf that feels to me like "matchers v1"
> and "matchers v2", and the change has prioritized vocal people unsatisfied
> with the status quo over those who are happy and invested in the current
> model.
> Building out syntax trees and defining matchers for them might be too much
> ocean to boil. But the new option could have been introduced with an API
> that puts less emphasis on defaults and more on options, for example. And
> could have names reflecting a careful design around (as far as possible)
> syntactic vs semantic matching, rather than describing the heuristics that
> the "matchers v2" proponents think are best.
>
> There's also a simple value conflict here between people placing different
> values on (code and conceptual) stability.
> From where I sit, requiring opt-in to newly-introduced behavior, or
> forcing an explicit choice one way or the other, would be a small
> concession to make to avoid silently changing the meaning of 8-year-old
> code and silently invalidating mental models.
> I understand people have different motivations and situations, and not
> everyone wants to make any concession to that at all.
>
> On Mon, Jun 22, 2020 at 11:20 AM Manuel Klimek <klimek at google.com> wrote:
>
>> Pulling in more folks who had opinions in the original decision.
>> (y'all let me know if I misrepresented somebody's stance here)
>>
>> Matt was the strongest one in favor of the new default mode, I believe,
>> and he argued that this mode made several of his matchers more precise that
>> were previously basically buggy, so I'd especially be interested whether
>> your view changed based on this thread, Matt.
>>
>> Sam I believe raised a similar idea as has been brought up in this
>> thread, that if we want a syntactic matching mode, we should wait for
>> SyntaxTrees to land before making matchers that work on SyntaxTrees.
>>
>> Yitzhak was somewhere in between iirc.
>>
>> Dmitri iiuc mainly would have liked a more thorough analysis of the
>> possible actions before we switch.
>>
>> I personally was convinced about the new default mode by Matt's apparent
>> excitement, as well as the data from public clang-tidy matchers. I do think
>> that I made a mistake by not pulling in the right set of folks earlier,
>> though, for which I'm sorry :(
>>
>> On Mon, Jun 22, 2020 at 8:52 AM Richard Smith via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> On Sun, 21 Jun 2020 at 13:51, Stephen Kelly via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> On 21/06/2020 19:59, Richard Smith wrote:
>>>>
>>>> I think if we want to expose a syntactic mode, we should do that with a
>>>> set of syntactic matchers (eg, a matcher that matches parenthesized
>>>> initialization). Suppose someone wants to match all direct initialization.
>>>> Right now, they need to do lots of checks: for a non-list, non-implicit
>>>> cxxConstructExpr, for a varDecl whose initialization kind is direct init,
>>>> for a cxxTemporaryObjectExpr, for a functionalCastExpr, and there'll likely
>>>> be other kinds that I forgot and more added in the future.
>>>>
>>>>
>>>> I agree there are missing matchers. I'd like to add them, but I don't
>>>> think that's enough to make the matchers framework easier to use for
>>>> newcomers.
>>>>
>>>>
>>>>
>>>> I don't think it's intuitive to call IgnoreUnlessSpelledInSource mode
>>>> Syntactic, because it isn't really that -- it doesn't let you match syntax,
>>>> it lets you match semantics-with-associated-tokens (and even that is only
>>>> an approximate description).
>>>>
>>>>
>>>> Ok.
>>>>
>>>>
>>>> I think it would be really interesting to add a way to actually match
>>>> syntax in a semantics-free way, but it seems like a big project.
>>>>
>>>> Also the mode you're calling Semantic is also not a semantic match,
>>>> because it also matches syntax-only nodes. (It doesn't implicitly
>>>> IgnoreParens or anything like that.)
>>>>
>>>> That said, I think neither Semantic nor Syntactic is the right default.
>>>> By default we should be assuming that matchers want to match a combination
>>>> of syntax and semantics -- usually a matcher will want to key off some
>>>> semantic effect obtained in a particular syntactic context.
>>>>
>>>>
>>>> I'm not certain I agree about that, or that it is what a newcomer
>>>> wants, but ok.
>>>>
>>>>
>>>> I think the best refinement for now would be to restore the
>>>>>> CXXConstructExpr in the case of a varDecl initializer, if that is possible
>>>>>> (may not be).
>>>>>>
>>>>> I think that is addressing a symptom rather than the cause. I think
>>>>> the root cause is that a matcher that is explicitly asking to match a
>>>>> certain implicit (or sometimes-implicit) AST node does not match.
>>>>>
>>>>>
>>>>> That is the purpose of AsIs/Semantic mode if I understood you
>>>>> correctly.
>>>>>
>>>>> The intention of the IgnoreUnlessSpelledInSource/Syntactic node is to
>>>>> *not* match certain implicit (or sometimes-implicit) AST nodes.
>>>>>
>>>>>
>>>>> For example, I would expect that implicitCastExpr() *never* matches
>>>>> under the new default behavior.
>>>>>
>>>>>
>>>>> That is correct.
>>>>>
>>>>>
>>>>> And I think that users, and especially beginner users, will see that
>>>>> as being simply broken -- if someone tries to write an implicitCastExpr()
>>>>> matcher, it's obvious that they want to match implicit casts, and we are
>>>>> not doing the user any favors by making that matcher not match by default.
>>>>>
>>>>>
>>>>> Hmm, if someone is the kind of newcomer that they've never encountered
>>>>> a CallExpr, FunctionDecl or any other AST node in their life before, I
>>>>> don't see why implicitCastExpr() would be the first, or one of the first,
>>>>> things they try.
>>>>>
>>>> Perhaps because they want to match an implicit cast, and they find it
>>>> in the documentation. Perhaps because they read one of the guides that says
>>>> "look at the AST dump and write matchers to match what you see there".
>>>>
>>>>
>>>> I'm not sure which guide says that, but perhaps it should be updated to
>>>> point people at clang-query. At least the guide I wrote does that:
>>>> https://devblogs.microsoft.com/cppblog/exploring-clang-tooling-part-2-examining-the-clang-ast-with-clang-query/
>>>>
>>>>
>>>> There may be different/multiple levels of newcomer that we each have in
>>>>> mind here.
>>>>>
>>>>> If someone is a kind of newcomer who wants to match on an
>>>>> implicitCastExpr(), then they're probably also the kind of newcomer who
>>>>> knows that's an implicit node and they should use Semantic node instead of
>>>>> Syntactic mode. Do you agree?
>>>>>
>>>> No. I think it's bad API design to have any mode in which
>>>> implicitCastExpr compiles but doesn't ever match.
>>>>
>>>>
>>>> Hmm, yes. Perhaps if the IgnoreUnlessSpelledInSource mode survives it
>>>> should reject a matcher like that.
>>>>
>>>> ----
>>>> You've suggested a different behavior for matchers which I don't think
>>>> anyone is working on (the design of or the implementation of).
>>>>
>>>> I continue to think the current behavior is sufficiently motivated by
>>>> the examples in the RFC.
>>>>
>>>> But, there's still tension about it.
>>>>
>>>> So, where to from here?
>>>>
>>>> Does the default have to be changed back to AsIs? Does
>>>> IgnoreUnlessSpelledInSource have to be removed? Does the traverse() matcher
>>>> have to be removed?
>>>>
>>> I would like to hear the opinions of others on these questions. I think
>>> we've both described our perspectives and made our cases.
>>>
>>>
>>> Just expanding on my position from earlier in this thread slightly: to
>>> directly address the "where to from here?" question, I would like us to get
>>> back to the state where, *in our default mode, the matcher for AST node
>>> X is always able to match all instances of AST node X*. I think there
>>> are various options for how we get there that seem acceptable, and that
>>> don't sacrifice your noble goals of making AST matching easier and more
>>> intuitive. In particular, I think it's fine (and probably good) if by
>>> default we look through non-matching implicit nodes while looking for a
>>> matching node (so long as we don't look past a matching node just because
>>> it's implicit). And I think it's fine (and probably good) to expose an easy
>>> way to explicitly control whether we automatically look through implicit
>>> nodes or not. But I think if no-one is prepared to do the work to make our
>>> default mode satisfy the above property while still looking through
>>> implicit nodes if (and only if) they don't match, then I think we should
>>> change the default back to the state we had before. (I don't have much of
>>> an opinion on whether to keep or remove 'traverse' and its various modes,
>>> other than that we've already caused a substantial amount of churn with the
>>> changes thus far, and removing them again would cause further churn.)
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200622/4c42b829/attachment-0001.html>