[cfe-dev] RFC: Easier AST Matching by Default

Fri May 8 07:47:17 PDT 2020

Manuel~

I favor skipping invisible nodes by default.  It produces more intuitive
results.

Matt

On Fri, May 8, 2020 at 6:55 AM Manuel Klimek <klimek at google.com> wrote:

> Update: Stephen updated the proposal with the results / fallout of making
> the change on clang-tidy's matchers. I believe that these results indicate
> that the risk of that change on downstream users is small (5 of > 300
> matchers needed fixes).
> I'd be curious whether others disagree with that assessment.
>
> On Sun, Jan 26, 2020 at 10:37 PM Dmitri Gribenko via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On Sun, Jan 26, 2020 at 8:54 PM Stephen Kelly <steveire at gmail.com> wrote:
>>
>>>
>>> On 17/01/2020 12:54, Dmitri Gribenko wrote:
>>>
>>> On Fri, Jan 17, 2020 at 12:41 PM Stephen Kelly <steveire at gmail.com>
>>> wrote:
>>>
>>>>
>>>> On 13/01/2020 16:39, Dmitri Gribenko wrote:
>>>>
>>>> On Fri, Jan 10, 2020 at 4:34 PM Aaron Ballman via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> I've not seen much to suggest the community is not in favor of this
>>>>> direction, so I think you're all set to post patches for review. If
>>>>> there are community concerns, we can address them during the review
>>>>> process. Thank you for working on this!
>>>>>
>>>>
>>>> Sorry, I just came back from vacation. I'm not sure if it is a good
>>>> idea, in particular, for the following reasons:
>>>>
>>>> - The document specifies the goals in terms of a solution: "Goal: Users
>>>> should not have to account for implicit AST nodes when writing AST
>>>> Matchers".
>>>>
>>>>
>>>> The goal is to make AST Matchers more accessible/easy to use for
>>>> newcomers/non-experts in the clang AST by no-longer confronting them with
>>>> 100% of the complexity on first use.
>>>>
>>>> I think it's important to bridge that gap. See my EuroLLVM talk for
>>>> more.
>>>>
>>> I totally agree with the goal of making AST Matchers, refactoring, and
>>> tooling more accessible (which is why I'm helping with libraries like
>>> Stencil and Syntax trees).
>>>
>>>
>>> This change affects AST Matchers, but as far as I can see, it does not
>>> affect any other effort.
>>>
>>> I don't see why you mention those initiatives in this and your other
>>> email. We're only talking about AST Matchers. Unless AST Matchers are to be
>>> removed, we should improve them, regardless of Syntax Tree, Stencil and
>>> Transformer.
>>>
>>
>> I mention all of these libraries because people don't use AST matchers
>> just for the sake of it. People use AST matchers to achieve a high-level
>> goal. Therefore, we should be thinking of the whole user journey, not just
>> the individual piece. My point is that, we are working on better solutions
>> for the whole user journey. Those better solutions will allow to achieve
>> the same high-level goals more easily than today's tools, and (I think)
>> even more easily than *any* modification restricted to just AST matchers
>> would have allowed us to do.
>>
>>>
>>> However the goal that I quoted from the proposal on this thread ("Goal:
>>> Users should not have to account for implicit AST nodes when writing AST
>>> Matchers") constrains the solution to exactly what is being proposed.
>>>
>>>
>>> Perhaps there's a misunderstanding. That's the goal of changing the
>>> default.
>>>
>> The goal should be something high-level that the user cares about, for
>> example, "more easily write refactoring tools". Individual technical
>> details are not meaningful as goals, because they don't have intrinsic
>> value from the perspective of end users, maintainers of Clang, maintainers
>> of libraries etc.
>>
>> How do we evaluate whether the proposal to change the default traversal
>> is good or bad, if changing the default *is* the goal? It becomes a
>> circular argument: we want to change it because we want it so. The goal
>> should be stated in terms of a problem that the user is trying to solve.
>>
>>>
>>> - The complexity for implementers (although it also applies to
>>>> https://reviews.llvm.org/D61837, which landed a while ago, and I
>>>> didn't see it). Allowing to run the same matcher in multiple modes
>>>> increases our support and testing matrix.
>>>>
>>>>
>>>> I don't think it increases the test matrix more than any other feature.
>>>> Can you expand on this?
>>>>
>>> https://reviews.llvm.org/D61837 makes it possible to run every matcher
>>> with different traversal semantics. Our current tests only test one mode
>>> for each matcher, the default mode. There are two ways to define custom
>>> matchers: the advanced way (through the AST_MATCHER macro) that is needed
>>> when you want to traverse the AST yourself, and a simpler way -- by
>>> combining existing matchers.
>>>
>>>
>>> AFAICT, defining a matcher using the macro, or defining it as a
>>> composition function doesn't change much?
>>>
>> When you define a matcher using a macro, you usually traverse the AST
>> yourself, by directly using Clang AST APIs. However, when you define a
>> matcher as a composition of other matchers, the user of your matcher can
>> affect the traversal mode *within your matcher* by calling your matcher
>> within traverse(). In other words, there is no abstraction barrier between
>> the matcher and the user.
>>
>>> Consider the `makeIteratorLoopMatcher` defined in
>>> clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp. The caller can
>>> run the newly made matcher in any traversal mode, even though the matcher
>>> was probably written with the default mode in mind.
>>>
>>> Clang codebase does not have many matchers that are combinations of
>>> existing matchers
>>>
>>>
>>> It does actually - callee() for example. The fact that is uses a macro
>>> doesn't change anything AFAICT. Also alignOfExpr() which does not use a
>>> macro, but I don't see how that changes anything.
>>>
>>> Am I missing something?
>>>
>> I didn't say such matchers *don't exist* in Clang, I said there are not
>> many of them. And yes, these are the types of matchers I am concerned about
>> being affected by traverse().
>>
>> callee and alignOfExpr() are not affected by traverse() because they only
>> match one level of AST nodes -- the call expression, or the alignof
>> expression, and pass down the inner matcher. Had they been traversing
>> multiple levels of AST nodes, they would have been affected by traverse(),
>> if the user called these matchers within traverse().
>>
>>> , but code that uses Clang as a library (and often written by engineers
>>> who are not compiler experts and just need to get some refactoring done)
>>> does that very often in my experience working at Google.
>>>
>>>
>>> I don't think the traverse() matcher changes so much in this area. "Just
>>> need to get some refactoring done" sounds like writing a composition
>>> matcher within a check, not writing it for some library code.
>>>
>> We have lots of engineers who are not Clang experts and write reusable
>> refactoring code.
>>
>>> - The complexity cliff for users. Many non-trivial matchers need to
>>>> explicitly manage implicit AST nodes.
>>>>
>>>>
>>>> You just described the complexity cliff which is (after my change) not
>>>> the first thing a newcomer hits.
>>>>
>>>> It's good to move that cliff back so that the newcomer is not
>>>> confronted with it immediately.
>>>>
>>> As soon as you pretty-print the AST (which I think most people should do
>>> before writing a matcher), you see implicit nodes.
>>>
>>>
>>> If using clang-query, you can change the mode of how clang-query matches
>>> and dumps ASTs. But, in the future you won't need to dump the AST anyway,
>>> because clang-query will just tell you the available matchers and what they
>>> match.
>>>
>>> See my EuroLLVM talk for more, in particular the workflow diagram which
>>> does not include inspecting the AST:
>>> https://steveire.files.wordpress.com/2018/11/all-no-ast.png?w=900&h=590
>>>
>>>
>>> - The cost of ClangTidy. ClangTidy runs all matchers simultaneously in
>>>> one AST traversal. If we implement implicit/no-implicit modes as separate
>>>> AST traversals, ClangTidy would need to run two traversals, one for
>>>> "easier" matchers (skipping implicit code), and one for "expert" matchers
>>>> (traversing implicit code). We would also need to build two parent maps.
>>>>
>>>> As of https://reviews.llvm.org/D73388 we no longer create a parent map
>>> for each traversal kind used at runtime.
>>>
>>> This means I can no longer measure a runtime cost of introducing use of
>>> the traverse() matcher in clang-tidy.
>>>
>> Thank you, that's a great improvement! However, I'm not sure about the
>> absence of runtime cost -- I think it would be more fair to say that there
>> is no cost *right now*. This change adds extra logic
>> (AscendIgnoreUnlessSpelledInSource) that is only called in the case
>> of TK_IgnoreUnlessSpelledInSource traversal. Since most checkers in
>> ClangTidy don't use that mode, we see no increase in runtime cost today.
>> However, once the default is flipped, or once we start
>> using TK_IgnoreUnlessSpelledInSource more, we could see an increased cost
>> of AST matchers compared to the TK_AsIs traversal.
>>
>>> As long as the current AST Matchers APIs exist, they should be improved.
>>>> Perhaps they will be less important in the future when we have Syntax Trees
>>>> for these use-cases instead. However, for now, I don't think the ease of
>>>> use improvement in my patches should be prevented based existence of a
>>>> future alternative system.
>>>>
>>> I agree that we should not block improvements on existence of a future
>>> system. However, each specific change should be evaluated based on its
>>> merits and costs. I think there are non-trivial downsides to offering a
>>> different mode in existing AST matchers, which create maintenance and
>>> testing costs that we will be paying perpetually.
>>>
>>>
>>> We may not achieve your agreement on this change, but I think we have
>>> consensus already that this change should proceed.
>>>
>> I'm not sure how you measured the degree of consensus -- could you
>> clarify?
>>
>> Here's what I see: so far, there have been few people participating on
>> this thread, and outside of the authors of the proposal (you and Aaron),
>> only I have provided direct feedback about supporting/opposing the
>> proposal, other people (Artem, Gabor, and Ilya) only asked questions,
>> provided clarifications, and provided feedback on the syntax trees
>> developments. If you count me as objecting to the traverse() API in
>> general, and not objecting to this change, then there has been no feedback
>> from the community on this proposal at all. If you count me as objecting to
>> this proposal, then there's one person objecting. Based on these
>> observations, I don't see consensus yet, even if you exclude my feedback.
>>
>>>
>>> It instantly creates a huge testing gap for existing AST matchers (which
>>> we currently test only in one mode).
>>>
>>>
>>> The traverse() matcher does not introduce a need to test each AST
>>> matcher in each mode.
>>>
>> Why do you say so? Matchers composed out of other matchers are affected
>> by the traversal mode that the user invokes them in.
>>
>>> It also creates traps for existing code that will be able to start
>>> accidentally calling existing matchers in a traversal mode that they were
>>> not designed for.
>>>
>>>
>>> Yes, true. It is possible to write a matcher which works as someone
>>> expects in one mode but not the other, such as a matcher which specifically
>>> expects to match a implicitCastExpr().
>>>
>> Exactly.
>>
>>> This can matter for matchers which are in "libraries", but not for
>>> matchers which are implementation details inside checks. Unless a
>>> particular check exposes the traversal kind as an option, those matchers
>>> can not be affected. Any check which does expose that as an option has to
>>> do so consciously and explicitly and has to test it. Nothing new there.
>>>
>> I'm sorry if I have been unclear about this point, but during this whole
>> conversation I have been talking not about specific ClangTidy checkers, but
>> about libraries that provide matchers. Users of Clang (for example,
>> internal users at Google) have such libraries, and use them broadly outside
>> of ClangTidy. Previously in this thread I pointed at a matcher in ClangTidy
>> as an example because that's the only piece of open source code I could
>> easily find, that has similar complexity to libraries that we have at
>> Google. Introduction of traverse() has increased the testing matrix for
>> these matchers, so it is a new increase in complexity, and increase in
>> technical debt (missing tests).
>>
>> Let me reiterate: I understand that Clang or Clang Tooling does not
>> provide a stable API, so making breaking changes here is totally within the
>> policy.
>>
>> However, the specific aspects of this change (silently breaking existing
>> matchers, creating new traps for people who write reusable AST matchers,
>> increasing the testing matrix) have consequences that are very different
>> from a typical API change in Clang AST, where the user of that API will see
>> a build break, and can easily locally fix code. The cost of this change is
>> much higher for users of the API being changed.
>>
>> Let me point you at similar change that was done previously: the change
>> in elidable constructor representation in Clang AST between C++14 and
>> C++17. (TL;DR: AST built by Clang in C++11 mode has AST nodes for elidable
>> constructor calls, but in C++17 mode they are always elided.) Lots of
>> ClangTidy checkers were broken by this change, but nobody noticed, because
>> ClangTidy was not being tested in all language modes. I closed this testing
>> gap (https://reviews.llvm.org/D62125), and our team has fixed a bunch of
>> ClangTidy checkers, but other checkers still remain broken until this day
>> (run `git grep 'FIXME: Fix the checker to work in C++17 mode.'`) The worst
>> part of this story is that the breakage was noticed a long, long time after
>> the AST change was implemented, and that the cost of the change (expanding
>> testing, fixing checkers) has fallen onto people other than the ones who
>> changed the AST.
>>
>> The proposal has been implying (correct me if I'm wrong) that the
>> maintenance cost and complexity that is being put onto experts with a
>> codebase that uses AST matchers that they maintain long-term is worth the
>> added convenience for novice users who write one-off tools that don't need
>> to be maintained. If it is indeed the case, I would prefer this argument to
>> be made explicitly (as in, the proposal should describe the downsides of
>> making this change; right now it does not describe them).
>>
>> Dmitri
>>
>> --
>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200508/cdf4b07d/attachment-0001.html>