[cfe-dev] RFC: Easier AST Matching by Default

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Fri May 8 03:55:19 PDT 2020


Update: Stephen updated the proposal with the results / fallout of making
the change on clang-tidy's matchers. I believe that these results indicate
that the risk of that change on downstream users is small (5 of > 300
matchers needed fixes).
I'd be curious whether others disagree with that assessment.

On Sun, Jan 26, 2020 at 10:37 PM Dmitri Gribenko via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Sun, Jan 26, 2020 at 8:54 PM Stephen Kelly <steveire at gmail.com> wrote:
>
>>
>> On 17/01/2020 12:54, Dmitri Gribenko wrote:
>>
>> On Fri, Jan 17, 2020 at 12:41 PM Stephen Kelly <steveire at gmail.com>
>> wrote:
>>
>>>
>>> On 13/01/2020 16:39, Dmitri Gribenko wrote:
>>>
>>> On Fri, Jan 10, 2020 at 4:34 PM Aaron Ballman via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> I've not seen much to suggest the community is not in favor of this
>>>> direction, so I think you're all set to post patches for review. If
>>>> there are community concerns, we can address them during the review
>>>> process. Thank you for working on this!
>>>>
>>>
>>> Sorry, I just came back from vacation. I'm not sure if it is a good
>>> idea, in particular, for the following reasons:
>>>
>>> - The document specifies the goals in terms of a solution: "Goal: Users
>>> should not have to account for implicit AST nodes when writing AST
>>> Matchers".
>>>
>>>
>>> The goal is to make AST Matchers more accessible/easy to use for
>>> newcomers/non-experts in the clang AST by no-longer confronting them with
>>> 100% of the complexity on first use.
>>>
>>> I think it's important to bridge that gap. See my EuroLLVM talk for more.
>>>
>> I totally agree with the goal of making AST Matchers, refactoring, and
>> tooling more accessible (which is why I'm helping with libraries like
>> Stencil and Syntax trees).
>>
>>
>> This change affects AST Matchers, but as far as I can see, it does not
>> affect any other effort.
>>
>> I don't see why you mention those initiatives in this and your other
>> email. We're only talking about AST Matchers. Unless AST Matchers are to be
>> removed, we should improve them, regardless of Syntax Tree, Stencil and
>> Transformer.
>>
>
> I mention all of these libraries because people don't use AST matchers
> just for the sake of it. People use AST matchers to achieve a high-level
> goal. Therefore, we should be thinking of the whole user journey, not just
> the individual piece. My point is that, we are working on better solutions
> for the whole user journey. Those better solutions will allow to achieve
> the same high-level goals more easily than today's tools, and (I think)
> even more easily than *any* modification restricted to just AST matchers
> would have allowed us to do.
>
>>
>> However the goal that I quoted from the proposal on this thread ("Goal:
>> Users should not have to account for implicit AST nodes when writing AST
>> Matchers") constrains the solution to exactly what is being proposed.
>>
>>
>> Perhaps there's a misunderstanding. That's the goal of changing the
>> default.
>>
> The goal should be something high-level that the user cares about, for
> example, "more easily write refactoring tools". Individual technical
> details are not meaningful as goals, because they don't have intrinsic
> value from the perspective of end users, maintainers of Clang, maintainers
> of libraries etc.
>
> How do we evaluate whether the proposal to change the default traversal is
> good or bad, if changing the default *is* the goal? It becomes a circular
> argument: we want to change it because we want it so. The goal should be
> stated in terms of a problem that the user is trying to solve.
>
>>
>> - The complexity for implementers (although it also applies to
>>> https://reviews.llvm.org/D61837, which landed a while ago, and I didn't
>>> see it). Allowing to run the same matcher in multiple modes increases our
>>> support and testing matrix.
>>>
>>>
>>> I don't think it increases the test matrix more than any other feature.
>>> Can you expand on this?
>>>
>> https://reviews.llvm.org/D61837 makes it possible to run every matcher
>> with different traversal semantics. Our current tests only test one mode
>> for each matcher, the default mode. There are two ways to define custom
>> matchers: the advanced way (through the AST_MATCHER macro) that is needed
>> when you want to traverse the AST yourself, and a simpler way -- by
>> combining existing matchers.
>>
>>
>> AFAICT, defining a matcher using the macro, or defining it as a
>> composition function doesn't change much?
>>
> When you define a matcher using a macro, you usually traverse the AST
> yourself, by directly using Clang AST APIs. However, when you define a
> matcher as a composition of other matchers, the user of your matcher can
> affect the traversal mode *within your matcher* by calling your matcher
> within traverse(). In other words, there is no abstraction barrier between
> the matcher and the user.
>
>> Consider the `makeIteratorLoopMatcher` defined in
>> clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp. The caller can
>> run the newly made matcher in any traversal mode, even though the matcher
>> was probably written with the default mode in mind.
>>
>> Clang codebase does not have many matchers that are combinations of
>> existing matchers
>>
>>
>> It does actually - callee() for example. The fact that is uses a macro
>> doesn't change anything AFAICT. Also alignOfExpr() which does not use a
>> macro, but I don't see how that changes anything.
>>
>> Am I missing something?
>>
> I didn't say such matchers *don't exist* in Clang, I said there are not
> many of them. And yes, these are the types of matchers I am concerned about
> being affected by traverse().
>
> callee and alignOfExpr() are not affected by traverse() because they only
> match one level of AST nodes -- the call expression, or the alignof
> expression, and pass down the inner matcher. Had they been traversing
> multiple levels of AST nodes, they would have been affected by traverse(),
> if the user called these matchers within traverse().
>
>> , but code that uses Clang as a library (and often written by engineers
>> who are not compiler experts and just need to get some refactoring done)
>> does that very often in my experience working at Google.
>>
>>
>> I don't think the traverse() matcher changes so much in this area. "Just
>> need to get some refactoring done" sounds like writing a composition
>> matcher within a check, not writing it for some library code.
>>
> We have lots of engineers who are not Clang experts and write reusable
> refactoring code.
>
>> - The complexity cliff for users. Many non-trivial matchers need to
>>> explicitly manage implicit AST nodes.
>>>
>>>
>>> You just described the complexity cliff which is (after my change) not
>>> the first thing a newcomer hits.
>>>
>>> It's good to move that cliff back so that the newcomer is not confronted
>>> with it immediately.
>>>
>> As soon as you pretty-print the AST (which I think most people should do
>> before writing a matcher), you see implicit nodes.
>>
>>
>> If using clang-query, you can change the mode of how clang-query matches
>> and dumps ASTs. But, in the future you won't need to dump the AST anyway,
>> because clang-query will just tell you the available matchers and what they
>> match.
>>
>> See my EuroLLVM talk for more, in particular the workflow diagram which
>> does not include inspecting the AST:
>> https://steveire.files.wordpress.com/2018/11/all-no-ast.png?w=900&h=590
>>
>>
>> - The cost of ClangTidy. ClangTidy runs all matchers simultaneously in
>>> one AST traversal. If we implement implicit/no-implicit modes as separate
>>> AST traversals, ClangTidy would need to run two traversals, one for
>>> "easier" matchers (skipping implicit code), and one for "expert" matchers
>>> (traversing implicit code). We would also need to build two parent maps.
>>>
>>> As of https://reviews.llvm.org/D73388 we no longer create a parent map
>> for each traversal kind used at runtime.
>>
>> This means I can no longer measure a runtime cost of introducing use of
>> the traverse() matcher in clang-tidy.
>>
> Thank you, that's a great improvement! However, I'm not sure about the
> absence of runtime cost -- I think it would be more fair to say that there
> is no cost *right now*. This change adds extra logic
> (AscendIgnoreUnlessSpelledInSource) that is only called in the case
> of TK_IgnoreUnlessSpelledInSource traversal. Since most checkers in
> ClangTidy don't use that mode, we see no increase in runtime cost today.
> However, once the default is flipped, or once we start
> using TK_IgnoreUnlessSpelledInSource more, we could see an increased cost
> of AST matchers compared to the TK_AsIs traversal.
>
>> As long as the current AST Matchers APIs exist, they should be improved.
>>> Perhaps they will be less important in the future when we have Syntax Trees
>>> for these use-cases instead. However, for now, I don't think the ease of
>>> use improvement in my patches should be prevented based existence of a
>>> future alternative system.
>>>
>> I agree that we should not block improvements on existence of a future
>> system. However, each specific change should be evaluated based on its
>> merits and costs. I think there are non-trivial downsides to offering a
>> different mode in existing AST matchers, which create maintenance and
>> testing costs that we will be paying perpetually.
>>
>>
>> We may not achieve your agreement on this change, but I think we have
>> consensus already that this change should proceed.
>>
> I'm not sure how you measured the degree of consensus -- could you clarify?
>
> Here's what I see: so far, there have been few people participating on
> this thread, and outside of the authors of the proposal (you and Aaron),
> only I have provided direct feedback about supporting/opposing the
> proposal, other people (Artem, Gabor, and Ilya) only asked questions,
> provided clarifications, and provided feedback on the syntax trees
> developments. If you count me as objecting to the traverse() API in
> general, and not objecting to this change, then there has been no feedback
> from the community on this proposal at all. If you count me as objecting to
> this proposal, then there's one person objecting. Based on these
> observations, I don't see consensus yet, even if you exclude my feedback.
>
>>
>> It instantly creates a huge testing gap for existing AST matchers (which
>> we currently test only in one mode).
>>
>>
>> The traverse() matcher does not introduce a need to test each AST matcher
>> in each mode.
>>
> Why do you say so? Matchers composed out of other matchers are affected by
> the traversal mode that the user invokes them in.
>
>> It also creates traps for existing code that will be able to start
>> accidentally calling existing matchers in a traversal mode that they were
>> not designed for.
>>
>>
>> Yes, true. It is possible to write a matcher which works as someone
>> expects in one mode but not the other, such as a matcher which specifically
>> expects to match a implicitCastExpr().
>>
> Exactly.
>
>> This can matter for matchers which are in "libraries", but not for
>> matchers which are implementation details inside checks. Unless a
>> particular check exposes the traversal kind as an option, those matchers
>> can not be affected. Any check which does expose that as an option has to
>> do so consciously and explicitly and has to test it. Nothing new there.
>>
> I'm sorry if I have been unclear about this point, but during this whole
> conversation I have been talking not about specific ClangTidy checkers, but
> about libraries that provide matchers. Users of Clang (for example,
> internal users at Google) have such libraries, and use them broadly outside
> of ClangTidy. Previously in this thread I pointed at a matcher in ClangTidy
> as an example because that's the only piece of open source code I could
> easily find, that has similar complexity to libraries that we have at
> Google. Introduction of traverse() has increased the testing matrix for
> these matchers, so it is a new increase in complexity, and increase in
> technical debt (missing tests).
>
> Let me reiterate: I understand that Clang or Clang Tooling does not
> provide a stable API, so making breaking changes here is totally within the
> policy.
>
> However, the specific aspects of this change (silently breaking existing
> matchers, creating new traps for people who write reusable AST matchers,
> increasing the testing matrix) have consequences that are very different
> from a typical API change in Clang AST, where the user of that API will see
> a build break, and can easily locally fix code. The cost of this change is
> much higher for users of the API being changed.
>
> Let me point you at similar change that was done previously: the change in
> elidable constructor representation in Clang AST between C++14 and C++17.
> (TL;DR: AST built by Clang in C++11 mode has AST nodes for elidable
> constructor calls, but in C++17 mode they are always elided.) Lots of
> ClangTidy checkers were broken by this change, but nobody noticed, because
> ClangTidy was not being tested in all language modes. I closed this testing
> gap (https://reviews.llvm.org/D62125), and our team has fixed a bunch of
> ClangTidy checkers, but other checkers still remain broken until this day
> (run `git grep 'FIXME: Fix the checker to work in C++17 mode.'`) The worst
> part of this story is that the breakage was noticed a long, long time after
> the AST change was implemented, and that the cost of the change (expanding
> testing, fixing checkers) has fallen onto people other than the ones who
> changed the AST.
>
> The proposal has been implying (correct me if I'm wrong) that the
> maintenance cost and complexity that is being put onto experts with a
> codebase that uses AST matchers that they maintain long-term is worth the
> added convenience for novice users who write one-off tools that don't need
> to be maintained. If it is indeed the case, I would prefer this argument to
> be made explicitly (as in, the proposal should describe the downsides of
> making this change; right now it does not describe them).
>
> Dmitri
>
> --
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200508/530dbe6f/attachment-0001.html>


More information about the cfe-dev mailing list