[cfe-dev] RFC: Easier AST Matching by Default

Mon May 11 06:19:28 PDT 2020

On Mon, May 11, 2020 at 2:50 PM Aaron Ballman <aaron at aaronballman.com>
wrote:

> On Fri, May 8, 2020 at 6:55 AM Manuel Klimek via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > Update: Stephen updated the proposal with the results / fallout of
> making the change on clang-tidy's matchers. I believe that these results
> indicate that the risk of that change on downstream users is small (5 of >
> 300 matchers needed fixes).
> > I'd be curious whether others disagree with that assessment.
>
> No disagreement from me.
>

Given there seems to be consensus on this being a good thing, and me
checking with Dmitri that he doesn't have serious concerns (deferring to my
judgement), this proposal looks good to me.
I believe Aaron volunteered to do detailed reviews on specific patches
(thanks, Aaron, otherwise let me know if I misunderstood!!)

>
> ~Aaron
>
> >
> > On Sun, Jan 26, 2020 at 10:37 PM Dmitri Gribenko via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >>
> >> On Sun, Jan 26, 2020 at 8:54 PM Stephen Kelly <steveire at gmail.com>
> wrote:
> >>>
> >>>
> >>> On 17/01/2020 12:54, Dmitri Gribenko wrote:
> >>>
> >>> On Fri, Jan 17, 2020 at 12:41 PM Stephen Kelly <steveire at gmail.com>
> wrote:
> >>>>
> >>>>
> >>>> On 13/01/2020 16:39, Dmitri Gribenko wrote:
> >>>>
> >>>> On Fri, Jan 10, 2020 at 4:34 PM Aaron Ballman via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >>>>>
> >>>>> I've not seen much to suggest the community is not in favor of this
> >>>>> direction, so I think you're all set to post patches for review. If
> >>>>> there are community concerns, we can address them during the review
> >>>>> process. Thank you for working on this!
> >>>>
> >>>>
> >>>> Sorry, I just came back from vacation. I'm not sure if it is a good
> idea, in particular, for the following reasons:
> >>>>
> >>>> - The document specifies the goals in terms of a solution: "Goal:
> Users should not have to account for implicit AST nodes when writing AST
> Matchers".
> >>>>
> >>>>
> >>>> The goal is to make AST Matchers more accessible/easy to use for
> newcomers/non-experts in the clang AST by no-longer confronting them with
> 100% of the complexity on first use.
> >>>>
> >>>> I think it's important to bridge that gap. See my EuroLLVM talk for
> more.
> >>>
> >>> I totally agree with the goal of making AST Matchers, refactoring, and
> tooling more accessible (which is why I'm helping with libraries like
> Stencil and Syntax trees).
> >>>
> >>>
> >>> This change affects AST Matchers, but as far as I can see, it does not
> affect any other effort.
> >>>
> >>> I don't see why you mention those initiatives in this and your other
> email. We're only talking about AST Matchers. Unless AST Matchers are to be
> removed, we should improve them, regardless of Syntax Tree, Stencil and
> Transformer.
> >>
> >>
> >> I mention all of these libraries because people don't use AST matchers
> just for the sake of it. People use AST matchers to achieve a high-level
> goal. Therefore, we should be thinking of the whole user journey, not just
> the individual piece. My point is that, we are working on better solutions
> for the whole user journey. Those better solutions will allow to achieve
> the same high-level goals more easily than today's tools, and (I think)
> even more easily than *any* modification restricted to just AST matchers
> would have allowed us to do.
> >>>
> >>>
> >>> However the goal that I quoted from the proposal on this thread
> ("Goal: Users should not have to account for implicit AST nodes when
> writing AST Matchers") constrains the solution to exactly what is being
> proposed.
> >>>
> >>>
> >>> Perhaps there's a misunderstanding. That's the goal of changing the
> default.
> >>
> >> The goal should be something high-level that the user cares about, for
> example, "more easily write refactoring tools". Individual technical
> details are not meaningful as goals, because they don't have intrinsic
> value from the perspective of end users, maintainers of Clang, maintainers
> of libraries etc.
> >>
> >> How do we evaluate whether the proposal to change the default traversal
> is good or bad, if changing the default *is* the goal? It becomes a
> circular argument: we want to change it because we want it so. The goal
> should be stated in terms of a problem that the user is trying to solve.
> >>>
> >>>
> >>>> - The complexity for implementers (although it also applies to
> https://reviews.llvm.org/D61837, which landed a while ago, and I didn't
> see it). Allowing to run the same matcher in multiple modes increases our
> support and testing matrix.
> >>>>
> >>>>
> >>>> I don't think it increases the test matrix more than any other
> feature. Can you expand on this?
> >>>
> >>> https://reviews.llvm.org/D61837 makes it possible to run every
> matcher with different traversal semantics. Our current tests only test one
> mode for each matcher, the default mode. There are two ways to define
> custom matchers: the advanced way (through the AST_MATCHER macro) that is
> needed when you want to traverse the AST yourself, and a simpler way -- by
> combining existing matchers.
> >>>
> >>>
> >>> AFAICT, defining a matcher using the macro, or defining it as a
> composition function doesn't change much?
> >>
> >> When you define a matcher using a macro, you usually traverse the AST
> yourself, by directly using Clang AST APIs. However, when you define a
> matcher as a composition of other matchers, the user of your matcher can
> affect the traversal mode *within your matcher* by calling your matcher
> within traverse(). In other words, there is no abstraction barrier between
> the matcher and the user.
> >>>
> >>> Consider the `makeIteratorLoopMatcher` defined in
> clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp. The caller can
> run the newly made matcher in any traversal mode, even though the matcher
> was probably written with the default mode in mind.
> >>>
> >>> Clang codebase does not have many matchers that are combinations of
> existing matchers
> >>>
> >>>
> >>> It does actually - callee() for example. The fact that is uses a macro
> doesn't change anything AFAICT. Also alignOfExpr() which does not use a
> macro, but I don't see how that changes anything.
> >>>
> >>> Am I missing something?
> >>
> >> I didn't say such matchers *don't exist* in Clang, I said there are not
> many of them. And yes, these are the types of matchers I am concerned about
> being affected by traverse().
> >>
> >> callee and alignOfExpr() are not affected by traverse() because they
> only match one level of AST nodes -- the call expression, or the alignof
> expression, and pass down the inner matcher. Had they been traversing
> multiple levels of AST nodes, they would have been affected by traverse(),
> if the user called these matchers within traverse().
> >>>
> >>> , but code that uses Clang as a library (and often written by
> engineers who are not compiler experts and just need to get some
> refactoring done) does that very often in my experience working at Google.
> >>>
> >>>
> >>> I don't think the traverse() matcher changes so much in this area.
> "Just need to get some refactoring done" sounds like writing a composition
> matcher within a check, not writing it for some library code.
> >>
> >> We have lots of engineers who are not Clang experts and write reusable
> refactoring code.
> >>>>
> >>>> - The complexity cliff for users. Many non-trivial matchers need to
> explicitly manage implicit AST nodes.
> >>>>
> >>>>
> >>>> You just described the complexity cliff which is (after my change)
> not the first thing a newcomer hits.
> >>>>
> >>>> It's good to move that cliff back so that the newcomer is not
> confronted with it immediately.
> >>>
> >>> As soon as you pretty-print the AST (which I think most people should
> do before writing a matcher), you see implicit nodes.
> >>>
> >>>
> >>> If using clang-query, you can change the mode of how clang-query
> matches and dumps ASTs. But, in the future you won't need to dump the AST
> anyway, because clang-query will just tell you the available matchers and
> what they match.
> >>>
> >>> See my EuroLLVM talk for more, in particular the workflow diagram
> which does not include inspecting the AST:
> https://steveire.files.wordpress.com/2018/11/all-no-ast.png?w=900&h=590
> >>>
> >>>
> >>>> - The cost of ClangTidy. ClangTidy runs all matchers simultaneously
> in one AST traversal. If we implement implicit/no-implicit modes as
> separate AST traversals, ClangTidy would need to run two traversals, one
> for "easier" matchers (skipping implicit code), and one for "expert"
> matchers (traversing implicit code). We would also need to build two parent
> maps.
> >>>
> >>> As of https://reviews.llvm.org/D73388 we no longer create a parent
> map for each traversal kind used at runtime.
> >>>
> >>> This means I can no longer measure a runtime cost of introducing use
> of the traverse() matcher in clang-tidy.
> >>
> >> Thank you, that's a great improvement! However, I'm not sure about the
> absence of runtime cost -- I think it would be more fair to say that there
> is no cost *right now*. This change adds extra logic
> (AscendIgnoreUnlessSpelledInSource) that is only called in the case of
> TK_IgnoreUnlessSpelledInSource traversal. Since most checkers in ClangTidy
> don't use that mode, we see no increase in runtime cost today. However,
> once the default is flipped, or once we start using
> TK_IgnoreUnlessSpelledInSource more, we could see an increased cost of AST
> matchers compared to the TK_AsIs traversal.
> >>>>
> >>>> As long as the current AST Matchers APIs exist, they should be
> improved. Perhaps they will be less important in the future when we have
> Syntax Trees for these use-cases instead. However, for now, I don't think
> the ease of use improvement in my patches should be prevented based
> existence of a future alternative system.
> >>>
> >>> I agree that we should not block improvements on existence of a future
> system. However, each specific change should be evaluated based on its
> merits and costs. I think there are non-trivial downsides to offering a
> different mode in existing AST matchers, which create maintenance and
> testing costs that we will be paying perpetually.
> >>>
> >>>
> >>> We may not achieve your agreement on this change, but I think we have
> consensus already that this change should proceed.
> >>
> >> I'm not sure how you measured the degree of consensus -- could you
> clarify?
> >>
> >> Here's what I see: so far, there have been few people participating on
> this thread, and outside of the authors of the proposal (you and Aaron),
> only I have provided direct feedback about supporting/opposing the
> proposal, other people (Artem, Gabor, and Ilya) only asked questions,
> provided clarifications, and provided feedback on the syntax trees
> developments. If you count me as objecting to the traverse() API in
> general, and not objecting to this change, then there has been no feedback
> from the community on this proposal at all. If you count me as objecting to
> this proposal, then there's one person objecting. Based on these
> observations, I don't see consensus yet, even if you exclude my feedback.
> >>>
> >>>
> >>> It instantly creates a huge testing gap for existing AST matchers
> (which we currently test only in one mode).
> >>>
> >>>
> >>> The traverse() matcher does not introduce a need to test each AST
> matcher in each mode.
> >>
> >> Why do you say so? Matchers composed out of other matchers are affected
> by the traversal mode that the user invokes them in.
> >>>
> >>> It also creates traps for existing code that will be able to start
> accidentally calling existing matchers in a traversal mode that they were
> not designed for.
> >>>
> >>>
> >>> Yes, true. It is possible to write a matcher which works as someone
> expects in one mode but not the other, such as a matcher which specifically
> expects to match a implicitCastExpr().
> >>
> >> Exactly.
> >>>
> >>> This can matter for matchers which are in "libraries", but not for
> matchers which are implementation details inside checks. Unless a
> particular check exposes the traversal kind as an option, those matchers
> can not be affected. Any check which does expose that as an option has to
> do so consciously and explicitly and has to test it. Nothing new there.
> >>
> >> I'm sorry if I have been unclear about this point, but during this
> whole conversation I have been talking not about specific ClangTidy
> checkers, but about libraries that provide matchers. Users of Clang (for
> example, internal users at Google) have such libraries, and use them
> broadly outside of ClangTidy. Previously in this thread I pointed at a
> matcher in ClangTidy as an example because that's the only piece of open
> source code I could easily find, that has similar complexity to libraries
> that we have at Google. Introduction of traverse() has increased the
> testing matrix for these matchers, so it is a new increase in complexity,
> and increase in technical debt (missing tests).
> >>
> >> Let me reiterate: I understand that Clang or Clang Tooling does not
> provide a stable API, so making breaking changes here is totally within the
> policy.
> >>
> >> However, the specific aspects of this change (silently breaking
> existing matchers, creating new traps for people who write reusable AST
> matchers, increasing the testing matrix) have consequences that are very
> different from a typical API change in Clang AST, where the user of that
> API will see a build break, and can easily locally fix code. The cost of
> this change is much higher for users of the API being changed.
> >>
> >> Let me point you at similar change that was done previously: the change
> in elidable constructor representation in Clang AST between C++14 and
> C++17. (TL;DR: AST built by Clang in C++11 mode has AST nodes for elidable
> constructor calls, but in C++17 mode they are always elided.) Lots of
> ClangTidy checkers were broken by this change, but nobody noticed, because
> ClangTidy was not being tested in all language modes. I closed this testing
> gap (https://reviews.llvm.org/D62125), and our team has fixed a bunch of
> ClangTidy checkers, but other checkers still remain broken until this day
> (run `git grep 'FIXME: Fix the checker to work in C++17 mode.'`) The worst
> part of this story is that the breakage was noticed a long, long time after
> the AST change was implemented, and that the cost of the change (expanding
> testing, fixing checkers) has fallen onto people other than the ones who
> changed the AST.
> >>
> >> The proposal has been implying (correct me if I'm wrong) that the
> maintenance cost and complexity that is being put onto experts with a
> codebase that uses AST matchers that they maintain long-term is worth the
> added convenience for novice users who write one-off tools that don't need
> to be maintained. If it is indeed the case, I would prefer this argument to
> be made explicitly (as in, the proposal should describe the downsides of
> making this change; right now it does not describe them).
> >>
> >> Dmitri
> >>
> >> --
> >> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> >> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200511/63a7e5e2/attachment-0001.html>