[cfe-dev] RFC: Easier AST Matching by Default

Sun Jan 26 11:54:11 PST 2020

On 17/01/2020 12:54, Dmitri Gribenko wrote:
> On Fri, Jan 17, 2020 at 12:41 PM Stephen Kelly <steveire at gmail.com 
> <mailto:steveire at gmail.com>> wrote:
>
>
>     On 13/01/2020 16:39, Dmitri Gribenko wrote:
>>     On Fri, Jan 10, 2020 at 4:34 PM Aaron Ballman via cfe-dev
>>     <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>>         I've not seen much to suggest the community is not in favor
>>         of this
>>         direction, so I think you're all set to post patches for
>>         review. If
>>         there are community concerns, we can address them during the
>>         review
>>         process. Thank you for working on this!
>>
>>
>>     Sorry, I just came back from vacation. I'm not sure if it is a
>>     good idea, in particular, for the following reasons:
>>
>>     - The document specifies the goals in terms of a solution: "Goal:
>>     Users should not have to account for implicit AST nodes when
>>     writing AST Matchers".
>
>
>     The goal is to make AST Matchers more accessible/easy to use for
>     newcomers/non-experts in the clang AST by no-longer confronting
>     them with 100% of the complexity on first use.
>
>     I think it's important to bridge that gap. See my EuroLLVM talk
>     for more.
>
> I totally agree with the goal of making AST Matchers, refactoring, and 
> tooling more accessible (which is why I'm helping with libraries like 
> Stencil and Syntax trees).

This change affects AST Matchers, but as far as I can see, it does not 
affect any other effort.

I don't see why you mention those initiatives in this and your other 
email. We're only talking about AST Matchers. Unless AST Matchers are to 
be removed, we should improve them, regardless of Syntax Tree, Stencil 
and Transformer.

> However the goal that I quoted from the proposal on this thread 
> ("Goal: Users should not have to account for implicit AST nodes when 
> writing AST Matchers") constrains the solution to exactly what is 
> being proposed.

Perhaps there's a misunderstanding. That's the goal of changing the default.

>>     - The complexity for implementers (although it also applies to
>>     https://reviews.llvm.org/D61837, which landed a while ago, and I
>>     didn't see it). Allowing to run the same matcher in multiple
>>     modes increases our support and testing matrix.
>
>
>     I don't think it increases the test matrix more than any other
>     feature. Can you expand on this?
>
> https://reviews.llvm.org/D61837 makes it possible to run every matcher 
> with different traversal semantics. Our current tests only test one 
> mode for each matcher, the default mode. There are two ways to define 
> custom matchers: the advanced way (through the AST_MATCHER macro) that 
> is needed when you want to traverse the AST yourself, and a simpler 
> way -- by combining existing matchers.

AFAICT, defining a matcher using the macro, or defining it as a 
composition function doesn't change much?

>
> Consider the `makeIteratorLoopMatcher` defined in 
> clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp. The 
> caller can run the newly made matcher in any traversal mode, even 
> though the matcher was probably written with the default mode in mind.
>
> Clang codebase does not have many matchers that are combinations of 
> existing matchers

It does actually - callee() for example. The fact that is uses a macro 
doesn't change anything AFAICT. Also alignOfExpr() which does not use a 
macro, but I don't see how that changes anything.

Am I missing something?

> , but code that uses Clang as a library (and often written by 
> engineers who are not compiler experts and just need to get some 
> refactoring done) does that very often in my experience working at Google.

I don't think the traverse() matcher changes so much in this area. "Just 
need to get some refactoring done" sounds like writing a composition 
matcher within a check, not writing it for some library code.

>>
>>     - The complexity cliff for users. Many non-trivial matchers need
>>     to explicitly manage implicit AST nodes.
>
>
>     You just described the complexity cliff which is (after my change)
>     not the first thing a newcomer hits.
>
>     It's good to move that cliff back so that the newcomer is not
>     confronted with it immediately.
>
> As soon as you pretty-print the AST (which I think most people should 
> do before writing a matcher), you see implicit nodes.

If using clang-query, you can change the mode of how clang-query matches 
and dumps ASTs. But, in the future you won't need to dump the AST 
anyway, because clang-query will just tell you the available matchers 
and what they match.

See my EuroLLVM talk for more, in particular the workflow diagram which 
does not include inspecting the AST: 
https://steveire.files.wordpress.com/2018/11/all-no-ast.png?w=900&h=590

>>     - The cost of ClangTidy. ClangTidy runs all matchers
>>     simultaneously in one AST traversal. If we implement
>>     implicit/no-implicit modes as separate AST traversals, ClangTidy
>>     would need to run two traversals, one for "easier" matchers
>>     (skipping implicit code), and one for "expert" matchers
>>     (traversing implicit code). We would also need to build two
>>     parent maps.
>
As of https://reviews.llvm.org/D73388 we no longer create a parent map 
for each traversal kind used at runtime.

This means I can no longer measure a runtime cost of introducing use of 
the traverse() matcher in clang-tidy.

>     That said, I think there is some scope for optimization in my
>     approach. The optimizations I have in mind (being more efficient
>     with parent maps) wouldn't apply to Syntax Trees though as that is
>     a completely separate system.
>
>

Done, per above.

>     As long as the current AST Matchers APIs exist, they should be
>     improved. Perhaps they will be less important in the future when
>     we have Syntax Trees for these use-cases instead. However, for
>     now, I don't think the ease of use improvement in my patches
>     should be prevented based existence of a future alternative system.
>
> I agree that we should not block improvements on existence of a future 
> system. However, each specific change should be evaluated based on its 
> merits and costs. I think there are non-trivial downsides to offering 
> a different mode in existing AST matchers, which create maintenance 
> and testing costs that we will be paying perpetually.

We may not achieve your agreement on this change, but I think we have 
consensus already that this change should proceed.

> It instantly creates a huge testing gap for existing AST matchers 
> (which we currently test only in one mode).

The traverse() matcher does not introduce a need to test each AST 
matcher in each mode.

> It also creates traps for existing code that will be able to start 
> accidentally calling existing matchers in a traversal mode that they 
> were not designed for.

Yes, true. It is possible to write a matcher which works as someone 
expects in one mode but not the other, such as a matcher which 
specifically expects to match a implicitCastExpr().

This can matter for matchers which are in "libraries", but not for 
matchers which are implementation details inside checks. Unless a 
particular check exposes the traversal kind as an option, those matchers 
can not be affected. Any check which does expose that as an option has 
to do so consciously and explicitly and has to test it. Nothing new there.

>
> On the other hand, my objection primarily applies to 
> https://reviews.llvm.org/D61837, which already landed, not to changing 
> the default.

Understood. Hopefully that comes through in my email.

> If you want to still proceed with changing the default, my request 
> would be to provide detailed guidance (probably as a section in 
> release notes) about updating existing code that uses matchers, 
> including ClangTidy checkers.

Certainly.

Thanks,

Stephen.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200126/0535ac25/attachment.html>