[cfe-dev] RFC: Easier AST Matching by Default

Fri Jan 17 04:54:34 PST 2020

On Fri, Jan 17, 2020 at 12:41 PM Stephen Kelly <steveire at gmail.com> wrote:

>
> On 13/01/2020 16:39, Dmitri Gribenko wrote:
>
> On Fri, Jan 10, 2020 at 4:34 PM Aaron Ballman via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> I've not seen much to suggest the community is not in favor of this
>> direction, so I think you're all set to post patches for review. If
>> there are community concerns, we can address them during the review
>> process. Thank you for working on this!
>>
>
> Sorry, I just came back from vacation. I'm not sure if it is a good idea,
> in particular, for the following reasons:
>
> - The document specifies the goals in terms of a solution: "Goal: Users
> should not have to account for implicit AST nodes when writing AST
> Matchers".
>
>
> The goal is to make AST Matchers more accessible/easy to use for
> newcomers/non-experts in the clang AST by no-longer confronting them with
> 100% of the complexity on first use.
>
> I think it's important to bridge that gap. See my EuroLLVM talk for more.
>
I totally agree with the goal of making AST Matchers, refactoring, and
tooling more accessible (which is why I'm helping with libraries like
Stencil and Syntax trees). However the goal that I quoted from the proposal
on this thread ("Goal: Users should not have to account for implicit AST
nodes when writing AST Matchers") constrains the solution to exactly what
is being proposed.

>
> - The complexity for implementers (although it also applies to
> https://reviews.llvm.org/D61837, which landed a while ago, and I didn't
> see it). Allowing to run the same matcher in multiple modes increases our
> support and testing matrix.
>
>
> I don't think it increases the test matrix more than any other feature.
> Can you expand on this?
>
https://reviews.llvm.org/D61837 makes it possible to run every matcher with
different traversal semantics. Our current tests only test one mode for
each matcher, the default mode. There are two ways to define custom
matchers: the advanced way (through the AST_MATCHER macro) that is needed
when you want to traverse the AST yourself, and a simpler way -- by
combining existing matchers.

Consider the `makeIteratorLoopMatcher` defined in
clang-tools-extra/clang-tidy/modernize/LoopConvertCheck.cpp. The caller can
run the newly made matcher in any traversal mode, even though the matcher
was probably written with the default mode in mind.

Clang codebase does not have many matchers that are combinations of
existing matchers, but code that uses Clang as a library (and often written
by engineers who are not compiler experts and just need to get some
refactoring done) does that very often in my experience working at Google.

>
>
> - The complexity cliff for users. Many non-trivial matchers need to
> explicitly manage implicit AST nodes.
>
>
> You just described the complexity cliff which is (after my change) not the
> first thing a newcomer hits.
>
> It's good to move that cliff back so that the newcomer is not confronted
> with it immediately.
>
As soon as you pretty-print the AST (which I think most people should do
before writing a matcher), you see implicit nodes.

>
> - The cost of ClangTidy. ClangTidy runs all matchers simultaneously in one
> AST traversal. If we implement implicit/no-implicit modes as separate AST
> traversals, ClangTidy would need to run two traversals, one for "easier"
> matchers (skipping implicit code), and one for "expert" matchers
> (traversing implicit code). We would also need to build two parent maps.
>
>
> Will Syntax Trees ever be used in clang-tidy? Or will their use be
> forbidden in clang-tidy for the same reason of adding to a runtime cost
> when used in a mixture?
>
> If your concern is valid it would seem to apply to Syntax Trees to a
> greater extent than this patch does.
>
Syntax trees will be used in ClangTidy once they are more fleshed out. You
are absolutely right about syntax trees having their cost. However, the
runtime cost of syntax trees comes with a benefit of actually providing an
API that was designed to model syntax trees. The proposal to visit only
explicitly written AST nodes will only make matching look like it matches
syntax, but it won't help with any further inspection of the AST nodes,
mutation, or even viewing the AST before you write a matcher.

> That said, I think there is some scope for optimization in my approach.
> The optimizations I have in mind (being more efficient with parent maps)
> wouldn't apply to Syntax Trees though as that is a completely separate
> system.
>
>
>
> My best suggestion is to investigate implementing AST Matchers for syntax
> trees, and allow jumping between syntax and semantic nodes in one matcher
> -- to match syntax first, and then validate the necessary semantic
> constraints.
>
>
> My previous email quoted the Syntax Trees RFC saying that the first step
> is to "use [the current ASTMatchers API] to find interesting Clang AST
> nodes". You seem to be suggesting that the opposite would be done. Maybe
> things have evolved since that RFC, or maybe both approaches should work
> equally well.
>

I discussed it with Ilya in person and we concluded that the way it came
through as written in the RFC did not reflect our understanding of Syntax
Trees. See Ilya's response on this thread.

> As long as the current AST Matchers APIs exist, they should be improved.
> Perhaps they will be less important in the future when we have Syntax Trees
> for these use-cases instead. However, for now, I don't think the ease of
> use improvement in my patches should be prevented based existence of a
> future alternative system.
>
I agree that we should not block improvements on existence of a future
system. However, each specific change should be evaluated based on its
merits and costs. I think there are non-trivial downsides to offering a
different mode in existing AST matchers, which create maintenance and
testing costs that we will be paying perpetually. It instantly creates a
huge testing gap for existing AST matchers (which we currently test only in
one mode). It also creates traps for existing code that will be able to
start accidentally calling existing matchers in a traversal mode that they
were not designed for.

On the other hand, my objection primarily applies to
https://reviews.llvm.org/D61837, which already landed, not to changing the
default. Changing the default just exacerbates the problem significantly by
affecting the semantics of existing code silently. However, Clang does not
aim to provide a stable API even in the tooling space. So maybe it is too
late for me to object, and unless we also remove the traverse() matchers,
my objection can't be fully addressed.

If you want to still proceed with changing the default, my request would be
to provide detailed guidance (probably as a section in release notes) about
updating existing code that uses matchers, including ClangTidy checkers.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200117/8182a940/attachment-0001.html>