[cfe-dev] RFC: Easier AST Matching by Default

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 17 14:47:56 PDT 2020


On Wed, 17 Jun 2020 at 14:21, Stephen Kelly via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 16/06/2020 05:16, Richard Smith wrote:
>
> On Mon, 25 May 2020 at 14:56, Stephen Kelly via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On 20/12/2019 21:01, Stephen Kelly via cfe-dev wrote:
>> >
>> > Hi,
>> >
>> > (Apologies if you receive this twice. GMail classified the first one as
>> > spam)
>> >
>> > Aaron Ballman and I met by chance in Belfast and we discussed a way
>> > forward towards making AST Matchers easier to use, particularly for C++
>> > developers who are not familiar with the details of the Clang AST.
>> >
>> > For those unaware, I expanded on this in the EuroLLVM conference this
>> > year, and then expanded on it at ACCU:
>> >
>> >   https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching
>> >
>> > One step in the process of getting there is changing the default
>> > behavior of AST Matchers to ignore invisible nodes while matching using
>> > the C++ API, and while matching and dumping AST nodes in clang-query.
>> >
>> > I think this is the most important change in the entire proposal as it
>> > sets out the intention of making the AST Matchers easier to use for C++
>> > developers who are not already familiar with Clang APIs.
>> >
>> > To that end, I've written an AST to motivate the change:
>> >
>> >
>> >
>> https://docs.google.com/document/d/17Z6gAwwc3HoRXvsy0OdwU0X5MFQEuiGeSu3i6ICOB90
>> >
>> >
>> > We're looking for feedback before pressing forward with the change. I
>> > already have some patches written to port clang-tidy and unit tests to
>> > account for the change of default.
>>
>>
>> This change is now in master.
>>
>
> Is this change responsible for https://bugs.llvm.org/show_bug.cgi?id=46287 ?
>
>
>
> Yes, I added a message on the bug and pointed out the
>
> set traversal AsIs
>
> command in clang-query: https://godbolt.org/z/tIS622
>
>
> If so, I think the current behavior is very confusing, and seems harmful.
> By default, a cxxConstructExpr matcher no longer matches all
> CXXConstructExprs, which seems very strange and surprising, and likewise
> it's alarming that an AST dump from clang-query no longer correctly dumps
> the AST.
>
>
> The intention is to be make the AST Matchers more approachable by
> newcomers.  CXXConstructExpr and CXXMemberCallExpr nodes show up in places
> which are non-intuitive for people who are new to the clang AST, but are
> nevertheless familiar enough with C++ to work professionally with it.
>
> Consider
>
> https://godbolt.org/z/TvUTgy
>
> and
>
> https://godbolt.org/z/me3EQu
>
> and imagine you've never taken a compiler course, and never seen the clang
> AST before.
>
> I am not surprised experts miss those nodes though, and the AsIs mode is
> essentially the "expert mode" while the default is intended to be "easy
> mode".
>
I think your goal is noble, and automatically skipping implicit nodes when
looking for an explicit one makes a lot of sense, but I don't think the
change as-is is a net positive for approachability by newcomers. The fact
is that you've made all of the matchers that match semantics rather than
syntax stop working reliably (but not stop working entirely, which, if
anything, is even worse because the behavior appears random and
inconsistent). Consider the example from Yitzhak Mandelbaum -- the issue
there is actually not whether we have a template or not; rather, it's a
difference between one- and two-argument constructors:

struct A {
  A(int);
  A(int, int);
};
A a(3);
A b(3, 4);

We produce uniform AST representations for these two cases: both are
CXXConstructExprs. Here's the AST dump:

|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit
| `-CXXConstructExpr 0x638eff0 <col:40, col:43> 'A' 'void (int)'
|   `-IntegerLiteral 0x638ec10 <col:42> 'int' 3
`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit
  `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'
    |-IntegerLiteral 0x638f098 <col:49> 'int' 3
    `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4

But clang-query in its default mode instead dumps the example as

|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit
| `-IntegerLiteral 0x638ec10 <col:42> 'int' 3
`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit
  `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'
    |-IntegerLiteral 0x638f098 <col:49> 'int' 3
    `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4

and a cxxConstructExpr() matcher matches the initialization of b but not
that of a. clang-query is following the new rules correctly: in the above
case, the single-argument construction is an implicit node because C++
non-explicit single argument constructors implement implicit conversions,
so the initialization of 'a' is an implicit conversion whereas the
initialization of 'b' is a construction.

> I hope you can agree that this change in behavior, for an example such as
this, is a regression for the ability for beginners to understand how
matchers work and effectively write them!

> Perhaps as an alternative, we should not skip implicit nodes when trying
> to match a node that might be implicit (analogous to how Type::getAs steps
> over sugar nodes except when looking for that type sugar node), and instead
> of hiding implicit nodes in the AST dump we should dim them out but still
> show them?
>
>
> Color wouldn't be particularly distinctive, especially in godbolt for
> example, but it also would mean that the confusing stuff is still *there*
> in the output. It can be overwhelming when absolutely everything in the
> output is new and confusing.
>
> (I also note that newcomers to Matchers don't need to see the AST at all
> if the clang-query shows them the matchers they can use instead:
> http://ce.steveire.com/z/9EF8x0 see my other posts on the topic for more)
>
Perhaps we should consider the AST dump to be an expert-level feature and
not try to optimize it for beginner usage, especially if we have better
output methods. I'm not sure, though; I think telling beginners to look at
the AST dump and match what they see there has been somewhat effective.

I think we should assume that we can ask the maintainers of compiler
explorer to render the output nicely, if we find that's a problem. (They
already apply some postprocessing to it, to remove pointer values and the
like.) As an alternative to greying out implicit nodes, we could render
them inside square brackets or something:

|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit
| `-[CXXConstructExpr 0x638eff0 <col:40, col:43> 'A' 'void (int)']
|   `-IntegerLiteral 0x638ec10 <col:42> 'int' 3
`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit
  `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'
    |-IntegerLiteral 0x638f098 <col:49> 'int' 3
    `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4

> Nevertheless, this change in the direction that I consider easier hasn't
> been 100% popular among other clang developers (though I wonder how much of
> that is due to being experts?). At this point, if someone wishes to reverse
> the change of default I don't think I would attempt to oppose it. It's not
> clear to me whether my vision for making things easier for newcomers (which
> I set out in EuroLLVM and which included ignoring these nodes) is shared.
> It might be better at this point for others to decide.
>
I think it's a good goal. But I think the implementation needs some
refinement. Did you have any thoughts on my suggestion of how to fix this?
(That is: try matching against implicit nodes, but skip them if they don't
match.) I don't think you commented on it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200617/8be7ab26/attachment.html>


More information about the cfe-dev mailing list