recordDecl() AST matcher

Mon Sep 14 14:29:52 PDT 2015

On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek <klimek at google.com> wrote:
>
>
> On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman <aaron at aaronballman.com>
> wrote:
>>
>> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper <djasper at google.com> wrote:
>> > By this point, I see that change might be profitable overall. However,
>> > lets
>> > completely map this out. Changing just cxxRecordDecl() can actually
>> > increase
>> > confusion in other areas. Right now, not a single AST matcher has the
>> > cxx
>> > prefix (although a total of 28 stand for the corresponding CXX.. AST
>> > node).
>> > This is consistent and people knowing this will never try to write
>> > cxxConstructExpr(). As soon as people have used cxxRecordDecl(), the
>> > chance
>> > of them trying cxxConstructExpr() increases. You have spent a long time
>> > figuring out that recordDecl means cxxRecordDecl(), which is one
>> > datapoint,
>> > but I am not aware of anyone else having this specific issue. And we
>> > could
>> > make this less bad with better documentation, I think.
>> >
>> > So, for me, the questions are:
>> > 1) Do we want/need this change?
>>
>> We definitely need *a* change because there currently is no way to
>> match a C struct or union when compiling in C mode. I discovered this
>> because I was trying to write a new checker for clang-tidy that
>> focuses on C code and it would fail to match when compiling in C mode.
>> Whether we decide to go with cxxRecordDecl vs recordDecl vs structDecl
>> (etc) is less important to me than the ability to write clang-tidy
>> checks for C code.
>>
>> > 2) Do we want to be consistent and change the other 27 matchers as well?
>>
>> I'm on the fence about this for all the reasons you point out.
>>
>> > A fundamental question is whether we want AST matchers to match AST
>> > nodes
>> > 1:1 or whether they should be an abstraction from some implementation
>> > details of the AST.
>>
>> I absolutely agree that this is a fundamental question. I think a
>> higher priority fundamental question that goes along with it is: are
>> we okay with breaking a lot of user code (are these meant to be stable
>> APIs like the LLVM C APIs)? If we want these APIs to be stable, that
>> changes the answer of what kind of mapping we can have.
>
>
> I think the AST matchers are so closely coupled to the AST that it trying to
> be more stable than the AST doesn't help. Basically all uses of AST matchers
> do something with the AST nodes afterwards, which will break anyway.

I can get behind that logic. So we're okay with breaking their code
because there's no way around it -- it's tied to the AST, so users
cannot rely on the AST APIs remaining the same from release to release
anyway.

>
>>
>> > And this is not an easy question to answer. There are
>> > many places where we don't follow a strict 1:1 mapping. Mostly node
>> > matchers, but also in traversal matchers, e.g. isDerivedFrom().
>> >
>> > Personally, I'd really hate to have the cxx Prefix everywhere, but
>> > that's
>> > just my personal opinion. I would even prefer matchers like record() and
>> > method(), but I think somebody convinced me that that would be a very
>> > bad
>> > idea ;-).
>>
>> My personal opinion is that (1) breaking code is fine, but we should
>> avoid doing it without very clear benefit, and (2) the mapping between
>> AST node identifiers and AST matcher identifiers needs to be
>> incredibly obvious, but perhaps not slavishly 1:1. If we instead
>> decide we want a 1:1 mapping, then I think we need to start seriously
>> considering auto-generating the AST node (and type) matchers from
>> tablegen so that the AST nodes *cannot* get out of sync with the AST
>> matchers, otherwise we'll be right back here again in a few years when
>> we modify the name of an AST node.
>
>
> I do think we want to auto-generate the matchers, but I don't think tablegen
> is the right approach (I think an ast-matcher based tool is ;)
> That said, auto-generating all the matchers is a) a lot of effort and b) the
> code-size and compile time of matchers already matters, so it's unclear
> which ones we would want to generate, especially for traversal matchers :(

Oh, that's an excellent point (I'm talking about (b), I already knew
(a) was a lot of work). Thank you for pointing that out!

>
>>
>> My definition of "incredibly obvious" is: if the AST has a prefixed
>> and unprefixed version, or two different prefixes, we should mimic
>> that directly with the matchers. Otherwise, existing AST matchers
>> without prefix shenanigans can remain as they are, and new AST
>> matchers should prefix as-required. If we decide we're okay breaking
>> code, then I don't see a problem with changing ctorInitializer() into
>> cxxCtorInitializer() when C adds constructors. ;-)
>
>
> I think the main things is cost for developers who try to write matchers and
> work from the -ast-dump. Figuring out that there *is* a matcher with an
> unprefixed node can take a while.

Hmm, yes, but "take a while" should be relatively short, I would
think. In that use-case, the user does an -ast-dump, sees
"CXXFrobbleGnasher", they go to the AST matcher reference and they
search for "CXXFrobberGnasher." The first hit won't be
cxxFrobbleGnasher, but the entry for frobbleGnasher (which is still
the first hit when searching from the top of the document due to the
way we position node matchers) will have a parameter of
CXXFrobbleGnasher, so they will find still get to the right matcher on
the first hit. If someone doesn't read the documentation at all,
they're going to try cxxFrobbleGnasher() and get a compile error/no
known matcher. Then they'll go look at ASTMatchers.h and figure out
it's called frobbleGnasher by searching there instead of the
documentation.

That's compared to having the matcher name always be the same as the
AST node, where the user writes cxxFrobbleGnasher and it just works,
which is definitely a mark in favor of making everything consistent. I
just don't think the current approach is too onerous in the case where
the matcher is at least *provided* for the user with a relatively sane
name.

>> I should be clear, I'm not opposed to just having a 1:1 mapping. I'm
>> just not certain the benefits of being strict about that outweigh the
>> costs to broken code. cxxCtorInitializer will break someone's code,
>> but I don't think it adds any clarity to their code, so I don't see
>> the benefit of forcing the change.
>
> Well, I think there's the cost of broken code *once* now, vs. the (smaller)
> cost for users in all future.
> I'm still strongly in favor of breaking now, and having a simpler model
> going forward.

I'm definitely in favor of breaking now in the case of RecordDecl vs
CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a bug
given that there's no way to match a RecordDecl.

I would also be totally in favor of being consistent if we were
starting from scratch. I'm very, very weakly opposed to breaking more
user's code than we have to in order to get usable matchers because it
seems gratuitous. Breaking code to get something that works seems
reasonable. Breaking code that already works just to change the name
for consistency elsewhere, I'm a bit less keen on. But the fact that
we already can break user's code at-will because of the reliance on
the AST nodes makes me think it may be the right approach for the best
API, since that's what I would want if we were starting from scratch.

Okay, I'm convinced. I think we should rename the type and node
matchers (not traversal and narrowing matchers) to match the AST node
names in all cases. We can document the breakage in the release notes,
but (hopefully) only have to do this dance one time instead of
spreading the pain out as it happens to eventually get to the same
place anyway.

Daniel, is this something you would be okay with? (I'm assuming
Richard finds it acceptable based on previous comments from Manuel,
but Richard, feel free to chime in.)

~Aaron