recordDecl() AST matcher

Tue Sep 15 11:48:48 PDT 2015

On Mon, Sep 14, 2015 at 2:41 PM, Manuel Klimek <klimek at google.com> wrote:
>
> On Mon, Sep 14, 2015 at 2:29 PM Aaron Ballman <aaron at aaronballman.com>
> wrote:
>
>> On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek <klimek at google.com> wrote:
>> >
>> >
>> > On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman <aaron at aaronballman.com>
>> > wrote:
>> >>
>> >> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper <djasper at google.com>
>> wrote:
>> >> > By this point, I see that change might be profitable overall.
>> However,
>> >> > lets
>> >> > completely map this out. Changing just cxxRecordDecl() can actually
>> >> > increase
>> >> > confusion in other areas. Right now, not a single AST matcher has the
>> >> > cxx
>> >> > prefix (although a total of 28 stand for the corresponding CXX.. AST
>> >> > node).
>> >> > This is consistent and people knowing this will never try to write
>> >> > cxxConstructExpr(). As soon as people have used cxxRecordDecl(), the
>> >> > chance
>> >> > of them trying cxxConstructExpr() increases. You have spent a long
>> time
>> >> > figuring out that recordDecl means cxxRecordDecl(), which is one
>> >> > datapoint,
>> >> > but I am not aware of anyone else having this specific issue. And we
>> >> > could
>> >> > make this less bad with better documentation, I think.
>> >> >
>> >> > So, for me, the questions are:
>> >> > 1) Do we want/need this change?
>> >>
>> >> We definitely need *a* change because there currently is no way to
>> >> match a C struct or union when compiling in C mode. I discovered this
>> >> because I was trying to write a new checker for clang-tidy that
>> >> focuses on C code and it would fail to match when compiling in C mode.
>> >> Whether we decide to go with cxxRecordDecl vs recordDecl vs structDecl
>> >> (etc) is less important to me than the ability to write clang-tidy
>> >> checks for C code.
>> >>
>> >> > 2) Do we want to be consistent and change the other 27 matchers as
>> well?
>> >>
>> >> I'm on the fence about this for all the reasons you point out.
>> >>
>> >> > A fundamental question is whether we want AST matchers to match AST
>> >> > nodes
>> >> > 1:1 or whether they should be an abstraction from some implementation
>> >> > details of the AST.
>> >>
>> >> I absolutely agree that this is a fundamental question. I think a
>> >> higher priority fundamental question that goes along with it is: are
>> >> we okay with breaking a lot of user code (are these meant to be stable
>> >> APIs like the LLVM C APIs)? If we want these APIs to be stable, that
>> >> changes the answer of what kind of mapping we can have.
>> >
>> >
>> > I think the AST matchers are so closely coupled to the AST that it
>> trying to
>> > be more stable than the AST doesn't help. Basically all uses of AST
>> matchers
>> > do something with the AST nodes afterwards, which will break anyway.
>>
>> I can get behind that logic. So we're okay with breaking their code
>> because there's no way around it -- it's tied to the AST, so users
>> cannot rely on the AST APIs remaining the same from release to release
>> anyway.
>>
>
> We might even *want* the code to break, as the use of the AST node might
> now be incorrect on a semantic level.
>
>
>>
>> >
>> >>
>> >> > And this is not an easy question to answer. There are
>> >> > many places where we don't follow a strict 1:1 mapping. Mostly node
>> >> > matchers, but also in traversal matchers, e.g. isDerivedFrom().
>> >> >
>> >> > Personally, I'd really hate to have the cxx Prefix everywhere, but
>> >> > that's
>> >> > just my personal opinion. I would even prefer matchers like record()
>> and
>> >> > method(), but I think somebody convinced me that that would be a very
>> >> > bad
>> >> > idea ;-).
>> >>
>> >> My personal opinion is that (1) breaking code is fine, but we should
>> >> avoid doing it without very clear benefit, and (2) the mapping between
>> >> AST node identifiers and AST matcher identifiers needs to be
>> >> incredibly obvious, but perhaps not slavishly 1:1. If we instead
>> >> decide we want a 1:1 mapping, then I think we need to start seriously
>> >> considering auto-generating the AST node (and type) matchers from
>> >> tablegen so that the AST nodes *cannot* get out of sync with the AST
>> >> matchers, otherwise we'll be right back here again in a few years when
>> >> we modify the name of an AST node.
>> >
>> >
>> > I do think we want to auto-generate the matchers, but I don't think
>> tablegen
>> > is the right approach (I think an ast-matcher based tool is ;)
>> > That said, auto-generating all the matchers is a) a lot of effort and
>> b) the
>> > code-size and compile time of matchers already matters, so it's unclear
>> > which ones we would want to generate, especially for traversal matchers
>> :(
>>
>> Oh, that's an excellent point (I'm talking about (b), I already knew
>> (a) was a lot of work). Thank you for pointing that out!
>>
>> >
>> >>
>> >> My definition of "incredibly obvious" is: if the AST has a prefixed
>> >> and unprefixed version, or two different prefixes, we should mimic
>> >> that directly with the matchers. Otherwise, existing AST matchers
>> >> without prefix shenanigans can remain as they are, and new AST
>> >> matchers should prefix as-required. If we decide we're okay breaking
>> >> code, then I don't see a problem with changing ctorInitializer() into
>> >> cxxCtorInitializer() when C adds constructors. ;-)
>> >
>> >
>> > I think the main things is cost for developers who try to write
>> matchers and
>> > work from the -ast-dump. Figuring out that there *is* a matcher with an
>> > unprefixed node can take a while.
>>
>> Hmm, yes, but "take a while" should be relatively short, I would
>> think. In that use-case, the user does an -ast-dump, sees
>> "CXXFrobbleGnasher", they go to the AST matcher reference and they
>> search for "CXXFrobberGnasher." The first hit won't be
>> cxxFrobbleGnasher, but the entry for frobbleGnasher (which is still
>> the first hit when searching from the top of the document due to the
>> way we position node matchers) will have a parameter of
>> CXXFrobbleGnasher, so they will find still get to the right matcher on
>> the first hit. If someone doesn't read the documentation at all,
>> they're going to try cxxFrobbleGnasher() and get a compile error/no
>> known matcher. Then they'll go look at ASTMatchers.h and figure out
>> it's called frobbleGnasher by searching there instead of the
>> documentation.
>>
>
> The problem is that I've learned that sometimes people try to make things
> work in ways that I couldn't even imagine, and they lose more time than I
> could ever imagine them using :) Also, I agree the time is probably on
> average not that large, but we pay it over a long time in the future, and
> it tends to add up.
>
>
>> That's compared to having the matcher name always be the same as the
>> AST node, where the user writes cxxFrobbleGnasher and it just works,
>> which is definitely a mark in favor of making everything consistent. I
>> just don't think the current approach is too onerous in the case where
>> the matcher is at least *provided* for the user with a relatively sane
>> name.
>>
>> >> I should be clear, I'm not opposed to just having a 1:1 mapping. I'm
>> >> just not certain the benefits of being strict about that outweigh the
>> >> costs to broken code. cxxCtorInitializer will break someone's code,
>> >> but I don't think it adds any clarity to their code, so I don't see
>> >> the benefit of forcing the change.
>> >
>> > Well, I think there's the cost of broken code *once* now, vs. the
>> (smaller)
>> > cost for users in all future.
>> > I'm still strongly in favor of breaking now, and having a simpler model
>> > going forward.
>>
>> I'm definitely in favor of breaking now in the case of RecordDecl vs
>> CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a bug
>> given that there's no way to match a RecordDecl.
>>
>> I would also be totally in favor of being consistent if we were
>> starting from scratch. I'm very, very weakly opposed to breaking more
>> user's code than we have to in order to get usable matchers because it
>> seems gratuitous. Breaking code to get something that works seems
>> reasonable. Breaking code that already works just to change the name
>> for consistency elsewhere, I'm a bit less keen on. But the fact that
>> we already can break user's code at-will because of the reliance on
>> the AST nodes makes me think it may be the right approach for the best
>> API, since that's what I would want if we were starting from scratch.
>>
>> Okay, I'm convinced. I think we should rename the type and node
>> matchers (not traversal and narrowing matchers) to match the AST node
>> names in all cases. We can document the breakage in the release notes,
>> but (hopefully) only have to do this dance one time instead of
>> spreading the pain out as it happens to eventually get to the same
>> place anyway.
>>
>
> Yea, people who want more stability do use releases anyway.
>
>
>> Daniel, is this something you would be okay with? (I'm assuming
>> Richard finds it acceptable based on previous comments from Manuel,
>> but Richard, feel free to chime in.)
>>
>
> Offline conversation with Richard says that he is convinced.
>

Right, Manuel has me entirely convinced that the matcher names should
exactly match the AST node names (modulo initial capitalization). This is
what users of the matchers expect, and is what makes it practical to figure
out how to write a matcher (from an AST dump). Also, the API stability
guarantees for the matchers are exactly the same as for the Clang C++ API:
source and binary compatibility across point releases, but no guarantees
whatsoever across minor releases or in SVN trunk.

As such, it's completely irrelevant whether we plan to rename AST nodes to
remove the CXX in future. The matchers should match the AST now. If we
rename AST nodes in future, we should rename the matchers to match when we
do so.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150915/d27e7adb/attachment-0001.html>