recordDecl() AST matcher

Aaron Ballman via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 15 10:32:58 PDT 2015


Here are the complete patches to solve the issues we've discussed:

1) splits recordDecl into recordDecl and cxxRecordDecl
2) adds isStruct, isUnion, isClass to identify what kind of
recordDecl() you may be looking at
3) prefixes all of the node matchers with cxx that should have it
4) fixes a similar issue with CUDAKernelCallExpr (the prefix needs to
be cuda instead of CUDA to distinguish the matcher name from the type
name).
5) updates all of the documentation and code that broke.

One patch is for changes to clang, the other is for changes to
clang-tools-extra.

~Aaron

On Mon, Sep 14, 2015 at 5:49 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
> On Mon, Sep 14, 2015 at 5:47 PM, Daniel Jasper <djasper at google.com> wrote:
>> Btw, I think generating them, potentially into several different headers to
>> work around the compile time issue isn't such a bad idea.
>
> I'm not going to start with this approach, but think it may be worth
> exploring at some point. ;-)
>
> ~Aaron
>
>>
>> On Mon, Sep 14, 2015 at 11:45 PM, Manuel Klimek <klimek at google.com> wrote:
>>>
>>> Feel free to rename the AST nodes :)
>>>
>>>
>>> On Mon, Sep 14, 2015, 2:44 PM Daniel Jasper <djasper at google.com> wrote:
>>>>
>>>> Ok. I am happy with this then.
>>>>
>>>> (Just personally grumpy having to write
>>>> cxxRecordDecl(has(cxxConstructorDecl(..))) in the future ;-) ).
>>>>
>>>> On Mon, Sep 14, 2015 at 11:41 PM, Manuel Klimek <klimek at google.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 14, 2015 at 2:29 PM Aaron Ballman <aaron at aaronballman.com>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek <klimek at google.com>
>>>>>> wrote:
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman
>>>>>> > <aaron at aaronballman.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper <djasper at google.com>
>>>>>> >> wrote:
>>>>>> >> > By this point, I see that change might be profitable overall.
>>>>>> >> > However,
>>>>>> >> > lets
>>>>>> >> > completely map this out. Changing just cxxRecordDecl() can
>>>>>> >> > actually
>>>>>> >> > increase
>>>>>> >> > confusion in other areas. Right now, not a single AST matcher has
>>>>>> >> > the
>>>>>> >> > cxx
>>>>>> >> > prefix (although a total of 28 stand for the corresponding CXX..
>>>>>> >> > AST
>>>>>> >> > node).
>>>>>> >> > This is consistent and people knowing this will never try to write
>>>>>> >> > cxxConstructExpr(). As soon as people have used cxxRecordDecl(),
>>>>>> >> > the
>>>>>> >> > chance
>>>>>> >> > of them trying cxxConstructExpr() increases. You have spent a long
>>>>>> >> > time
>>>>>> >> > figuring out that recordDecl means cxxRecordDecl(), which is one
>>>>>> >> > datapoint,
>>>>>> >> > but I am not aware of anyone else having this specific issue. And
>>>>>> >> > we
>>>>>> >> > could
>>>>>> >> > make this less bad with better documentation, I think.
>>>>>> >> >
>>>>>> >> > So, for me, the questions are:
>>>>>> >> > 1) Do we want/need this change?
>>>>>> >>
>>>>>> >> We definitely need *a* change because there currently is no way to
>>>>>> >> match a C struct or union when compiling in C mode. I discovered
>>>>>> >> this
>>>>>> >> because I was trying to write a new checker for clang-tidy that
>>>>>> >> focuses on C code and it would fail to match when compiling in C
>>>>>> >> mode.
>>>>>> >> Whether we decide to go with cxxRecordDecl vs recordDecl vs
>>>>>> >> structDecl
>>>>>> >> (etc) is less important to me than the ability to write clang-tidy
>>>>>> >> checks for C code.
>>>>>> >>
>>>>>> >> > 2) Do we want to be consistent and change the other 27 matchers as
>>>>>> >> > well?
>>>>>> >>
>>>>>> >> I'm on the fence about this for all the reasons you point out.
>>>>>> >>
>>>>>> >> > A fundamental question is whether we want AST matchers to match
>>>>>> >> > AST
>>>>>> >> > nodes
>>>>>> >> > 1:1 or whether they should be an abstraction from some
>>>>>> >> > implementation
>>>>>> >> > details of the AST.
>>>>>> >>
>>>>>> >> I absolutely agree that this is a fundamental question. I think a
>>>>>> >> higher priority fundamental question that goes along with it is: are
>>>>>> >> we okay with breaking a lot of user code (are these meant to be
>>>>>> >> stable
>>>>>> >> APIs like the LLVM C APIs)? If we want these APIs to be stable, that
>>>>>> >> changes the answer of what kind of mapping we can have.
>>>>>> >
>>>>>> >
>>>>>> > I think the AST matchers are so closely coupled to the AST that it
>>>>>> > trying to
>>>>>> > be more stable than the AST doesn't help. Basically all uses of AST
>>>>>> > matchers
>>>>>> > do something with the AST nodes afterwards, which will break anyway.
>>>>>>
>>>>>> I can get behind that logic. So we're okay with breaking their code
>>>>>> because there's no way around it -- it's tied to the AST, so users
>>>>>> cannot rely on the AST APIs remaining the same from release to release
>>>>>> anyway.
>>>>>
>>>>>
>>>>> We might even *want* the code to break, as the use of the AST node might
>>>>> now be incorrect on a semantic level.
>>>>>
>>>>>>
>>>>>>
>>>>>> >
>>>>>> >>
>>>>>> >> > And this is not an easy question to answer. There are
>>>>>> >> > many places where we don't follow a strict 1:1 mapping. Mostly
>>>>>> >> > node
>>>>>> >> > matchers, but also in traversal matchers, e.g. isDerivedFrom().
>>>>>> >> >
>>>>>> >> > Personally, I'd really hate to have the cxx Prefix everywhere, but
>>>>>> >> > that's
>>>>>> >> > just my personal opinion. I would even prefer matchers like
>>>>>> >> > record() and
>>>>>> >> > method(), but I think somebody convinced me that that would be a
>>>>>> >> > very
>>>>>> >> > bad
>>>>>> >> > idea ;-).
>>>>>> >>
>>>>>> >> My personal opinion is that (1) breaking code is fine, but we should
>>>>>> >> avoid doing it without very clear benefit, and (2) the mapping
>>>>>> >> between
>>>>>> >> AST node identifiers and AST matcher identifiers needs to be
>>>>>> >> incredibly obvious, but perhaps not slavishly 1:1. If we instead
>>>>>> >> decide we want a 1:1 mapping, then I think we need to start
>>>>>> >> seriously
>>>>>> >> considering auto-generating the AST node (and type) matchers from
>>>>>> >> tablegen so that the AST nodes *cannot* get out of sync with the AST
>>>>>> >> matchers, otherwise we'll be right back here again in a few years
>>>>>> >> when
>>>>>> >> we modify the name of an AST node.
>>>>>> >
>>>>>> >
>>>>>> > I do think we want to auto-generate the matchers, but I don't think
>>>>>> > tablegen
>>>>>> > is the right approach (I think an ast-matcher based tool is ;)
>>>>>> > That said, auto-generating all the matchers is a) a lot of effort and
>>>>>> > b) the
>>>>>> > code-size and compile time of matchers already matters, so it's
>>>>>> > unclear
>>>>>> > which ones we would want to generate, especially for traversal
>>>>>> > matchers :(
>>>>>>
>>>>>> Oh, that's an excellent point (I'm talking about (b), I already knew
>>>>>> (a) was a lot of work). Thank you for pointing that out!
>>>>>>
>>>>>> >
>>>>>> >>
>>>>>> >> My definition of "incredibly obvious" is: if the AST has a prefixed
>>>>>> >> and unprefixed version, or two different prefixes, we should mimic
>>>>>> >> that directly with the matchers. Otherwise, existing AST matchers
>>>>>> >> without prefix shenanigans can remain as they are, and new AST
>>>>>> >> matchers should prefix as-required. If we decide we're okay breaking
>>>>>> >> code, then I don't see a problem with changing ctorInitializer()
>>>>>> >> into
>>>>>> >> cxxCtorInitializer() when C adds constructors. ;-)
>>>>>> >
>>>>>> >
>>>>>> > I think the main things is cost for developers who try to write
>>>>>> > matchers and
>>>>>> > work from the -ast-dump. Figuring out that there *is* a matcher with
>>>>>> > an
>>>>>> > unprefixed node can take a while.
>>>>>>
>>>>>> Hmm, yes, but "take a while" should be relatively short, I would
>>>>>> think. In that use-case, the user does an -ast-dump, sees
>>>>>> "CXXFrobbleGnasher", they go to the AST matcher reference and they
>>>>>> search for "CXXFrobberGnasher." The first hit won't be
>>>>>> cxxFrobbleGnasher, but the entry for frobbleGnasher (which is still
>>>>>> the first hit when searching from the top of the document due to the
>>>>>> way we position node matchers) will have a parameter of
>>>>>> CXXFrobbleGnasher, so they will find still get to the right matcher on
>>>>>> the first hit. If someone doesn't read the documentation at all,
>>>>>> they're going to try cxxFrobbleGnasher() and get a compile error/no
>>>>>> known matcher. Then they'll go look at ASTMatchers.h and figure out
>>>>>> it's called frobbleGnasher by searching there instead of the
>>>>>> documentation.
>>>>>
>>>>>
>>>>> The problem is that I've learned that sometimes people try to make
>>>>> things work in ways that I couldn't even imagine, and they lose more time
>>>>> than I could ever imagine them using :) Also, I agree the time is probably
>>>>> on average not that large, but we pay it over a long time in the future, and
>>>>> it tends to add up.
>>>>>
>>>>>>
>>>>>> That's compared to having the matcher name always be the same as the
>>>>>> AST node, where the user writes cxxFrobbleGnasher and it just works,
>>>>>> which is definitely a mark in favor of making everything consistent. I
>>>>>> just don't think the current approach is too onerous in the case where
>>>>>> the matcher is at least *provided* for the user with a relatively sane
>>>>>> name.
>>>>>>
>>>>>> >> I should be clear, I'm not opposed to just having a 1:1 mapping. I'm
>>>>>> >> just not certain the benefits of being strict about that outweigh
>>>>>> >> the
>>>>>> >> costs to broken code. cxxCtorInitializer will break someone's code,
>>>>>> >> but I don't think it adds any clarity to their code, so I don't see
>>>>>> >> the benefit of forcing the change.
>>>>>> >
>>>>>> > Well, I think there's the cost of broken code *once* now, vs. the
>>>>>> > (smaller)
>>>>>> > cost for users in all future.
>>>>>> > I'm still strongly in favor of breaking now, and having a simpler
>>>>>> > model
>>>>>> > going forward.
>>>>>>
>>>>>> I'm definitely in favor of breaking now in the case of RecordDecl vs
>>>>>> CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a bug
>>>>>> given that there's no way to match a RecordDecl.
>>>>>>
>>>>>> I would also be totally in favor of being consistent if we were
>>>>>> starting from scratch. I'm very, very weakly opposed to breaking more
>>>>>> user's code than we have to in order to get usable matchers because it
>>>>>> seems gratuitous. Breaking code to get something that works seems
>>>>>> reasonable. Breaking code that already works just to change the name
>>>>>> for consistency elsewhere, I'm a bit less keen on. But the fact that
>>>>>> we already can break user's code at-will because of the reliance on
>>>>>> the AST nodes makes me think it may be the right approach for the best
>>>>>> API, since that's what I would want if we were starting from scratch.
>>>>>>
>>>>>> Okay, I'm convinced. I think we should rename the type and node
>>>>>> matchers (not traversal and narrowing matchers) to match the AST node
>>>>>> names in all cases. We can document the breakage in the release notes,
>>>>>> but (hopefully) only have to do this dance one time instead of
>>>>>> spreading the pain out as it happens to eventually get to the same
>>>>>> place anyway.
>>>>>
>>>>>
>>>>> Yea, people who want more stability do use releases anyway.
>>>>>
>>>>>>
>>>>>> Daniel, is this something you would be okay with? (I'm assuming
>>>>>> Richard finds it acceptable based on previous comments from Manuel,
>>>>>> but Richard, feel free to chime in.)
>>>>>
>>>>>
>>>>> Offline conversation with Richard says that he is convinced.
>>>>>
>>>>>>
>>>>>>
>>>>>> ~Aaron
>>>>
>>>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: recordDecl-clang.patch
Type: application/octet-stream
Size: 190700 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150915/185c5401/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: recordDecl-clang-tools-extra.patch
Type: application/octet-stream
Size: 54614 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150915/185c5401/attachment-0003.obj>


More information about the cfe-commits mailing list