recordDecl() AST matcher

Wed Sep 16 14:03:47 PDT 2015

Attached is an updated patch for clang-tools-extra that does not have
my in-progress, not-related-to-any-of-this code in it. ;-)

~Aaron

On Wed, Sep 16, 2015 at 4:04 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
> Quick ping. I know this is a fairly gigantic patch, but I'm hoping for
> a relatively quick review turnaround because of potential merge
> conflicts with people doing a fair amount of work on clang-tidy
> lately. Everything should be pretty straight-forward (it's all just
> renames, no semantic changes intended aside from
> recordDecl/cxxRecordDecl and the narrowing matchers.
>
> ~Aaron
>
> On Tue, Sep 15, 2015 at 1:32 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
>> Here are the complete patches to solve the issues we've discussed:
>>
>> 1) splits recordDecl into recordDecl and cxxRecordDecl
>> 2) adds isStruct, isUnion, isClass to identify what kind of
>> recordDecl() you may be looking at
>> 3) prefixes all of the node matchers with cxx that should have it
>> 4) fixes a similar issue with CUDAKernelCallExpr (the prefix needs to
>> be cuda instead of CUDA to distinguish the matcher name from the type
>> name).
>> 5) updates all of the documentation and code that broke.
>>
>> One patch is for changes to clang, the other is for changes to
>> clang-tools-extra.
>>
>> ~Aaron
>>
>> On Mon, Sep 14, 2015 at 5:49 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
>>> On Mon, Sep 14, 2015 at 5:47 PM, Daniel Jasper <djasper at google.com> wrote:
>>>> Btw, I think generating them, potentially into several different headers to
>>>> work around the compile time issue isn't such a bad idea.
>>>
>>> I'm not going to start with this approach, but think it may be worth
>>> exploring at some point. ;-)
>>>
>>> ~Aaron
>>>
>>>>
>>>> On Mon, Sep 14, 2015 at 11:45 PM, Manuel Klimek <klimek at google.com> wrote:
>>>>>
>>>>> Feel free to rename the AST nodes :)
>>>>>
>>>>>
>>>>> On Mon, Sep 14, 2015, 2:44 PM Daniel Jasper <djasper at google.com> wrote:
>>>>>>
>>>>>> Ok. I am happy with this then.
>>>>>>
>>>>>> (Just personally grumpy having to write
>>>>>> cxxRecordDecl(has(cxxConstructorDecl(..))) in the future ;-) ).
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 11:41 PM, Manuel Klimek <klimek at google.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 14, 2015 at 2:29 PM Aaron Ballman <aaron at aaronballman.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek <klimek at google.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman
>>>>>>>> > <aaron at aaronballman.com>
>>>>>>>> > wrote:
>>>>>>>> >>
>>>>>>>> >> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper <djasper at google.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > By this point, I see that change might be profitable overall.
>>>>>>>> >> > However,
>>>>>>>> >> > lets
>>>>>>>> >> > completely map this out. Changing just cxxRecordDecl() can
>>>>>>>> >> > actually
>>>>>>>> >> > increase
>>>>>>>> >> > confusion in other areas. Right now, not a single AST matcher has
>>>>>>>> >> > the
>>>>>>>> >> > cxx
>>>>>>>> >> > prefix (although a total of 28 stand for the corresponding CXX..
>>>>>>>> >> > AST
>>>>>>>> >> > node).
>>>>>>>> >> > This is consistent and people knowing this will never try to write
>>>>>>>> >> > cxxConstructExpr(). As soon as people have used cxxRecordDecl(),
>>>>>>>> >> > the
>>>>>>>> >> > chance
>>>>>>>> >> > of them trying cxxConstructExpr() increases. You have spent a long
>>>>>>>> >> > time
>>>>>>>> >> > figuring out that recordDecl means cxxRecordDecl(), which is one
>>>>>>>> >> > datapoint,
>>>>>>>> >> > but I am not aware of anyone else having this specific issue. And
>>>>>>>> >> > we
>>>>>>>> >> > could
>>>>>>>> >> > make this less bad with better documentation, I think.
>>>>>>>> >> >
>>>>>>>> >> > So, for me, the questions are:
>>>>>>>> >> > 1) Do we want/need this change?
>>>>>>>> >>
>>>>>>>> >> We definitely need *a* change because there currently is no way to
>>>>>>>> >> match a C struct or union when compiling in C mode. I discovered
>>>>>>>> >> this
>>>>>>>> >> because I was trying to write a new checker for clang-tidy that
>>>>>>>> >> focuses on C code and it would fail to match when compiling in C
>>>>>>>> >> mode.
>>>>>>>> >> Whether we decide to go with cxxRecordDecl vs recordDecl vs
>>>>>>>> >> structDecl
>>>>>>>> >> (etc) is less important to me than the ability to write clang-tidy
>>>>>>>> >> checks for C code.
>>>>>>>> >>
>>>>>>>> >> > 2) Do we want to be consistent and change the other 27 matchers as
>>>>>>>> >> > well?
>>>>>>>> >>
>>>>>>>> >> I'm on the fence about this for all the reasons you point out.
>>>>>>>> >>
>>>>>>>> >> > A fundamental question is whether we want AST matchers to match
>>>>>>>> >> > AST
>>>>>>>> >> > nodes
>>>>>>>> >> > 1:1 or whether they should be an abstraction from some
>>>>>>>> >> > implementation
>>>>>>>> >> > details of the AST.
>>>>>>>> >>
>>>>>>>> >> I absolutely agree that this is a fundamental question. I think a
>>>>>>>> >> higher priority fundamental question that goes along with it is: are
>>>>>>>> >> we okay with breaking a lot of user code (are these meant to be
>>>>>>>> >> stable
>>>>>>>> >> APIs like the LLVM C APIs)? If we want these APIs to be stable, that
>>>>>>>> >> changes the answer of what kind of mapping we can have.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I think the AST matchers are so closely coupled to the AST that it
>>>>>>>> > trying to
>>>>>>>> > be more stable than the AST doesn't help. Basically all uses of AST
>>>>>>>> > matchers
>>>>>>>> > do something with the AST nodes afterwards, which will break anyway.
>>>>>>>>
>>>>>>>> I can get behind that logic. So we're okay with breaking their code
>>>>>>>> because there's no way around it -- it's tied to the AST, so users
>>>>>>>> cannot rely on the AST APIs remaining the same from release to release
>>>>>>>> anyway.
>>>>>>>
>>>>>>>
>>>>>>> We might even *want* the code to break, as the use of the AST node might
>>>>>>> now be incorrect on a semantic level.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> > And this is not an easy question to answer. There are
>>>>>>>> >> > many places where we don't follow a strict 1:1 mapping. Mostly
>>>>>>>> >> > node
>>>>>>>> >> > matchers, but also in traversal matchers, e.g. isDerivedFrom().
>>>>>>>> >> >
>>>>>>>> >> > Personally, I'd really hate to have the cxx Prefix everywhere, but
>>>>>>>> >> > that's
>>>>>>>> >> > just my personal opinion. I would even prefer matchers like
>>>>>>>> >> > record() and
>>>>>>>> >> > method(), but I think somebody convinced me that that would be a
>>>>>>>> >> > very
>>>>>>>> >> > bad
>>>>>>>> >> > idea ;-).
>>>>>>>> >>
>>>>>>>> >> My personal opinion is that (1) breaking code is fine, but we should
>>>>>>>> >> avoid doing it without very clear benefit, and (2) the mapping
>>>>>>>> >> between
>>>>>>>> >> AST node identifiers and AST matcher identifiers needs to be
>>>>>>>> >> incredibly obvious, but perhaps not slavishly 1:1. If we instead
>>>>>>>> >> decide we want a 1:1 mapping, then I think we need to start
>>>>>>>> >> seriously
>>>>>>>> >> considering auto-generating the AST node (and type) matchers from
>>>>>>>> >> tablegen so that the AST nodes *cannot* get out of sync with the AST
>>>>>>>> >> matchers, otherwise we'll be right back here again in a few years
>>>>>>>> >> when
>>>>>>>> >> we modify the name of an AST node.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I do think we want to auto-generate the matchers, but I don't think
>>>>>>>> > tablegen
>>>>>>>> > is the right approach (I think an ast-matcher based tool is ;)
>>>>>>>> > That said, auto-generating all the matchers is a) a lot of effort and
>>>>>>>> > b) the
>>>>>>>> > code-size and compile time of matchers already matters, so it's
>>>>>>>> > unclear
>>>>>>>> > which ones we would want to generate, especially for traversal
>>>>>>>> > matchers :(
>>>>>>>>
>>>>>>>> Oh, that's an excellent point (I'm talking about (b), I already knew
>>>>>>>> (a) was a lot of work). Thank you for pointing that out!
>>>>>>>>
>>>>>>>> >
>>>>>>>> >>
>>>>>>>> >> My definition of "incredibly obvious" is: if the AST has a prefixed
>>>>>>>> >> and unprefixed version, or two different prefixes, we should mimic
>>>>>>>> >> that directly with the matchers. Otherwise, existing AST matchers
>>>>>>>> >> without prefix shenanigans can remain as they are, and new AST
>>>>>>>> >> matchers should prefix as-required. If we decide we're okay breaking
>>>>>>>> >> code, then I don't see a problem with changing ctorInitializer()
>>>>>>>> >> into
>>>>>>>> >> cxxCtorInitializer() when C adds constructors. ;-)
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I think the main things is cost for developers who try to write
>>>>>>>> > matchers and
>>>>>>>> > work from the -ast-dump. Figuring out that there *is* a matcher with
>>>>>>>> > an
>>>>>>>> > unprefixed node can take a while.
>>>>>>>>
>>>>>>>> Hmm, yes, but "take a while" should be relatively short, I would
>>>>>>>> think. In that use-case, the user does an -ast-dump, sees
>>>>>>>> "CXXFrobbleGnasher", they go to the AST matcher reference and they
>>>>>>>> search for "CXXFrobberGnasher." The first hit won't be
>>>>>>>> cxxFrobbleGnasher, but the entry for frobbleGnasher (which is still
>>>>>>>> the first hit when searching from the top of the document due to the
>>>>>>>> way we position node matchers) will have a parameter of
>>>>>>>> CXXFrobbleGnasher, so they will find still get to the right matcher on
>>>>>>>> the first hit. If someone doesn't read the documentation at all,
>>>>>>>> they're going to try cxxFrobbleGnasher() and get a compile error/no
>>>>>>>> known matcher. Then they'll go look at ASTMatchers.h and figure out
>>>>>>>> it's called frobbleGnasher by searching there instead of the
>>>>>>>> documentation.
>>>>>>>
>>>>>>>
>>>>>>> The problem is that I've learned that sometimes people try to make
>>>>>>> things work in ways that I couldn't even imagine, and they lose more time
>>>>>>> than I could ever imagine them using :) Also, I agree the time is probably
>>>>>>> on average not that large, but we pay it over a long time in the future, and
>>>>>>> it tends to add up.
>>>>>>>
>>>>>>>>
>>>>>>>> That's compared to having the matcher name always be the same as the
>>>>>>>> AST node, where the user writes cxxFrobbleGnasher and it just works,
>>>>>>>> which is definitely a mark in favor of making everything consistent. I
>>>>>>>> just don't think the current approach is too onerous in the case where
>>>>>>>> the matcher is at least *provided* for the user with a relatively sane
>>>>>>>> name.
>>>>>>>>
>>>>>>>> >> I should be clear, I'm not opposed to just having a 1:1 mapping. I'm
>>>>>>>> >> just not certain the benefits of being strict about that outweigh
>>>>>>>> >> the
>>>>>>>> >> costs to broken code. cxxCtorInitializer will break someone's code,
>>>>>>>> >> but I don't think it adds any clarity to their code, so I don't see
>>>>>>>> >> the benefit of forcing the change.
>>>>>>>> >
>>>>>>>> > Well, I think there's the cost of broken code *once* now, vs. the
>>>>>>>> > (smaller)
>>>>>>>> > cost for users in all future.
>>>>>>>> > I'm still strongly in favor of breaking now, and having a simpler
>>>>>>>> > model
>>>>>>>> > going forward.
>>>>>>>>
>>>>>>>> I'm definitely in favor of breaking now in the case of RecordDecl vs
>>>>>>>> CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a bug
>>>>>>>> given that there's no way to match a RecordDecl.
>>>>>>>>
>>>>>>>> I would also be totally in favor of being consistent if we were
>>>>>>>> starting from scratch. I'm very, very weakly opposed to breaking more
>>>>>>>> user's code than we have to in order to get usable matchers because it
>>>>>>>> seems gratuitous. Breaking code to get something that works seems
>>>>>>>> reasonable. Breaking code that already works just to change the name
>>>>>>>> for consistency elsewhere, I'm a bit less keen on. But the fact that
>>>>>>>> we already can break user's code at-will because of the reliance on
>>>>>>>> the AST nodes makes me think it may be the right approach for the best
>>>>>>>> API, since that's what I would want if we were starting from scratch.
>>>>>>>>
>>>>>>>> Okay, I'm convinced. I think we should rename the type and node
>>>>>>>> matchers (not traversal and narrowing matchers) to match the AST node
>>>>>>>> names in all cases. We can document the breakage in the release notes,
>>>>>>>> but (hopefully) only have to do this dance one time instead of
>>>>>>>> spreading the pain out as it happens to eventually get to the same
>>>>>>>> place anyway.
>>>>>>>
>>>>>>>
>>>>>>> Yea, people who want more stability do use releases anyway.
>>>>>>>
>>>>>>>>
>>>>>>>> Daniel, is this something you would be okay with? (I'm assuming
>>>>>>>> Richard finds it acceptable based on previous comments from Manuel,
>>>>>>>> but Richard, feel free to chime in.)
>>>>>>>
>>>>>>>
>>>>>>> Offline conversation with Richard says that he is convinced.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ~Aaron
>>>>>>
>>>>>>
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: recordDecl-clang-tools-extra.patch
Type: application/octet-stream
Size: 48452 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150916/a2b4100c/attachment-0001.obj>