recordDecl() AST matcher

Aaron Ballman via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 17 06:33:21 PDT 2015


I've commit in r247885 and r247886. I will add something to the
release notes, and watch the bots to see if any tests got missed
(since I did my development on Windows).

Thank you!

~Aaron

On Wed, Sep 16, 2015 at 7:49 PM, Manuel Klimek <klimek at google.com> wrote:
> LG, ship it.
>
> On Wed, Sep 16, 2015 at 2:03 PM Aaron Ballman <aaron at aaronballman.com>
> wrote:
>>
>> Attached is an updated patch for clang-tools-extra that does not have
>> my in-progress, not-related-to-any-of-this code in it. ;-)
>>
>> ~Aaron
>>
>> On Wed, Sep 16, 2015 at 4:04 PM, Aaron Ballman <aaron at aaronballman.com>
>> wrote:
>> > Quick ping. I know this is a fairly gigantic patch, but I'm hoping for
>> > a relatively quick review turnaround because of potential merge
>> > conflicts with people doing a fair amount of work on clang-tidy
>> > lately. Everything should be pretty straight-forward (it's all just
>> > renames, no semantic changes intended aside from
>> > recordDecl/cxxRecordDecl and the narrowing matchers.
>> >
>> > ~Aaron
>> >
>> > On Tue, Sep 15, 2015 at 1:32 PM, Aaron Ballman <aaron at aaronballman.com>
>> > wrote:
>> >> Here are the complete patches to solve the issues we've discussed:
>> >>
>> >> 1) splits recordDecl into recordDecl and cxxRecordDecl
>> >> 2) adds isStruct, isUnion, isClass to identify what kind of
>> >> recordDecl() you may be looking at
>> >> 3) prefixes all of the node matchers with cxx that should have it
>> >> 4) fixes a similar issue with CUDAKernelCallExpr (the prefix needs to
>> >> be cuda instead of CUDA to distinguish the matcher name from the type
>> >> name).
>> >> 5) updates all of the documentation and code that broke.
>> >>
>> >> One patch is for changes to clang, the other is for changes to
>> >> clang-tools-extra.
>> >>
>> >> ~Aaron
>> >>
>> >> On Mon, Sep 14, 2015 at 5:49 PM, Aaron Ballman <aaron at aaronballman.com>
>> >> wrote:
>> >>> On Mon, Sep 14, 2015 at 5:47 PM, Daniel Jasper <djasper at google.com>
>> >>> wrote:
>> >>>> Btw, I think generating them, potentially into several different
>> >>>> headers to
>> >>>> work around the compile time issue isn't such a bad idea.
>> >>>
>> >>> I'm not going to start with this approach, but think it may be worth
>> >>> exploring at some point. ;-)
>> >>>
>> >>> ~Aaron
>> >>>
>> >>>>
>> >>>> On Mon, Sep 14, 2015 at 11:45 PM, Manuel Klimek <klimek at google.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Feel free to rename the AST nodes :)
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Sep 14, 2015, 2:44 PM Daniel Jasper <djasper at google.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Ok. I am happy with this then.
>> >>>>>>
>> >>>>>> (Just personally grumpy having to write
>> >>>>>> cxxRecordDecl(has(cxxConstructorDecl(..))) in the future ;-) ).
>> >>>>>>
>> >>>>>> On Mon, Sep 14, 2015 at 11:41 PM, Manuel Klimek <klimek at google.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Sep 14, 2015 at 2:29 PM Aaron Ballman
>> >>>>>>> <aaron at aaronballman.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek
>> >>>>>>>> <klimek at google.com>
>> >>>>>>>> wrote:
>> >>>>>>>> >
>> >>>>>>>> >
>> >>>>>>>> > On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman
>> >>>>>>>> > <aaron at aaronballman.com>
>> >>>>>>>> > wrote:
>> >>>>>>>> >>
>> >>>>>>>> >> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper
>> >>>>>>>> >> <djasper at google.com>
>> >>>>>>>> >> wrote:
>> >>>>>>>> >> > By this point, I see that change might be profitable
>> >>>>>>>> >> > overall.
>> >>>>>>>> >> > However,
>> >>>>>>>> >> > lets
>> >>>>>>>> >> > completely map this out. Changing just cxxRecordDecl() can
>> >>>>>>>> >> > actually
>> >>>>>>>> >> > increase
>> >>>>>>>> >> > confusion in other areas. Right now, not a single AST
>> >>>>>>>> >> > matcher has
>> >>>>>>>> >> > the
>> >>>>>>>> >> > cxx
>> >>>>>>>> >> > prefix (although a total of 28 stand for the corresponding
>> >>>>>>>> >> > CXX..
>> >>>>>>>> >> > AST
>> >>>>>>>> >> > node).
>> >>>>>>>> >> > This is consistent and people knowing this will never try to
>> >>>>>>>> >> > write
>> >>>>>>>> >> > cxxConstructExpr(). As soon as people have used
>> >>>>>>>> >> > cxxRecordDecl(),
>> >>>>>>>> >> > the
>> >>>>>>>> >> > chance
>> >>>>>>>> >> > of them trying cxxConstructExpr() increases. You have spent
>> >>>>>>>> >> > a long
>> >>>>>>>> >> > time
>> >>>>>>>> >> > figuring out that recordDecl means cxxRecordDecl(), which is
>> >>>>>>>> >> > one
>> >>>>>>>> >> > datapoint,
>> >>>>>>>> >> > but I am not aware of anyone else having this specific
>> >>>>>>>> >> > issue. And
>> >>>>>>>> >> > we
>> >>>>>>>> >> > could
>> >>>>>>>> >> > make this less bad with better documentation, I think.
>> >>>>>>>> >> >
>> >>>>>>>> >> > So, for me, the questions are:
>> >>>>>>>> >> > 1) Do we want/need this change?
>> >>>>>>>> >>
>> >>>>>>>> >> We definitely need *a* change because there currently is no
>> >>>>>>>> >> way to
>> >>>>>>>> >> match a C struct or union when compiling in C mode. I
>> >>>>>>>> >> discovered
>> >>>>>>>> >> this
>> >>>>>>>> >> because I was trying to write a new checker for clang-tidy
>> >>>>>>>> >> that
>> >>>>>>>> >> focuses on C code and it would fail to match when compiling in
>> >>>>>>>> >> C
>> >>>>>>>> >> mode.
>> >>>>>>>> >> Whether we decide to go with cxxRecordDecl vs recordDecl vs
>> >>>>>>>> >> structDecl
>> >>>>>>>> >> (etc) is less important to me than the ability to write
>> >>>>>>>> >> clang-tidy
>> >>>>>>>> >> checks for C code.
>> >>>>>>>> >>
>> >>>>>>>> >> > 2) Do we want to be consistent and change the other 27
>> >>>>>>>> >> > matchers as
>> >>>>>>>> >> > well?
>> >>>>>>>> >>
>> >>>>>>>> >> I'm on the fence about this for all the reasons you point out.
>> >>>>>>>> >>
>> >>>>>>>> >> > A fundamental question is whether we want AST matchers to
>> >>>>>>>> >> > match
>> >>>>>>>> >> > AST
>> >>>>>>>> >> > nodes
>> >>>>>>>> >> > 1:1 or whether they should be an abstraction from some
>> >>>>>>>> >> > implementation
>> >>>>>>>> >> > details of the AST.
>> >>>>>>>> >>
>> >>>>>>>> >> I absolutely agree that this is a fundamental question. I
>> >>>>>>>> >> think a
>> >>>>>>>> >> higher priority fundamental question that goes along with it
>> >>>>>>>> >> is: are
>> >>>>>>>> >> we okay with breaking a lot of user code (are these meant to
>> >>>>>>>> >> be
>> >>>>>>>> >> stable
>> >>>>>>>> >> APIs like the LLVM C APIs)? If we want these APIs to be
>> >>>>>>>> >> stable, that
>> >>>>>>>> >> changes the answer of what kind of mapping we can have.
>> >>>>>>>> >
>> >>>>>>>> >
>> >>>>>>>> > I think the AST matchers are so closely coupled to the AST that
>> >>>>>>>> > it
>> >>>>>>>> > trying to
>> >>>>>>>> > be more stable than the AST doesn't help. Basically all uses of
>> >>>>>>>> > AST
>> >>>>>>>> > matchers
>> >>>>>>>> > do something with the AST nodes afterwards, which will break
>> >>>>>>>> > anyway.
>> >>>>>>>>
>> >>>>>>>> I can get behind that logic. So we're okay with breaking their
>> >>>>>>>> code
>> >>>>>>>> because there's no way around it -- it's tied to the AST, so
>> >>>>>>>> users
>> >>>>>>>> cannot rely on the AST APIs remaining the same from release to
>> >>>>>>>> release
>> >>>>>>>> anyway.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> We might even *want* the code to break, as the use of the AST node
>> >>>>>>> might
>> >>>>>>> now be incorrect on a semantic level.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> >
>> >>>>>>>> >>
>> >>>>>>>> >> > And this is not an easy question to answer. There are
>> >>>>>>>> >> > many places where we don't follow a strict 1:1 mapping.
>> >>>>>>>> >> > Mostly
>> >>>>>>>> >> > node
>> >>>>>>>> >> > matchers, but also in traversal matchers, e.g.
>> >>>>>>>> >> > isDerivedFrom().
>> >>>>>>>> >> >
>> >>>>>>>> >> > Personally, I'd really hate to have the cxx Prefix
>> >>>>>>>> >> > everywhere, but
>> >>>>>>>> >> > that's
>> >>>>>>>> >> > just my personal opinion. I would even prefer matchers like
>> >>>>>>>> >> > record() and
>> >>>>>>>> >> > method(), but I think somebody convinced me that that would
>> >>>>>>>> >> > be a
>> >>>>>>>> >> > very
>> >>>>>>>> >> > bad
>> >>>>>>>> >> > idea ;-).
>> >>>>>>>> >>
>> >>>>>>>> >> My personal opinion is that (1) breaking code is fine, but we
>> >>>>>>>> >> should
>> >>>>>>>> >> avoid doing it without very clear benefit, and (2) the mapping
>> >>>>>>>> >> between
>> >>>>>>>> >> AST node identifiers and AST matcher identifiers needs to be
>> >>>>>>>> >> incredibly obvious, but perhaps not slavishly 1:1. If we
>> >>>>>>>> >> instead
>> >>>>>>>> >> decide we want a 1:1 mapping, then I think we need to start
>> >>>>>>>> >> seriously
>> >>>>>>>> >> considering auto-generating the AST node (and type) matchers
>> >>>>>>>> >> from
>> >>>>>>>> >> tablegen so that the AST nodes *cannot* get out of sync with
>> >>>>>>>> >> the AST
>> >>>>>>>> >> matchers, otherwise we'll be right back here again in a few
>> >>>>>>>> >> years
>> >>>>>>>> >> when
>> >>>>>>>> >> we modify the name of an AST node.
>> >>>>>>>> >
>> >>>>>>>> >
>> >>>>>>>> > I do think we want to auto-generate the matchers, but I don't
>> >>>>>>>> > think
>> >>>>>>>> > tablegen
>> >>>>>>>> > is the right approach (I think an ast-matcher based tool is ;)
>> >>>>>>>> > That said, auto-generating all the matchers is a) a lot of
>> >>>>>>>> > effort and
>> >>>>>>>> > b) the
>> >>>>>>>> > code-size and compile time of matchers already matters, so it's
>> >>>>>>>> > unclear
>> >>>>>>>> > which ones we would want to generate, especially for traversal
>> >>>>>>>> > matchers :(
>> >>>>>>>>
>> >>>>>>>> Oh, that's an excellent point (I'm talking about (b), I already
>> >>>>>>>> knew
>> >>>>>>>> (a) was a lot of work). Thank you for pointing that out!
>> >>>>>>>>
>> >>>>>>>> >
>> >>>>>>>> >>
>> >>>>>>>> >> My definition of "incredibly obvious" is: if the AST has a
>> >>>>>>>> >> prefixed
>> >>>>>>>> >> and unprefixed version, or two different prefixes, we should
>> >>>>>>>> >> mimic
>> >>>>>>>> >> that directly with the matchers. Otherwise, existing AST
>> >>>>>>>> >> matchers
>> >>>>>>>> >> without prefix shenanigans can remain as they are, and new AST
>> >>>>>>>> >> matchers should prefix as-required. If we decide we're okay
>> >>>>>>>> >> breaking
>> >>>>>>>> >> code, then I don't see a problem with changing
>> >>>>>>>> >> ctorInitializer()
>> >>>>>>>> >> into
>> >>>>>>>> >> cxxCtorInitializer() when C adds constructors. ;-)
>> >>>>>>>> >
>> >>>>>>>> >
>> >>>>>>>> > I think the main things is cost for developers who try to write
>> >>>>>>>> > matchers and
>> >>>>>>>> > work from the -ast-dump. Figuring out that there *is* a matcher
>> >>>>>>>> > with
>> >>>>>>>> > an
>> >>>>>>>> > unprefixed node can take a while.
>> >>>>>>>>
>> >>>>>>>> Hmm, yes, but "take a while" should be relatively short, I would
>> >>>>>>>> think. In that use-case, the user does an -ast-dump, sees
>> >>>>>>>> "CXXFrobbleGnasher", they go to the AST matcher reference and
>> >>>>>>>> they
>> >>>>>>>> search for "CXXFrobberGnasher." The first hit won't be
>> >>>>>>>> cxxFrobbleGnasher, but the entry for frobbleGnasher (which is
>> >>>>>>>> still
>> >>>>>>>> the first hit when searching from the top of the document due to
>> >>>>>>>> the
>> >>>>>>>> way we position node matchers) will have a parameter of
>> >>>>>>>> CXXFrobbleGnasher, so they will find still get to the right
>> >>>>>>>> matcher on
>> >>>>>>>> the first hit. If someone doesn't read the documentation at all,
>> >>>>>>>> they're going to try cxxFrobbleGnasher() and get a compile
>> >>>>>>>> error/no
>> >>>>>>>> known matcher. Then they'll go look at ASTMatchers.h and figure
>> >>>>>>>> out
>> >>>>>>>> it's called frobbleGnasher by searching there instead of the
>> >>>>>>>> documentation.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> The problem is that I've learned that sometimes people try to make
>> >>>>>>> things work in ways that I couldn't even imagine, and they lose
>> >>>>>>> more time
>> >>>>>>> than I could ever imagine them using :) Also, I agree the time is
>> >>>>>>> probably
>> >>>>>>> on average not that large, but we pay it over a long time in the
>> >>>>>>> future, and
>> >>>>>>> it tends to add up.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> That's compared to having the matcher name always be the same as
>> >>>>>>>> the
>> >>>>>>>> AST node, where the user writes cxxFrobbleGnasher and it just
>> >>>>>>>> works,
>> >>>>>>>> which is definitely a mark in favor of making everything
>> >>>>>>>> consistent. I
>> >>>>>>>> just don't think the current approach is too onerous in the case
>> >>>>>>>> where
>> >>>>>>>> the matcher is at least *provided* for the user with a relatively
>> >>>>>>>> sane
>> >>>>>>>> name.
>> >>>>>>>>
>> >>>>>>>> >> I should be clear, I'm not opposed to just having a 1:1
>> >>>>>>>> >> mapping. I'm
>> >>>>>>>> >> just not certain the benefits of being strict about that
>> >>>>>>>> >> outweigh
>> >>>>>>>> >> the
>> >>>>>>>> >> costs to broken code. cxxCtorInitializer will break someone's
>> >>>>>>>> >> code,
>> >>>>>>>> >> but I don't think it adds any clarity to their code, so I
>> >>>>>>>> >> don't see
>> >>>>>>>> >> the benefit of forcing the change.
>> >>>>>>>> >
>> >>>>>>>> > Well, I think there's the cost of broken code *once* now, vs.
>> >>>>>>>> > the
>> >>>>>>>> > (smaller)
>> >>>>>>>> > cost for users in all future.
>> >>>>>>>> > I'm still strongly in favor of breaking now, and having a
>> >>>>>>>> > simpler
>> >>>>>>>> > model
>> >>>>>>>> > going forward.
>> >>>>>>>>
>> >>>>>>>> I'm definitely in favor of breaking now in the case of RecordDecl
>> >>>>>>>> vs
>> >>>>>>>> CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a
>> >>>>>>>> bug
>> >>>>>>>> given that there's no way to match a RecordDecl.
>> >>>>>>>>
>> >>>>>>>> I would also be totally in favor of being consistent if we were
>> >>>>>>>> starting from scratch. I'm very, very weakly opposed to breaking
>> >>>>>>>> more
>> >>>>>>>> user's code than we have to in order to get usable matchers
>> >>>>>>>> because it
>> >>>>>>>> seems gratuitous. Breaking code to get something that works seems
>> >>>>>>>> reasonable. Breaking code that already works just to change the
>> >>>>>>>> name
>> >>>>>>>> for consistency elsewhere, I'm a bit less keen on. But the fact
>> >>>>>>>> that
>> >>>>>>>> we already can break user's code at-will because of the reliance
>> >>>>>>>> on
>> >>>>>>>> the AST nodes makes me think it may be the right approach for the
>> >>>>>>>> best
>> >>>>>>>> API, since that's what I would want if we were starting from
>> >>>>>>>> scratch.
>> >>>>>>>>
>> >>>>>>>> Okay, I'm convinced. I think we should rename the type and node
>> >>>>>>>> matchers (not traversal and narrowing matchers) to match the AST
>> >>>>>>>> node
>> >>>>>>>> names in all cases. We can document the breakage in the release
>> >>>>>>>> notes,
>> >>>>>>>> but (hopefully) only have to do this dance one time instead of
>> >>>>>>>> spreading the pain out as it happens to eventually get to the
>> >>>>>>>> same
>> >>>>>>>> place anyway.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Yea, people who want more stability do use releases anyway.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Daniel, is this something you would be okay with? (I'm assuming
>> >>>>>>>> Richard finds it acceptable based on previous comments from
>> >>>>>>>> Manuel,
>> >>>>>>>> but Richard, feel free to chime in.)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Offline conversation with Richard says that he is convinced.
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> ~Aaron
>> >>>>>>
>> >>>>>>
>> >>>>


More information about the cfe-commits mailing list