recordDecl() AST matcher

Wed Sep 16 16:49:27 PDT 2015

LG, ship it.

On Wed, Sep 16, 2015 at 2:03 PM Aaron Ballman <aaron at aaronballman.com>
wrote:

> Attached is an updated patch for clang-tools-extra that does not have
> my in-progress, not-related-to-any-of-this code in it. ;-)
>
> ~Aaron
>
> On Wed, Sep 16, 2015 at 4:04 PM, Aaron Ballman <aaron at aaronballman.com>
> wrote:
> > Quick ping. I know this is a fairly gigantic patch, but I'm hoping for
> > a relatively quick review turnaround because of potential merge
> > conflicts with people doing a fair amount of work on clang-tidy
> > lately. Everything should be pretty straight-forward (it's all just
> > renames, no semantic changes intended aside from
> > recordDecl/cxxRecordDecl and the narrowing matchers.
> >
> > ~Aaron
> >
> > On Tue, Sep 15, 2015 at 1:32 PM, Aaron Ballman <aaron at aaronballman.com>
> wrote:
> >> Here are the complete patches to solve the issues we've discussed:
> >>
> >> 1) splits recordDecl into recordDecl and cxxRecordDecl
> >> 2) adds isStruct, isUnion, isClass to identify what kind of
> >> recordDecl() you may be looking at
> >> 3) prefixes all of the node matchers with cxx that should have it
> >> 4) fixes a similar issue with CUDAKernelCallExpr (the prefix needs to
> >> be cuda instead of CUDA to distinguish the matcher name from the type
> >> name).
> >> 5) updates all of the documentation and code that broke.
> >>
> >> One patch is for changes to clang, the other is for changes to
> >> clang-tools-extra.
> >>
> >> ~Aaron
> >>
> >> On Mon, Sep 14, 2015 at 5:49 PM, Aaron Ballman <aaron at aaronballman.com>
> wrote:
> >>> On Mon, Sep 14, 2015 at 5:47 PM, Daniel Jasper <djasper at google.com>
> wrote:
> >>>> Btw, I think generating them, potentially into several different
> headers to
> >>>> work around the compile time issue isn't such a bad idea.
> >>>
> >>> I'm not going to start with this approach, but think it may be worth
> >>> exploring at some point. ;-)
> >>>
> >>> ~Aaron
> >>>
> >>>>
> >>>> On Mon, Sep 14, 2015 at 11:45 PM, Manuel Klimek <klimek at google.com>
> wrote:
> >>>>>
> >>>>> Feel free to rename the AST nodes :)
> >>>>>
> >>>>>
> >>>>> On Mon, Sep 14, 2015, 2:44 PM Daniel Jasper <djasper at google.com>
> wrote:
> >>>>>>
> >>>>>> Ok. I am happy with this then.
> >>>>>>
> >>>>>> (Just personally grumpy having to write
> >>>>>> cxxRecordDecl(has(cxxConstructorDecl(..))) in the future ;-) ).
> >>>>>>
> >>>>>> On Mon, Sep 14, 2015 at 11:41 PM, Manuel Klimek <klimek at google.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Sep 14, 2015 at 2:29 PM Aaron Ballman <
> aaron at aaronballman.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On Mon, Sep 14, 2015 at 4:38 PM, Manuel Klimek <klimek at google.com
> >
> >>>>>>>> wrote:
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > On Mon, Sep 14, 2015 at 12:26 PM Aaron Ballman
> >>>>>>>> > <aaron at aaronballman.com>
> >>>>>>>> > wrote:
> >>>>>>>> >>
> >>>>>>>> >> On Mon, Sep 14, 2015 at 2:45 PM, Daniel Jasper <
> djasper at google.com>
> >>>>>>>> >> wrote:
> >>>>>>>> >> > By this point, I see that change might be profitable overall.
> >>>>>>>> >> > However,
> >>>>>>>> >> > lets
> >>>>>>>> >> > completely map this out. Changing just cxxRecordDecl() can
> >>>>>>>> >> > actually
> >>>>>>>> >> > increase
> >>>>>>>> >> > confusion in other areas. Right now, not a single AST
> matcher has
> >>>>>>>> >> > the
> >>>>>>>> >> > cxx
> >>>>>>>> >> > prefix (although a total of 28 stand for the corresponding
> CXX..
> >>>>>>>> >> > AST
> >>>>>>>> >> > node).
> >>>>>>>> >> > This is consistent and people knowing this will never try to
> write
> >>>>>>>> >> > cxxConstructExpr(). As soon as people have used
> cxxRecordDecl(),
> >>>>>>>> >> > the
> >>>>>>>> >> > chance
> >>>>>>>> >> > of them trying cxxConstructExpr() increases. You have spent
> a long
> >>>>>>>> >> > time
> >>>>>>>> >> > figuring out that recordDecl means cxxRecordDecl(), which is
> one
> >>>>>>>> >> > datapoint,
> >>>>>>>> >> > but I am not aware of anyone else having this specific
> issue. And
> >>>>>>>> >> > we
> >>>>>>>> >> > could
> >>>>>>>> >> > make this less bad with better documentation, I think.
> >>>>>>>> >> >
> >>>>>>>> >> > So, for me, the questions are:
> >>>>>>>> >> > 1) Do we want/need this change?
> >>>>>>>> >>
> >>>>>>>> >> We definitely need *a* change because there currently is no
> way to
> >>>>>>>> >> match a C struct or union when compiling in C mode. I
> discovered
> >>>>>>>> >> this
> >>>>>>>> >> because I was trying to write a new checker for clang-tidy that
> >>>>>>>> >> focuses on C code and it would fail to match when compiling in
> C
> >>>>>>>> >> mode.
> >>>>>>>> >> Whether we decide to go with cxxRecordDecl vs recordDecl vs
> >>>>>>>> >> structDecl
> >>>>>>>> >> (etc) is less important to me than the ability to write
> clang-tidy
> >>>>>>>> >> checks for C code.
> >>>>>>>> >>
> >>>>>>>> >> > 2) Do we want to be consistent and change the other 27
> matchers as
> >>>>>>>> >> > well?
> >>>>>>>> >>
> >>>>>>>> >> I'm on the fence about this for all the reasons you point out.
> >>>>>>>> >>
> >>>>>>>> >> > A fundamental question is whether we want AST matchers to
> match
> >>>>>>>> >> > AST
> >>>>>>>> >> > nodes
> >>>>>>>> >> > 1:1 or whether they should be an abstraction from some
> >>>>>>>> >> > implementation
> >>>>>>>> >> > details of the AST.
> >>>>>>>> >>
> >>>>>>>> >> I absolutely agree that this is a fundamental question. I
> think a
> >>>>>>>> >> higher priority fundamental question that goes along with it
> is: are
> >>>>>>>> >> we okay with breaking a lot of user code (are these meant to be
> >>>>>>>> >> stable
> >>>>>>>> >> APIs like the LLVM C APIs)? If we want these APIs to be
> stable, that
> >>>>>>>> >> changes the answer of what kind of mapping we can have.
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > I think the AST matchers are so closely coupled to the AST that
> it
> >>>>>>>> > trying to
> >>>>>>>> > be more stable than the AST doesn't help. Basically all uses of
> AST
> >>>>>>>> > matchers
> >>>>>>>> > do something with the AST nodes afterwards, which will break
> anyway.
> >>>>>>>>
> >>>>>>>> I can get behind that logic. So we're okay with breaking their
> code
> >>>>>>>> because there's no way around it -- it's tied to the AST, so users
> >>>>>>>> cannot rely on the AST APIs remaining the same from release to
> release
> >>>>>>>> anyway.
> >>>>>>>
> >>>>>>>
> >>>>>>> We might even *want* the code to break, as the use of the AST node
> might
> >>>>>>> now be incorrect on a semantic level.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> >
> >>>>>>>> >>
> >>>>>>>> >> > And this is not an easy question to answer. There are
> >>>>>>>> >> > many places where we don't follow a strict 1:1 mapping.
> Mostly
> >>>>>>>> >> > node
> >>>>>>>> >> > matchers, but also in traversal matchers, e.g.
> isDerivedFrom().
> >>>>>>>> >> >
> >>>>>>>> >> > Personally, I'd really hate to have the cxx Prefix
> everywhere, but
> >>>>>>>> >> > that's
> >>>>>>>> >> > just my personal opinion. I would even prefer matchers like
> >>>>>>>> >> > record() and
> >>>>>>>> >> > method(), but I think somebody convinced me that that would
> be a
> >>>>>>>> >> > very
> >>>>>>>> >> > bad
> >>>>>>>> >> > idea ;-).
> >>>>>>>> >>
> >>>>>>>> >> My personal opinion is that (1) breaking code is fine, but we
> should
> >>>>>>>> >> avoid doing it without very clear benefit, and (2) the mapping
> >>>>>>>> >> between
> >>>>>>>> >> AST node identifiers and AST matcher identifiers needs to be
> >>>>>>>> >> incredibly obvious, but perhaps not slavishly 1:1. If we
> instead
> >>>>>>>> >> decide we want a 1:1 mapping, then I think we need to start
> >>>>>>>> >> seriously
> >>>>>>>> >> considering auto-generating the AST node (and type) matchers
> from
> >>>>>>>> >> tablegen so that the AST nodes *cannot* get out of sync with
> the AST
> >>>>>>>> >> matchers, otherwise we'll be right back here again in a few
> years
> >>>>>>>> >> when
> >>>>>>>> >> we modify the name of an AST node.
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > I do think we want to auto-generate the matchers, but I don't
> think
> >>>>>>>> > tablegen
> >>>>>>>> > is the right approach (I think an ast-matcher based tool is ;)
> >>>>>>>> > That said, auto-generating all the matchers is a) a lot of
> effort and
> >>>>>>>> > b) the
> >>>>>>>> > code-size and compile time of matchers already matters, so it's
> >>>>>>>> > unclear
> >>>>>>>> > which ones we would want to generate, especially for traversal
> >>>>>>>> > matchers :(
> >>>>>>>>
> >>>>>>>> Oh, that's an excellent point (I'm talking about (b), I already
> knew
> >>>>>>>> (a) was a lot of work). Thank you for pointing that out!
> >>>>>>>>
> >>>>>>>> >
> >>>>>>>> >>
> >>>>>>>> >> My definition of "incredibly obvious" is: if the AST has a
> prefixed
> >>>>>>>> >> and unprefixed version, or two different prefixes, we should
> mimic
> >>>>>>>> >> that directly with the matchers. Otherwise, existing AST
> matchers
> >>>>>>>> >> without prefix shenanigans can remain as they are, and new AST
> >>>>>>>> >> matchers should prefix as-required. If we decide we're okay
> breaking
> >>>>>>>> >> code, then I don't see a problem with changing
> ctorInitializer()
> >>>>>>>> >> into
> >>>>>>>> >> cxxCtorInitializer() when C adds constructors. ;-)
> >>>>>>>> >
> >>>>>>>> >
> >>>>>>>> > I think the main things is cost for developers who try to write
> >>>>>>>> > matchers and
> >>>>>>>> > work from the -ast-dump. Figuring out that there *is* a matcher
> with
> >>>>>>>> > an
> >>>>>>>> > unprefixed node can take a while.
> >>>>>>>>
> >>>>>>>> Hmm, yes, but "take a while" should be relatively short, I would
> >>>>>>>> think. In that use-case, the user does an -ast-dump, sees
> >>>>>>>> "CXXFrobbleGnasher", they go to the AST matcher reference and they
> >>>>>>>> search for "CXXFrobberGnasher." The first hit won't be
> >>>>>>>> cxxFrobbleGnasher, but the entry for frobbleGnasher (which is
> still
> >>>>>>>> the first hit when searching from the top of the document due to
> the
> >>>>>>>> way we position node matchers) will have a parameter of
> >>>>>>>> CXXFrobbleGnasher, so they will find still get to the right
> matcher on
> >>>>>>>> the first hit. If someone doesn't read the documentation at all,
> >>>>>>>> they're going to try cxxFrobbleGnasher() and get a compile
> error/no
> >>>>>>>> known matcher. Then they'll go look at ASTMatchers.h and figure
> out
> >>>>>>>> it's called frobbleGnasher by searching there instead of the
> >>>>>>>> documentation.
> >>>>>>>
> >>>>>>>
> >>>>>>> The problem is that I've learned that sometimes people try to make
> >>>>>>> things work in ways that I couldn't even imagine, and they lose
> more time
> >>>>>>> than I could ever imagine them using :) Also, I agree the time is
> probably
> >>>>>>> on average not that large, but we pay it over a long time in the
> future, and
> >>>>>>> it tends to add up.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> That's compared to having the matcher name always be the same as
> the
> >>>>>>>> AST node, where the user writes cxxFrobbleGnasher and it just
> works,
> >>>>>>>> which is definitely a mark in favor of making everything
> consistent. I
> >>>>>>>> just don't think the current approach is too onerous in the case
> where
> >>>>>>>> the matcher is at least *provided* for the user with a relatively
> sane
> >>>>>>>> name.
> >>>>>>>>
> >>>>>>>> >> I should be clear, I'm not opposed to just having a 1:1
> mapping. I'm
> >>>>>>>> >> just not certain the benefits of being strict about that
> outweigh
> >>>>>>>> >> the
> >>>>>>>> >> costs to broken code. cxxCtorInitializer will break someone's
> code,
> >>>>>>>> >> but I don't think it adds any clarity to their code, so I
> don't see
> >>>>>>>> >> the benefit of forcing the change.
> >>>>>>>> >
> >>>>>>>> > Well, I think there's the cost of broken code *once* now, vs.
> the
> >>>>>>>> > (smaller)
> >>>>>>>> > cost for users in all future.
> >>>>>>>> > I'm still strongly in favor of breaking now, and having a
> simpler
> >>>>>>>> > model
> >>>>>>>> > going forward.
> >>>>>>>>
> >>>>>>>> I'm definitely in favor of breaking now in the case of RecordDecl
> vs
> >>>>>>>> CXXRecordDecl. I think having recordDecl match CXXRecordDecl is a
> bug
> >>>>>>>> given that there's no way to match a RecordDecl.
> >>>>>>>>
> >>>>>>>> I would also be totally in favor of being consistent if we were
> >>>>>>>> starting from scratch. I'm very, very weakly opposed to breaking
> more
> >>>>>>>> user's code than we have to in order to get usable matchers
> because it
> >>>>>>>> seems gratuitous. Breaking code to get something that works seems
> >>>>>>>> reasonable. Breaking code that already works just to change the
> name
> >>>>>>>> for consistency elsewhere, I'm a bit less keen on. But the fact
> that
> >>>>>>>> we already can break user's code at-will because of the reliance
> on
> >>>>>>>> the AST nodes makes me think it may be the right approach for the
> best
> >>>>>>>> API, since that's what I would want if we were starting from
> scratch.
> >>>>>>>>
> >>>>>>>> Okay, I'm convinced. I think we should rename the type and node
> >>>>>>>> matchers (not traversal and narrowing matchers) to match the AST
> node
> >>>>>>>> names in all cases. We can document the breakage in the release
> notes,
> >>>>>>>> but (hopefully) only have to do this dance one time instead of
> >>>>>>>> spreading the pain out as it happens to eventually get to the same
> >>>>>>>> place anyway.
> >>>>>>>
> >>>>>>>
> >>>>>>> Yea, people who want more stability do use releases anyway.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Daniel, is this something you would be okay with? (I'm assuming
> >>>>>>>> Richard finds it acceptable based on previous comments from
> Manuel,
> >>>>>>>> but Richard, feel free to chime in.)
> >>>>>>>
> >>>>>>>
> >>>>>>> Offline conversation with Richard says that he is convinced.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ~Aaron
> >>>>>>
> >>>>>>
> >>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150916/0fbcd852/attachment-0001.html>