[PATCH] D116518: [ast-matchers] Add hasSubstatementSequence matcher

Richard via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 11 19:30:26 PST 2022


LegalizeAdulthood added inline comments.


================
Comment at: clang/include/clang/ASTMatchers/ASTMatchers.h:5435-5442
+/// Matches two consecutive statements within a compound statement.
+///
+/// Given
+/// \code
+///   { if (x > 0) return true; return false; }
+/// \endcode
+/// compoundStmt(hasSubstatementSequence(ifStmt(), returnStmt()))
----------------
aaron.ballman wrote:
> aaron.ballman wrote:
> > LegalizeAdulthood wrote:
> > > LegalizeAdulthood wrote:
> > > > aaron.ballman wrote:
> > > > > LegalizeAdulthood wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > LegalizeAdulthood wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > How do we extend this to support testing arbitrary sequences of statements? (If this supports two statements, someone will find a need for three, etc).
> > > > > > > > Yeah, I was wondering that too.  I didn't see (but didn't look extensively) any support for variadic matchers taking a parameter pack.
> > > > > > > > 
> > > > > > > > I stopped at 2 because this suits my immediate needs with `readability-simplify-boolean-expr` where I have to manually loop over `CompoundStmt` matches in order to verify that the `if (x) return true; return false;` constructs consist of two adjacent statements.
> > > > > > > I don't think we have any variadic matchers yet to model on, but I think if this is only needed for one check, we can add it in the current form as a local matcher for that check. Whenever we figure out how to give the better interface through the static and dynamic matchers, then we can figure out how to lift this to the general matcher interface.
> > > > > > > 
> > > > > > > WDYT?
> > > > > > I don't think it is harmful to make it visible to all and I think it is helpful.
> > > > > > Defining it in ASTMatchers, enables using it in `clang-query`, for instance.
> > > > > > 
> > > > > I contend this is not a generally useful matcher without supporting an arbitrary number of statements. Even then, to be honest, it's questionable whether there's sufficient need for this to be a public matcher. Typically, we don't expose a new public matcher unless there's a general need for it, and this one is already pretty borderline even if it's fully generalized. This file is *very* expensive to instantiate and it gets used in a lot of places, so that's one of the primary reasons we don't expose matchers from here unless they're generally useful.
> > > > > 
> > > > > Unless @klimek or another AST matcher code owner thinks this is useful in general (to multiple checks people are likely to want to write even if they're not contributing the checks to the community), I'm opposed to exposing this as-is. However, adding it as a private matcher for the check that needs the limited functionality would get no opposition from me (and this implementation looks correct as well).
> > > > My thoughts:
> > > > 
> > > > I'm OK with moving it as a private matcher as it would simplify a big chunk of code
> > > > in the simplify-boolean-expr check.
> > > > 
> > > > I agree that ASTMatchers.h is used all over the place and at a minimum causes a
> > > > huge amount of rebuild.
> > > > 
> > > > Regarding the general usefulness of the matcher, let me elaborate on my motivation
> > > > for adding this matcher.
> > > > 
> > > > I see it from the viewpoint of a developer of checks/refactorings.
> > > > 
> > > > I really want us to get to a world where a complete refactoring can be specified as
> > > > a script input to a refactoring tool.  Eliminating the need to write C++ that directly
> > > > manipulates the AST and the edits will lower the bar for entry for other people
> > > > writing automated changes to their codebases.
> > > > 
> > > > Since the last time I worked in the clang codebase, the transformer library has
> > > > been added.  With this new library, I think all we're missing is a parser that matches
> > > > the input script to the necessary calls to code in the transformer library.  Once we
> > > > do this, I think the need for more and higher-level matchers will become evident
> > > > and a matcher like this is the only way you can specify that a statement Y
> > > > immediately follows statement X purely through matchers.  Current matchers
> > > > don't even let you specify relative ordering of statements.  The best you can do
> > > > is assert that a block contains the statements of interest and then you must write
> > > > your own C++ code to walk the block and determine if they fit your actual
> > > > match criteria.
> > > Also, regarding a variadic version of this matcher, I'd be curious to try it out
> > > just from a learning/programming perspective, but I'm not sure how I'd go about
> > > it.  Suggestions on a plan of attack would be most welcome! `:)`
> > > 
> > > I'm OK with moving it as a private matcher as it would simplify a big chunk of code in the simplify-boolean-expr check.
> > 
> > Thank you!
> > 
> > > Regarding the general usefulness of the matcher, let me elaborate on my motivation for adding this matcher.
> > 
> > Thanks for sharing this, and FWIW, I agree with what you're saying. I'm curious to see how Stencil (CC @ymandel) changes the landscape in this area, but I think there's still room for improvement in the AST matchers as well. If we found a way to fully generalize the matcher from here, I'd be happy to have it as a first-class AST matcher. My only concern with the current form is that it's awkward for cases where you want three or more statements in a row. For example, you might want to find an old-school swap implementation and replace it with a call to `std::swap()`, and one form of that would look for three statements in a row.
> I think `mapAnyOf()` is a fully variadic matcher that even has dynamic matcher support, so that may be worth looking at. The `traverse()` match is definitely fully variadic, but it has no dynamic matcher support. FWIW, I think it's acceptable to add the matcher even if we can't figure out the dynamic interface in the initial patch, but it'd be really nice if we could make the dynamic matcher work in the initial offering because that means it's a usable feature within clang-query (which is how a lot of folks explore matcher behavior when writing tidy checks).
> I'm curious to see how Stencil changes the landscape

Yes, I've been talking with Yitzhak over email about this.  As we move
more towards something like Stencil encompassing more checks, I
think we'll find additional matchers that are missing to enable Stencil
for various cases.  I've already had to add a bunch of matchers for
the checks I've written and they're not particularly complicated.  The
existing set of matchers are driven from need, so as new needs arise
I expect new matchers to arise.

For matchers to be usable by any sort of scripted approach, they have
to be registered with the parsing framework.  I am not sure if such
registration is an internal detail or if you can register your own custom
matchers with the parsing framework.

I investigated making this a variadic matcher and I'm going to have
to spend more time understanding the matcher framework implementation
before I'm going to be able to implement it.  There's quite a bit of
machinery under the hood.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116518/new/

https://reviews.llvm.org/D116518



More information about the cfe-commits mailing list