[PATCH] D116328: [ast-matchers] Add hasSubstatement() traversal matcher for caseStmt(), defaultStmt(), labelStmt()

Wed Jan 5 12:35:10 PST 2022

aaron.ballman added a reviewer: sammccall.
aaron.ballman added a subscriber: sammccall.
aaron.ballman added a comment.

In D116328#3223268 <https://reviews.llvm.org/D116328#3223268>, @LegalizeAdulthood wrote:

> In D116328#3221995 <https://reviews.llvm.org/D116328#3221995>, @aaron.ballman wrote:
>
>>> Previously if you wanted to match the statement associated with a case, default, or labelled statement, you had to use hasDescendant.
>>
>> This isn't true. You can use `has()` to do direct substatemematching, and I'm wondering whether we need this new matcher at all. e.g., https://godbolt.org/z/K5sYP757r
>
> Well, the whole point of this was that previous review comments were complaining that I was using an "expensive" matcher
> instead of one that got right to the point.  (What's the difference between `has` and `hasDescendent` anyway?)

`has()` matches the direct descendant AST node, whereas `hasDescendant()` matches *any* descendant AST node. So `hasDescendant()` actually is expensive -- it can traverse the AST more times than you'd expect. I don't know of any performance concerns with `has()` though.

e.g., https://godbolt.org/z/n3adEz4qW (the first query matches nothing, the second query has a match)

> I looked at the implementation of "has" and it has a huge amount of machinery underneath it.  I drilled in as follows (my ToT hash is 773ab3c6655f4d2beec25bb3516b4d4fe2eea990 <https://reviews.llvm.org/rG773ab3c6655f4d2beec25bb3516b4d4fe2eea990>):
> ASTMatchers.h:3391 declares `has` as an instance of `internal::ArgumentAdaptingMatcherFunc<internal::HasMatcher>`
> `ArgumentAdaptingMatcherFun` appears just to be a factory mechanism for creating the appropriate matcher.
> `HasMatcher` seems to do the actual matching
> `HasMatcher<T, ChildT>::matches` delegates to `ASTMatchFinder::matchesChildOf(const T &Node, ...)`
> `ASTMatchFinder::matchesChildOf(const T &Node, ...)` delegates to another overload of `matchesChildOf`(const DynTypedNode &Node, ...)` after asserting a static type relationship (ASTMatchersInternal.h:762)
> `ASTMatchFinder::matchesChildOf(const DynTypedNode &Node, ...)` is a virtual function with one implementation in `MatchASTVisitor` (ASTMatchFinder.cpp:659)
> `MatchASTVisitor::matchesChildOf` delegates to `memoizedMatchesRecursively` (ASTMatchFinder.cpp:664)
> For non-memoizable nodes, `MatchASTVisitor::memoizedMatchesRecursively` delegates to `matchesRecursively` (ASTMatchFinder.cpp:599)
> `MatchASTVIsitor::matchesRecursively` instantiates a `ASTNodeNotSpelledInSourceScope` and a `MatchChildASTVisitor` upon which it calls `findMatch` (ASTMatchFinder.cpp:645)
> `MatchChildASTVisitor::findMatch` does a series of `DynNode.get<T>` checks and delegates to the appropriate overload of `MatchChildASTVisitor::traverse`, in our case it would be the one for `Stmt`
> `MatchChildASTVisitor::traverse<T>(const T &Node)` delegates to `MatchChildASTVisitor::match<T>(const T &Node)`
> `MatchChildASTVisitor::match<T>(const T &Node)` instantiates a `BoundNodesTreeBuilder` and calls `match` on the held `Matcher`.
>
> Honestly, at this point, I become lost in manually tracing the code through "go to definition" in my IDE.

I'm impressed you got this far -- the AST matching machinery under the hood is really quite complex! :-)

> So I switched my unit test for `HasCaseSubstmt` to use `has` and drilled in through the debugger.
> What I saw echoes the above, although I took the wrong branch in my manual analysis in `memoizedMatchesRecursively`, it goes down the memoization path.

Phew, that's good -- I would expect `has` to use the memoization pass.

> However, the whole point of memoizing a result is because that result was expensive to compute and the memoization saves time on subsequent queries
> because the result has been memorized.  In this case, it's a simple getter on `CaseStmt` to obtain the node against which you want to match, so I don't
> see the memoization as having any particular benefit.  (The inner matcher to `hasSubstatement` may be expensive and could be memoized.)  For results
> obtain by the visitor pattern, I can see why you'd want to memoize them as there is lots of code being executed to implement the Visitor pattern.

My understanding (and my recollection here may be wrong) is that `has()` can memoize the results because the sub-matcher will always be matched against the immediate direct descendant AST node, but you can't get the same behavior from `hasDescendant()` or `hasAncestor()` because any of the descendant (or ancestor) nodes can be matched, so you have to traverse them instead of memoizing. I'm adding @sammccall in case he has information about the performance characteristics or thoughts about the proposed matcher in general.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116328/new/

https://reviews.llvm.org/D116328