[cfe-dev] [PATCH] New matcher for MaterializeTemporaryExpression.

Mon Aug 20 17:08:07 PDT 2012

> - First, we have a very simple of of specifying the node matchers,
> which is very close to your aliasing: the VariadicDynCastAllOf thingy.
> I contemplated renaming that to NodeMatcher to make that more clear.

It doesn't matter how easy it is to specify them if the mechanism for
specifying them is not part of the API. `VariadicDynCastAllOf` is in
namespace `internal::`, so it isn't something that a user is even
allowed to use without making them feel dirty.

(writing this after writing the rest of the message: I guess,
reflecting on it, that one of my main points (mentioned in the tl;dr)
is that the mechanism for defining your own matchers (in terms of the
raw Clang AST API) needs to be part of the matcher API; most of the
rest of what I'm saying is suggestions for the design of a clean set
of powerful primitives for matching in terms of the raw Clang AST API)

> - The way to write a matcher would actually be astNode<Stmt,
> MaterializeTemporaryExpr>, because afaik we cannot statically figure
> out what our base type is

Yes, you can, because there are a finite number of base types, and a
given AST node can only be upcast to one of them.

> - if you allow users to alias, but don't provide a central location
> where to put those aliases, [...]

API _users_ don't care about modifying the library source code, they
care about getting things done and aren't going to submit a patch
(even if they did, would we want to add Random User's alias to the
public API?). Furthermore, I don't see why
"ASTMatcherSugar.h"/"ASTMatcherAliases.h" couldn't be  "a central
location to put those aliases".

> [...] we'll end up with matcher expressions that
> are harder to read and maintain, as users will provide their own names

This will happen anyway, unless you believe that the current set of
matchers is perfect and covers all use cases*. Plus, you can trivially
fix this with an evolving set of naming guidelines, like "aliases that
exactly match AST nodes should be named as the AST node but with the
first letter lowercased".

* I suspect (and of course could be wrong) that most of the people
that have used the current matcher API are people who also have the
ability to change it and add new matchers into the header; this masks
catastrophic crash-and-burn API failures ("this AST matching API can't
be used to match a MaterializeTemporaryExpr") since the API "user" can
go and *modify the API* to expose the functionality that they want. I
suspect that Sam's patch that started the thread stems from this. This
phenomenon (if in fact it is happening) is vicious since the people
who can fix the API don't have any problems using it, while all real
users of the API are left out in the cold.

Also, I prefer coherent, completely self-documenting, possibly
slightly lengthier names over incoherent names which send you running
back to the documentation (or worse, source code) every time you need
to use them. I will always remember `astNode<NamedDecl>` matches a
NamedDecl; can you remember the name of the matcher that matches
NamedDecl in the current API? I can't.

> - having node matchers for nodes for which none of the other matchers
> exist, is pretty useless - thus, I doubt that we'll add matchers for
> all of the 500+ classes of the AST

There are a small number of bases, and (AFAIK) each hierarchy has an
enum which uniquely identifies the node kind. So if you have

enum ASTNodeKind {
  Decl,
  Stmt,
  ...
};

Then the ASTNodeKind and the hierarchy-specific kind uniquely identify
a node type. Thus a single `astNode<T>` template can uniquely define a
matcher for _all_ of them in an amount of code proportional to the
number of bases, rather than the number of nodes. Putting all of this
together seems like a good fit for an `ASTNodeTraits<T>` class
template.

> When it come to documentation I'm also personally not a big believer
> to single point of truth, as context, presentation and repetition
> matters for learning.

Fair enough, I agree with that reasoning. I can personally attest
however to reading the documentation of one of the AST nodes an
thinking "it would be great if this had an example/more examples of
what this corresponds to in a program", and I'm pretty sure I'm not
alone. I guess I can now lodge the weaker complaint that it just seems
unfair to hoard these examples away from the logical place to go
looking for such examples: the documentation of the relevant AST node.

> That said, I'll soon go over the matchers and rename them to exactly
> match the AST nodes, which gives another good argument
> pro-orthogonality.

I think this will be a good first step, however, I've pretty much
settled on the belief that the API is fundamentally flawed if it
doesn't expose more plumbing. It seems like at least 2/3 of the
current matchers would be obviated by

1. be able to match any AST node I want with `astNode<T>`
2. be able to assert that an arbitrary member function on that node
has/doesn't have an arbitrary value, by doing something like
`astNode<UnaryOperator>(property(&UnaryOperator::getSubExpr).is_not(NULL))`

Looking at <http://clang.llvm.org/docs/LibASTMatchersReference.html>,
it seems like the majority of the matchers are obviated by just those
two things. Moreover, in the face of having to debug a problem, the
user can readily understand what the matcher is doing by relating it
to the underlying Clang documentation instead of having to dive into
the matcher implementation and *then* look at the underlying Clang
documentation. Being explicit about what is happening is a big win;
ultimately the users of this API are going to presumably be doing
something with the raw AST node, so it makes sense to allow them to
express their match in terms of methods on the raw AST nodes as well
instead of forcing them to learn a whole new vocabulary in order to
encode things that they already know how to express.

It is easy enough to implement 2. in terms of an arbitrary callable
(and thus you just have `property().is_not()` return a callable that
does the matching). This means that you can use any helper function
you want and if a library client is using C++11, they can use a lambda
too.

Another benefit of this approach is that you can dispense with all
those nasty macros!

tl;dr Focus should be on providing a small set of flexible and
reusable tools that can be assembled to do AST matching, possibly in
addition to a "sugar" API.

I apologize for the lengthy post.

--Sean Silva

On Mon, Aug 20, 2012 at 4:50 PM, Manuel Klimek <klimek at google.com> wrote:
> Moving to cfe-dev, as I think the design discussion is misplaced on a
> review thread for one added feature :)
>
> On Mon, Aug 20, 2012 at 1:14 PM, Sean Silva
> <reviews at llvm-reviews.chandlerc.com> wrote:
>>
>>   It seems like it would almost make more sense to move these great doc comments with great examples of source constructs that the AST node corresponds to into the documentation for the AST nodes themselves, and then have single header (say, "StmtMatchers.h") with a block comment at the top saying something to the tune of
>
> This obviously depends a lot on the answers to your later points, but
> under the condition that we actually want those matchers spelled out
> somewhere, I think having the documentation there is good, even if it
> might seem redundant. Requiring users to go through more layers of
> indirection for digging up files seems worse than the maintenance
> burden here.
>
>>   "each `Stmt` subclass has a corresponding matcher here, with a name which is the name of the AST node with the first letter lowercase rather than upper, so for example `MaterializeTemporaryExpr`'s corresponding matcher is `materializeTemporaryExpr`. If you are unsure about what a given `Stmt` subclass corresponds to at the source level, each corresponding `Stmt` subclass's doxygen comment contains a number of examples which should help you get a feel for what the AST node corresponds to at the source level."
>>
>>   And then you could just stamp out all of them. I'm not sure how you're determining when to add exact AST node matchers, but I can't see a compelling reason to not just add all of them at once (well, besides the ever-present lack of time :). Of course, all of what I said for `Stmt`'s could just as easily be said for `Decl`'s or other AST nodes.
>>
>>   Maybe you could make it even more uniform from an API point of view and have a matcher `astNode<T>` where `T` is any AST node and which matches that node exactly, so the `materializeTemporaryExpr` you have in this patch could be written as `astNode<MaterializeTemporaryExpr>`. I think that from an API point of view this is really clean, since then the user only has to learn one matcher in order to match any exact AST node they want (this also seems like a huge win from a documentation perspective as well, since you now only need to document one matcher in order to cover all exact AST node matchers).
>>
>>   I understand that the `astNode<T>` is a bit more verbose, but the clarity and API transparency that it affords is I think //much// more important, since you can recover the syntax sugar trivially with aliases. For example, suppose that I just want to match records; clang has an AST node for that, which is `RecordDecl`, so I know I just have to do `astNode<RecordDecl>` and that's it. However, with the current design, I need to wade through docs to find it (it doesn't seem to exist after wading and grepping), or worse be deceived and use `record()`, which doesn't match `RecordDecl`'s, but actually `CXXRecordDecl`'s!
>>
>>   Is there something that I'm missing that complicates doing it that way? Could you compare/contrast your current approach with the one I just described?
>>
>>   In a sense, I guess what my radar is picking up on here is that you have some "boilerplate" AST matchers here which correspond exactly with AST nodes, while other ones don't, like allOf(), yet this distinction doesn't appear to be being called out in documentation or well-factored within the code itself (or documentation). Also, it seems like a documentation Single Point Of Truth violation to have these great examples of what the AST node corresponds to here and not on the actual AST node---these examples would probably also be of interest to a person reading documentation of the AST node. Having the examples here makes perfect sense for matchers which are not matching exact AST nodes, though.
>
> So, you raise a very good point, and we have had that exact argument
> multiple times over the course of the AST matchers (the last time was
> I think when Daniel brought it up ;)
>
> It is far from a clear cut "obvious" design choice. As you have said,
> there are costs and benefits to both approaches, and in the end the
> question is the value you put on each of those points.
>
> I have a few comments on the points you raise (and why I might value
> them differently than you do):
> - First, we have a very simple of of specifying the node matchers,
> which is very close to your aliasing: the VariadicDynCastAllOf thingy.
> I contemplated renaming that to NodeMatcher to make that more clear.
> - The way to write a matcher would actually be astNode<Stmt,
> MaterializeTemporaryExpr>, because afaik we cannot statically figure
> out what our base type is
> - if you allow users to alias, but don't provide a central location
> where to put those aliases, we'll end up with matcher expressions that
> are harder to read and maintain, as users will provide their own names
> - having node matchers for nodes for which none of the other matchers
> exist, is pretty useless - thus, I doubt that we'll add matchers for
> all of the 500+ classes of the AST
>
> When it come to documentation I'm also personally not a big believer
> to single point of truth, as context, presentation and repetition
> matters for learning.
>
> That said, I'll soon go over the matchers and rename them to exactly
> match the AST nodes, which gives another good argument
> pro-orthogonality. I still don't think it quite sways the balance, but
> I'm open to more opinions :)
>
> Thoughts?
> /Manuel