[cfe-dev] [PATCH] New matcher for MaterializeTemporaryExpression.

Tue Aug 21 07:17:45 PDT 2012

On Mon, Aug 20, 2012 at 5:08 PM, Sean Silva <silvas at purdue.edu> wrote:
>> - First, we have a very simple of of specifying the node matchers,
>> which is very close to your aliasing: the VariadicDynCastAllOf thingy.
>> I contemplated renaming that to NodeMatcher to make that more clear.
>
> It doesn't matter how easy it is to specify them if the mechanism for
> specifying them is not part of the API. `VariadicDynCastAllOf` is in
> namespace `internal::`, so it isn't something that a user is even
> allowed to use without making them feel dirty.
>
> (writing this after writing the rest of the message: I guess,
> reflecting on it, that one of my main points (mentioned in the tl;dr)
> is that the mechanism for defining your own matchers (in terms of the
> raw Clang AST API) needs to be part of the matcher API; most of the
> rest of what I'm saying is suggestions for the design of a clean set
> of powerful primitives for matching in terms of the raw Clang AST API)

Ah, here we fully agree (both that we need a clean API for users to
define their own matchers and that we don't have one currently). The
question is a) how exactly that would look and b) how to prioritize it
with the other work we're doing (like clang-format, etc).

>> - The way to write a matcher would actually be astNode<Stmt,
>> MaterializeTemporaryExpr>, because afaik we cannot statically figure
>> out what our base type is
>
> Yes, you can, because there are a finite number of base types, and a
> given AST node can only be upcast to one of them.

Ah, good point. That's an excellent idea!

>> - if you allow users to alias, but don't provide a central location
>> where to put those aliases, [...]
>
> API _users_ don't care about modifying the library source code, they
> care about getting things done and aren't going to submit a patch
> (even if they did, would we want to add Random User's alias to the
> public API?). Furthermore, I don't see why
> "ASTMatcherSugar.h"/"ASTMatcherAliases.h" couldn't be  "a central
> location to put those aliases".

I interpret that in context of your first paragraph, and see now disagreement :)

>> [...] we'll end up with matcher expressions that
>> are harder to read and maintain, as users will provide their own names
>
> This will happen anyway, unless you believe that the current set of
> matchers is perfect and covers all use cases*. Plus, you can trivially
> fix this with an evolving set of naming guidelines, like "aliases that
> exactly match AST nodes should be named as the AST node but with the
> first letter lowercased".

I was saying this in the context of the assumption of *not* providing
*any* common aliases ourselves (because we're already doing this I
didn't see the point of your argument otherwise).

> * I suspect (and of course could be wrong) that most of the people
> that have used the current matcher API are people who also have the
> ability to change it and add new matchers into the header; this masks
> catastrophic crash-and-burn API failures ("this AST matching API can't
> be used to match a MaterializeTemporaryExpr") since the API "user" can
> go and *modify the API* to expose the functionality that they want. I
> suspect that Sam's patch that started the thread stems from this. This
> phenomenon (if in fact it is happening) is vicious since the people
> who can fix the API don't have any problems using it, while all real
> users of the API are left out in the cold.

We have quite a few people using this internally who're definitely not
"left in the cold". While I agree with your general sentiment, I think
your argument makes this look a little worse than it is :)

> Also, I prefer coherent, completely self-documenting, possibly
> slightly lengthier names over incoherent names which send you running
> back to the documentation (or worse, source code) every time you need
> to use them. I will always remember `astNode<NamedDecl>` matches a
> NamedDecl; can you remember the name of the matcher that matches
> NamedDecl in the current API? I can't.

As I said, we're about to change this. It was a bad idea. The matcher
for NamedDecl will be namedDecl. The problem is that NamedDecl doesn't
only match named decls - it also matches some anonymous decls. The way
going forward is that we'll propose to change the AST node first and
then adapt the matcher, not the other way round. If you want the full
history of why we thought that's a good idea, we should talk over a
beer :)

>> - having node matchers for nodes for which none of the other matchers
>> exist, is pretty useless - thus, I doubt that we'll add matchers for
>> all of the 500+ classes of the AST
>
> There are a small number of bases, and (AFAIK) each hierarchy has an
> enum which uniquely identifies the node kind. So if you have
>
> enum ASTNodeKind {
>   Decl,
>   Stmt,
>   ...
> };
>
> Then the ASTNodeKind and the hierarchy-specific kind uniquely identify
> a node type. Thus a single `astNode<T>` template can uniquely define a
> matcher for _all_ of them in an amount of code proportional to the
> number of bases, rather than the number of nodes. Putting all of this
> together seems like a good fit for an `ASTNodeTraits<T>` class
> template.

My point was that I don't think it makes sense to write out the
aliases for matchers in those cases. I'm completely not opposed to
having a better way to specify new matchers *in addition* to the
pre-defined ones. In fact, I agree it's something we want.

>> When it come to documentation I'm also personally not a big believer
>> to single point of truth, as context, presentation and repetition
>> matters for learning.
>
> Fair enough, I agree with that reasoning. I can personally attest
> however to reading the documentation of one of the AST nodes an
> thinking "it would be great if this had an example/more examples of
> what this corresponds to in a program", and I'm pretty sure I'm not
> alone. I guess I can now lodge the weaker complaint that it just seems
> unfair to hoard these examples away from the logical place to go
> looking for such examples: the documentation of the relevant AST node.

Yep. The problem (or upside) is that when writing new matchers it's
really easy to write some tests that verify that they match what you
think they should match. I think copying the examples to the AST nodes
is a great idea. I'm planning to put some of my "community service"
time into that over the next year anyway.

>> That said, I'll soon go over the matchers and rename them to exactly
>> match the AST nodes, which gives another good argument
>> pro-orthogonality.
>
> I think this will be a good first step, however, I've pretty much
> settled on the belief that the API is fundamentally flawed if it
> doesn't expose more plumbing. It seems like at least 2/3 of the
> current matchers would be obviated by

I completely disagree with the statement that the API is fundamentally
flawed unless this is in.
As I said, I am fully in agreement that we need something like this,
and we have discussed that multiple times.

> 1. be able to match any AST node I want with `astNode<T>`
> 2. be able to assert that an arbitrary member function on that node
> has/doesn't have an arbitrary value, by doing something like
> `astNode<UnaryOperator>(property(&UnaryOperator::getSubExpr).is_not(NULL))`

I also completely disagree that this would obviate what we have - note
that from my counting currently there are exactly 12 matchers that
match your pattern (2). For the others you'd at least need to
implement "property" in special ways.

> Looking at <http://clang.llvm.org/docs/LibASTMatchersReference.html>,
> it seems like the majority of the matchers are obviated by just those
> two things. Moreover, in the face of having to debug a problem, the
> user can readily understand what the matcher is doing by relating it
> to the underlying Clang documentation instead of having to dive into
> the matcher implementation and *then* look at the underlying Clang
> documentation. Being explicit about what is happening is a big win;

The being explicit part is one other thing we disagree upon. Being
explicit in this case trade-offs against easy-to-read.

> ultimately the users of this API are going to presumably be doing
> something with the raw AST node, so it makes sense to allow them to
> express their match in terms of methods on the raw AST nodes as well
> instead of forcing them to learn a whole new vocabulary in order to
> encode things that they already know how to express.

Here we agree ...

> It is easy enough to implement 2. in terms of an arbitrary callable
> (and thus you just have `property().is_not()` return a callable that
> does the matching). This means that you can use any helper function
> you want and if a library client is using C++11, they can use a lambda
> too.
>
> Another benefit of this approach is that you can dispense with all
> those nasty macros!
>
> tl;dr Focus should be on providing a small set of flexible and
> reusable tools that can be assembled to do AST matching, possibly in
> addition to a "sugar" API.

So, to summarize:
- we agree that we want a better way to specify matchers; getting rid
of the macros would be great
- we completely disagree on how important that is. The cool thing to
me is that having the pre-defined stuff builds an abstraction, for
which we can freely change the implementation. As long as we don't
change how the matchers can be combined, incremental improvement is
possible, while getting impact out of the usefulness without having to
find the "perfect" solution up front

> I apologize for the lengthy post.

Would be one line shorter if you didn't apologize ;)

Cheers,
/Manuel

> --Sean Silva
>
> On Mon, Aug 20, 2012 at 4:50 PM, Manuel Klimek <klimek at google.com> wrote:
>> Moving to cfe-dev, as I think the design discussion is misplaced on a
>> review thread for one added feature :)
>>
>> On Mon, Aug 20, 2012 at 1:14 PM, Sean Silva
>> <reviews at llvm-reviews.chandlerc.com> wrote:
>>>
>>>   It seems like it would almost make more sense to move these great doc comments with great examples of source constructs that the AST node corresponds to into the documentation for the AST nodes themselves, and then have single header (say, "StmtMatchers.h") with a block comment at the top saying something to the tune of
>>
>> This obviously depends a lot on the answers to your later points, but
>> under the condition that we actually want those matchers spelled out
>> somewhere, I think having the documentation there is good, even if it
>> might seem redundant. Requiring users to go through more layers of
>> indirection for digging up files seems worse than the maintenance
>> burden here.
>>
>>>   "each `Stmt` subclass has a corresponding matcher here, with a name which is the name of the AST node with the first letter lowercase rather than upper, so for example `MaterializeTemporaryExpr`'s corresponding matcher is `materializeTemporaryExpr`. If you are unsure about what a given `Stmt` subclass corresponds to at the source level, each corresponding `Stmt` subclass's doxygen comment contains a number of examples which should help you get a feel for what the AST node corresponds to at the source level."
>>>
>>>   And then you could just stamp out all of them. I'm not sure how you're determining when to add exact AST node matchers, but I can't see a compelling reason to not just add all of them at once (well, besides the ever-present lack of time :). Of course, all of what I said for `Stmt`'s could just as easily be said for `Decl`'s or other AST nodes.
>>>
>>>   Maybe you could make it even more uniform from an API point of view and have a matcher `astNode<T>` where `T` is any AST node and which matches that node exactly, so the `materializeTemporaryExpr` you have in this patch could be written as `astNode<MaterializeTemporaryExpr>`. I think that from an API point of view this is really clean, since then the user only has to learn one matcher in order to match any exact AST node they want (this also seems like a huge win from a documentation perspective as well, since you now only need to document one matcher in order to cover all exact AST node matchers).
>>>
>>>   I understand that the `astNode<T>` is a bit more verbose, but the clarity and API transparency that it affords is I think //much// more important, since you can recover the syntax sugar trivially with aliases. For example, suppose that I just want to match records; clang has an AST node for that, which is `RecordDecl`, so I know I just have to do `astNode<RecordDecl>` and that's it. However, with the current design, I need to wade through docs to find it (it doesn't seem to exist after wading and grepping), or worse be deceived and use `record()`, which doesn't match `RecordDecl`'s, but actually `CXXRecordDecl`'s!
>>>
>>>   Is there something that I'm missing that complicates doing it that way? Could you compare/contrast your current approach with the one I just described?
>>>
>>>   In a sense, I guess what my radar is picking up on here is that you have some "boilerplate" AST matchers here which correspond exactly with AST nodes, while other ones don't, like allOf(), yet this distinction doesn't appear to be being called out in documentation or well-factored within the code itself (or documentation). Also, it seems like a documentation Single Point Of Truth violation to have these great examples of what the AST node corresponds to here and not on the actual AST node---these examples would probably also be of interest to a person reading documentation of the AST node. Having the examples here makes perfect sense for matchers which are not matching exact AST nodes, though.
>>
>> So, you raise a very good point, and we have had that exact argument
>> multiple times over the course of the AST matchers (the last time was
>> I think when Daniel brought it up ;)
>>
>> It is far from a clear cut "obvious" design choice. As you have said,
>> there are costs and benefits to both approaches, and in the end the
>> question is the value you put on each of those points.
>>
>> I have a few comments on the points you raise (and why I might value
>> them differently than you do):
>> - First, we have a very simple of of specifying the node matchers,
>> which is very close to your aliasing: the VariadicDynCastAllOf thingy.
>> I contemplated renaming that to NodeMatcher to make that more clear.
>> - The way to write a matcher would actually be astNode<Stmt,
>> MaterializeTemporaryExpr>, because afaik we cannot statically figure
>> out what our base type is
>> - if you allow users to alias, but don't provide a central location
>> where to put those aliases, we'll end up with matcher expressions that
>> are harder to read and maintain, as users will provide their own names
>> - having node matchers for nodes for which none of the other matchers
>> exist, is pretty useless - thus, I doubt that we'll add matchers for
>> all of the 500+ classes of the AST
>>
>> When it come to documentation I'm also personally not a big believer
>> to single point of truth, as context, presentation and repetition
>> matters for learning.
>>
>> That said, I'll soon go over the matchers and rename them to exactly
>> match the AST nodes, which gives another good argument
>> pro-orthogonality. I still don't think it quite sways the balance, but
>> I'm open to more opinions :)
>>
>> Thoughts?
>> /Manuel