[cfe-dev] RFC: Easier AST Matching by Default

Mon Jun 29 16:36:50 PDT 2020

David~

I really like this proposal.  I would greatly like a major improvement in
matchers handling of Types as well.

https://gcc.godbolt.org/z/w5jusg

The only solution I could find involves peeling out of the typeloc space to
ask about implicitness in the decl space.

declaratorDecl(unless(isImplicit()),
  hasTypeLoc(typeLoc(loc(qualType(elaboratedType(namesType(
typedefType()))))).bind("loc"))
)

I know a slight shift from the current topic, but my original support for
this change was based on my (false) belief that it also addressed implicit
TypeLocs in a more principled fashion.

Matt

On Mon, Jun 29, 2020 at 7:22 PM David Rector <davrecthreads at gmail.com>
wrote:

> Stephen you’ve raised a great point about how the user should be able to
> interface with Stmt nodes, and your clang-query idea is absolutely great, I
> love it.
>
> But I don’t think your proposed solution is broad enough, even for your
> own use case.  Below I give an easy, general solution that would work for
> you and everyone else.  (This clarifies & applies to Stephen’s case the
> points I discussed with Richard earlier.)  Please point out where you see a
> problem with this; that would at least help clarify the discussion for
> others.
>
> To summarize the issue, the now famous "B-42" example:
> ```
> struct B { B(int); }
> B func1() { return 42; }
> ```
> This currently get matches for `hasReturnValue(exprWCleanups())`; you want
> it to match `hasReturnValue(integerLiteral())` under a new "beginner"
> setting.
>
> But if we’re going to introduce a new, easier setting, why not go all out,
> and let it *also* match `hasReturnValue(cxxConstructExpr())` OR
> hasReturnValue(materializeTemporaryExpr()) OR, yes, even the original
> `hasReturnValue(exprWithCleanups())`?  I believe this is what Richard
> originally suggested.
>
> We should call this setting "easy" mode — and I would probably agree with
> you that it *should* be made the default, for *everyone*’s sake, not just
> beginners - more on that later.
>
> This is how it could be implemented, three steps:
>
> 1. Add Expr::isImplicit(), Expr::desalt(), and Expr::getAs<T>(), the
> latter two similar but, as Richard pointed out, anti-analogous to
> Type::desugar() and Type::getAs<T>().  (I support introducing these in the
> main AST nodes, rather than only in ASTMatchers, so that other users of the
> AST can inteface with nodes in the same way ASTMatchers do, thereby
> avoiding a bifurcation in how AST nodes are interpreted and used.)
> ```
> class Expr : public Stmt {
> //…
> public:
> bool isImplicit() const {
> //…Dispatch to the most derived type’s isImplicit(), and be sure they all
> define one.
> }
>
> /// Whereas Type::desugar peels off meaningless syntax ("sugar") to get at
> underlying semantics, this
> /// peels off invisible semantics ("salt") to get at underlying syntax.
>  (Proposed name, open to suggestions.)
> Expr * desalt() const {
> if (!this->isImplicit())
> return nullptr;
> assert(children.end() = children.begin() + 1 &&
>   "Expected implicit nodes to have exactly one child");
> return *children.begin();
> }
>
>
> template<typename T>
> T *getAs() const {
> if (isa<T>(this))
> return cast<T>(this);
> return desalt()->getAs<T>();
> }
> };
> ```
>
> 2. In the ASTMatchers implem, define BoundNodes::getNodeAs<T>() in terms
> of Stmt::getAs<T>().  I think it would be as simple as this:
> ```
>   class BoundNodes {
> //…
>     template <typename T>
>     const T *getNodeAs(StringRef ID) const {
> if (std::is_base_of<Expr, T>::value) {
> if (Expr *E = MyBoundNodes.getNodeAs<Expr>(ID))
> return E->getAs<T>();
> return nullptr;
> }
>       return MyBoundNodes.getNodeAs<T>(ID);
> }
>     };
> ```
>
> At this point, if on the B-42 example you call
>
> `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>
> when in "easy" mode, for theReturnVal you should be able to
> getNodeAs<IntegerLiteral>(), getNodeAs<ExprWCleanups>(),
> getNodeAs<CXXConstructExpr>() etc — all would be nonnull, users can take
> their pick.  And btw, if you wanted to get the other CXXConstructExpr,
> since there are two, you would call
> firstConstructExpr->desalt()->getAs<CXXConstructExpr>()).  And, if you only
> wanted matches that were non-implicit, you'd use ignoreImplicit as usual —
> but you would have to specify this.  This would force the user to reckon
> with the existence of what I guess I’m calling "semantic salt" over their
> syntax, without forcing them to manually dust it all off to get to a simple
> IntegerLiteral underneath.
>
> So the user retains maximum flexibility, but gains maximum
> tightness/expressiveness, in this new mode.
>
> 3. The last part of the puzzle: how to allow
> `hasReturnValue(integerLiteral())` to match the "return 42;" in the B-42
> example.  From perusing ASTMatchersInternal.h, it seems you would just want
> to change any dyn_cast<SomeExpr>(E) to E->getAs<T>() when in "easy" mode.
> If a generic template is used somewhere, specialize it for Exprs to do
> this, same as we did for getNodeAs<T>().
>
> Then, it should be able to work as you suggested — correct me if I’m wrong.
>
> What is more, as you suggested, this really isn’t just a beginner mode —
> the experts would benefit too:
>
> Suppose in the future we find we need more than just ExprWCleanups — we
> need an extra implicit ExprWExtraCleanups that wraps the ExprWCleanups and
> everything else.  Or we need new inner implicit nodes.  If an expert used
> "easy" mode and getNodeAs<ExprWCleanups>(), and getAs<T>() whenever they
> sought to navigate to a child of an implicit node, he/she would not need to
> change her code at all.  Only if she required that each match was exact
> (call such a mode "directMatchOnly" perhaps) would she need to do a lot of
> maintenance whenever a new implicit node is introduced.
>
> That’s why I would probably agree with Stephen that "easy" mode should
> probably be the default — the user should have to call "directMatchOnly" to
> opt out.  But I could see the other side of it too.
>
> Bottom line: you’ve raised a great but subtle point, doggedly pursued it,
> but now, let’s just get rid of the half measures — let’s go all out with a
> solution that is easy and lifts all boats.
>
> If there is some problem this way of doing things won’t solve, please
> point it out — that would at least clarify the discussion for others who
> perhaps can help better than I can.  Thanks again for your efforts,
>
> - Dave
>
> On Jun 28, 2020, at 7:07 PM, Stephen Kelly via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>
> On 24/06/2020 21:26, Yitzhak Mandelbaum wrote:
>
> My 2cents:
>
>
> Thanks for responding about the matcher output from clang-query.
>
>
> I mostly agree with Richard and Sam. My only concern w/ Richard's proposal
> is that the hybrid model may be even more confusing to the beginner, since
> it may be harder to predict how a given matcher will match against an AST.
> The system will be more forgiving, yet it becomes harder to build a mental
> model of how it works.
>
>
> In addition to making clang-query output relatable information, I agree
> with what you wrote here about creating a mental model.
>
>
> I think we'd need to be sure we can concisely explain how this mode works,
> without having to reference any code/algorithms. For example, is there a
> deterministic elaboration from any unadorned matcher to a matcher with
> explicit `traverse` annotations? That would allow us to give some examples
> that illuminate what's happening. Otherwise, I'd prefer we just go back to
> the original default for the library. I'm fine if beginner-oriented tools
> want to start with "IgnoreUnless..." as their default. It sounds like
> Stephen is amenable to this approach as well from his last reply.
>
>
> I think reversing the change of default is a worse outcome, but I think
> it's better than the current state of unhappiness from people.
>
> Further development can't happen until this thread is resolved I think.
>
> However, if the default is changed back, I think there is less reason to
> add the matcher output feature to clang-query, because it will either need
> to
>
> * not show ignoring*() matchers in its output
>
> * show output based on each of (and none of) the ignoring*() matchers
> being applied
>
> I think that would be too noisy and I don't think it's a good thing to
> require newcomer users to immediately create a mental model which requires
> those matchers.
>
>
> re: context-sensitive suggestions.
> Stephen, if you drop the need to suggest all possible matchers at that
> point and only provide (beginner) helpful suggestions, does your concern go
> away?
>
>
> It doesn't change my concern.
>
> I don't know how you would determine what should be in the output and what
> shouldn't. Also, this content is not just for beginners. I use the feature
> still.
>
> I don't think this is the thread for details such as your other question.
>
>
> Overall, I'm pessimistic about making the current matchers easier for
> users.
>
>
> I think if AST Matchers are not going to be removed, we should make them
> easier for users.
>
>
> I think the AST is challenging for beginners and band aids will not solve
> that and can often make it worse. If we want beginner-friendly tooling (and
> I do!) we need to admit its cost and then invest in it.
>
>
> Yes, I have started such tooling, but there is much more that can be done.
>
> But, as in this thread, I don't think this is fully a tooling issue. It's
> also about discoverability and that means making it easier for users to
> create a mental model that doesn't require them to put ignoringImplicit()
> everywhere.
>
>
> Thanks Yitzhak for your input. I remain interested in feedback/input from
> others on the content I quote below.
>
>
> Thanks,
>
> Stephen.
>
>
> On Mon, Jun 22, 2020 at 5:51 PM Stephen Kelly via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>> The main reason I'm not supportive of this idea is that I want to make
>> clang-query tell the user what matchers they can use with the nodes in the
>> binding list.
>>
>> For example, if I write
>>
>>  returnStmt()
>>
>> in clang-query it should tell me that I can write
>>
>>  hasReturnValue(integerLiteral())
>>
>> inside the expr():
>>
>>  http://ce.steveire.com/z/9EF8x0
>>
>> That's how I as a newcomer quickly discover hasReturnValue() and
>> integerLiteral().
>>
>> You have no idea how overwhelming
>>
>>  http://clang.llvm.org/docs/LibASTMatchersReference.html
>>
>> is to someone who has zero familiarity with the clang AST.
>>
>> But the above link is interactive and it aids *discovery* which is what
>> my talk was about and which is what I'm trying to make easier with AST
>> Matchers. The above suggested clang-query feature (not upstream yet) tells
>> me at every step which matchers I can use in the context of what I am
>> trying to achieve.
>>
>> However, if all of the implicit nodes should also appear in that output,
>> and if in
>>
>> `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>>
>> the expr() should match the exprWithCleanups(), then I don't know how to
>> make that feature work. If the expr() is the exprWithCleanups() then the
>> tool definitely shouldn't tell the user they can write integerLiteral()
>> there. The output for implicit nodes would add lots of noise and would have
>> to be neutral in terms of what it recommends.
>>
>> The entire point of what I'm trying to do is not present the user with
>> exprWithCleanups() and let them achieve their goals without it. I don't
>> know if that's possible with the ideas floating around at the moment about
>> making AST Matchers present all of the implicit nodes.
>>
>> But, if making IgnoreUnlessSpelledInSource non-default means that I can
>> continue work toward that goal (and eg ignore template instantiations and
>> other implicit nodes which are not skipped yet), then maybe that's a viable
>> way forward. So, that's my preference now I think.
>>
>> That way, people who want expert mode get it by default, and newcomers
>> (and anyone else who doesn't want to write ignoringImplicit() everywhere)
>> can relatively easily get the easier mode, and can use the expert mode when
>> wanting to match implicit nodes.
>>
>> That might be the best compromise.
>>
>> Thanks,
>>
>> Stephen
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200629/b10e572a/attachment-0001.html>