[cfe-dev] RFC: Easier AST Matching by Default

Mon Jun 29 16:22:05 PDT 2020

Stephen you’ve raised a great point about how the user should be able to interface with Stmt nodes, and your clang-query idea is absolutely great, I love it.   

But I don’t think your proposed solution is broad enough, even for your own use case.  Below I give an easy, general solution that would work for you and everyone else.  (This clarifies & applies to Stephen’s case the points I discussed with Richard earlier.)  Please point out where you see a problem with this; that would at least help clarify the discussion for others.

To summarize the issue, the now famous "B-42" example:
```
struct B { B(int); }
B func1() { return 42; }
```
This currently get matches for `hasReturnValue(exprWCleanups())`; you want it to match `hasReturnValue(integerLiteral())` under a new "beginner" setting.

But if we’re going to introduce a new, easier setting, why not go all out, and let it *also* match `hasReturnValue(cxxConstructExpr())` OR hasReturnValue(materializeTemporaryExpr()) OR, yes, even the original `hasReturnValue(exprWithCleanups())`?  I believe this is what Richard originally suggested.

We should call this setting "easy" mode — and I would probably agree with you that it *should* be made the default, for *everyone*’s sake, not just beginners - more on that later.  

This is how it could be implemented, three steps:

1. Add Expr::isImplicit(), Expr::desalt(), and Expr::getAs<T>(), the latter two similar but, as Richard pointed out, anti-analogous to Type::desugar() and Type::getAs<T>().  (I support introducing these in the main AST nodes, rather than only in ASTMatchers, so that other users of the AST can inteface with nodes in the same way ASTMatchers do, thereby avoiding a bifurcation in how AST nodes are interpreted and used.)
```
	class Expr : public Stmt {
		//…
	public:
		bool isImplicit() const {
			//…Dispatch to the most derived type’s isImplicit(), and be sure they all define one.
		}

		/// Whereas Type::desugar peels off meaningless syntax ("sugar") to get at underlying semantics, this
		/// peels off invisible semantics ("salt") to get at underlying syntax.  (Proposed name, open to suggestions.)
		Expr * desalt() const {
			if (!this->isImplicit())
				return nullptr;
			assert(children.end() = children.begin() + 1 &&
				  "Expected implicit nodes to have exactly one child");
			return *children.begin();
		}

		template<typename T>
		T *getAs() const {
			if (isa<T>(this)) 
				return cast<T>(this);
			return desalt()->getAs<T>();
		}
	};
```

2. In the ASTMatchers implem, define BoundNodes::getNodeAs<T>() in terms of Stmt::getAs<T>().  I think it would be as simple as this:
```
  	class BoundNodes {
		//… 
    		template <typename T>
    		const T *getNodeAs(StringRef ID) const {
			if (std::is_base_of<Expr, T>::value) {
				if (Expr *E = MyBoundNodes.getNodeAs<Expr>(ID))
					return E->getAs<T>();
				return nullptr;
			}
      			return MyBoundNodes.getNodeAs<T>(ID);
		}
    	};
```

At this point, if on the B-42 example you call

`returnStmt(hasReturnValue(expr().bind("theReturnVal")))`

when in "easy" mode, for theReturnVal you should be able to getNodeAs<IntegerLiteral>(), getNodeAs<ExprWCleanups>(), getNodeAs<CXXConstructExpr>() etc — all would be nonnull, users can take their pick.  And btw, if you wanted to get the other CXXConstructExpr, since there are two, you would call firstConstructExpr->desalt()->getAs<CXXConstructExpr>()).  And, if you only wanted matches that were non-implicit, you'd use ignoreImplicit as usual — but you would have to specify this.  This would force the user to reckon with the existence of what I guess I’m calling "semantic salt" over their syntax, without forcing them to manually dust it all off to get to a simple IntegerLiteral underneath.  

So the user retains maximum flexibility, but gains maximum tightness/expressiveness, in this new mode.

3. The last part of the puzzle: how to allow `hasReturnValue(integerLiteral())` to match the "return 42;" in the B-42 example.  From perusing ASTMatchersInternal.h, it seems you would just want to change any dyn_cast<SomeExpr>(E) to E->getAs<T>() when in "easy" mode.  If a generic template is used somewhere, specialize it for Exprs to do this, same as we did for getNodeAs<T>().

Then, it should be able to work as you suggested — correct me if I’m wrong.

What is more, as you suggested, this really isn’t just a beginner mode — the experts would benefit too:

Suppose in the future we find we need more than just ExprWCleanups — we need an extra implicit ExprWExtraCleanups that wraps the ExprWCleanups and everything else.  Or we need new inner implicit nodes.  If an expert used "easy" mode and getNodeAs<ExprWCleanups>(), and getAs<T>() whenever they sought to navigate to a child of an implicit node, he/she would not need to change her code at all.  Only if she required that each match was exact (call such a mode "directMatchOnly" perhaps) would she need to do a lot of maintenance whenever a new implicit node is introduced.  

That’s why I would probably agree with Stephen that "easy" mode should probably be the default — the user should have to call "directMatchOnly" to opt out.  But I could see the other side of it too.

Bottom line: you’ve raised a great but subtle point, doggedly pursued it, but now, let’s just get rid of the half measures — let’s go all out with a solution that is easy and lifts all boats.

If there is some problem this way of doing things won’t solve, please point it out — that would at least clarify the discussion for others who perhaps can help better than I can.  Thanks again for your efforts,

- Dave

> On Jun 28, 2020, at 7:07 PM, Stephen Kelly via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> 
> 
> On 24/06/2020 21:26, Yitzhak Mandelbaum wrote:
>> My 2cents: 
>> 
> 
> Thanks for responding about the matcher output from clang-query.
> 
> 
> 
>> I mostly agree with Richard and Sam. My only concern w/ Richard's proposal is that the hybrid model may be even more confusing to the beginner, since it may be harder to predict how a given matcher will match against an AST.  The system will be more forgiving, yet it becomes harder to build a mental model of how it works. 
> 
> In addition to making clang-query output relatable information, I agree with what you wrote here about creating a mental model.
> 
> 
> 
>> I think we'd need to be sure we can concisely explain how this mode works, without having to reference any code/algorithms. For example, is there a deterministic elaboration from any unadorned matcher to a matcher with explicit `traverse` annotations? That would allow us to give some examples that illuminate what's happening. Otherwise, I'd prefer we just go back to the original default for the library. I'm fine if beginner-oriented tools want to start with "IgnoreUnless..." as their default. It sounds like Stephen is amenable to this approach as well from his last reply.
>> 
> 
> I think reversing the change of default is a worse outcome, but I think it's better than the current state of unhappiness from people. 
> 
> Further development can't happen until this thread is resolved I think.
> 
> However, if the default is changed back, I think there is less reason to add the matcher output feature to clang-query, because it will either need to 
> 
> * not show ignoring*() matchers in its output
> 
> * show output based on each of (and none of) the ignoring*() matchers being applied
> 
> I think that would be too noisy and I don't think it's a good thing to require newcomer users to immediately create a mental model which requires those matchers.
> 
> 
> 
>> re: context-sensitive suggestions. 
>> Stephen, if you drop the need to suggest all possible matchers at that point and only provide (beginner) helpful suggestions, does your concern go away? 
> 
> It doesn't change my concern. 
> 
> I don't know how you would determine what should be in the output and what shouldn't. Also, this content is not just for beginners. I use the feature still.
> 
> I don't think this is the thread for details such as your other question.
> 
> 
> 
>> Overall, I'm pessimistic about making the current matchers easier for users.
> 
> I think if AST Matchers are not going to be removed, we should make them easier for users.
> 
> 
> 
>> I think the AST is challenging for beginners and band aids will not solve that and can often make it worse. If we want beginner-friendly tooling (and I do!) we need to admit its cost and then invest in it.
> 
> Yes, I have started such tooling, but there is much more that can be done.
> 
> But, as in this thread, I don't think this is fully a tooling issue. It's also about discoverability and that means making it easier for users to create a mental model that doesn't require them to put ignoringImplicit() everywhere.
> 
> 
> 
> Thanks Yitzhak for your input. I remain interested in feedback/input from others on the content I quote below.
> 
> 
> 
> Thanks,
> 
> Stephen.
> 
> 
> 
>> On Mon, Jun 22, 2020 at 5:51 PM Stephen Kelly via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> 
>> The main reason I'm not supportive of this idea is that I want to make clang-query tell the user what matchers they can use with the nodes in the binding list.
>> 
>> For example, if I write
>> 
>>  returnStmt()
>> 
>> in clang-query it should tell me that I can write 
>> 
>>  hasReturnValue(integerLiteral())
>> 
>> inside the expr():
>> 
>>  http://ce.steveire.com/z/9EF8x0 <http://ce.steveire.com/z/9EF8x0>
>> That's how I as a newcomer quickly discover hasReturnValue() and integerLiteral().
>> 
>> You have no idea how overwhelming 
>> 
>>  http://clang.llvm.org/docs/LibASTMatchersReference.html <http://clang.llvm.org/docs/LibASTMatchersReference.html>
>> is to someone who has zero familiarity with the clang AST.
>> 
>> But the above link is interactive and it aids *discovery* which is what my talk was about and which is what I'm trying to make easier with AST Matchers. The above suggested clang-query feature (not upstream yet) tells me at every step which matchers I can use in the context of what I am trying to achieve.
>> 
>> However, if all of the implicit nodes should also appear in that output, and if in
>> 
>> `returnStmt(hasReturnValue(expr().bind("theReturnVal")))`
>> 
>> the expr() should match the exprWithCleanups(), then I don't know how to make that feature work. If the expr() is the exprWithCleanups() then the tool definitely shouldn't tell the user they can write integerLiteral() there. The output for implicit nodes would add lots of noise and would have to be neutral in terms of what it recommends.
>> 
>> The entire point of what I'm trying to do is not present the user with exprWithCleanups() and let them achieve their goals without it. I don't know if that's possible with the ideas floating around at the moment about making AST Matchers present all of the implicit nodes.
>> 
>> But, if making IgnoreUnlessSpelledInSource non-default means that I can continue work toward that goal (and eg ignore template instantiations and other implicit nodes which are not skipped yet), then maybe that's a viable way forward. So, that's my preference now I think.
>> 
>> That way, people who want expert mode get it by default, and newcomers (and anyone else who doesn't want to write ignoringImplicit() everywhere) can relatively easily get the easier mode, and can use the expert mode when wanting to match implicit nodes. 
>> 
>> That might be the best compromise.
>> 
>> Thanks,
>> 
>> Stephen
>> 
>> 
>> 
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200629/df19ad2f/attachment-0001.html>