<div dir="ltr"><div dir="ltr">On Wed, 17 Jun 2020 at 14:21, Stephen Kelly via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>On 16/06/2020 05:16, Richard Smith
wrote:<br></p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 25 May 2020 at
14:56, Stephen Kelly via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 20/12/2019 21:01,
Stephen Kelly via cfe-dev wrote:<br>
> <br>
> Hi,<br>
> <br>
> (Apologies if you receive this twice. GMail classified
the first one as <br>
> spam)<br>
> <br>
> Aaron Ballman and I met by chance in Belfast and we
discussed a way <br>
> forward towards making AST Matchers easier to use,
particularly for C++ <br>
> developers who are not familiar with the details of the
Clang AST.<br>
> <br>
> For those unaware, I expanded on this in the EuroLLVM
conference this <br>
> year, and then expanded on it at ACCU:<br>
> <br>
> <a href="https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching" rel="noreferrer" target="_blank">https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching</a><br>
> <br>
> One step in the process of getting there is changing
the default <br>
> behavior of AST Matchers to ignore invisible nodes
while matching using <br>
> the C++ API, and while matching and dumping AST nodes
in clang-query.<br>
> <br>
> I think this is the most important change in the entire
proposal as it <br>
> sets out the intention of making the AST Matchers
easier to use for C++ <br>
> developers who are not already familiar with Clang
APIs.<br>
> <br>
> To that end, I've written an AST to motivate the
change:<br>
> <br>
> <br>
> <a href="https://docs.google.com/document/d/17Z6gAwwc3HoRXvsy0OdwU0X5MFQEuiGeSu3i6ICOB90" rel="noreferrer" target="_blank">https://docs.google.com/document/d/17Z6gAwwc3HoRXvsy0OdwU0X5MFQEuiGeSu3i6ICOB90</a>
<br>
> <br>
> <br>
> We're looking for feedback before pressing forward with
the change. I <br>
> already have some patches written to port clang-tidy
and unit tests to <br>
> account for the change of default.<br>
<br>
<br>
This change is now in master.<br>
</blockquote>
<div><br>
</div>
<div>Is this change responsible for <a href="https://bugs.llvm.org/show_bug.cgi?id=46287" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=46287</a> ?
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
<p>Yes, I added a message on the bug and pointed out the <br>
</p>
<p>set traversal AsIs</p>
<p>command in clang-query: <a href="https://godbolt.org/z/tIS622" target="_blank">https://godbolt.org/z/tIS622</a><br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>If so, I think the current behavior is very
confusing, and seems harmful. By default, a cxxConstructExpr
matcher no longer matches all CXXConstructExprs, which seems
very strange and surprising, and likewise it's alarming that
an AST dump from clang-query no longer correctly dumps the
AST.</div>
</div>
</div>
</blockquote>
<p><br>
</p>
<p>The intention is to be make the AST Matchers more approachable by
newcomers. CXXConstructExpr and CXXMemberCallExpr nodes show up
in places which are non-intuitive for people who are new to the
clang AST, but are nevertheless familiar enough with C++ to work
professionally with it. <br>
</p>
<p>Consider <br>
</p>
<p><a href="https://godbolt.org/z/TvUTgy" target="_blank">https://godbolt.org/z/TvUTgy</a></p>
<p>and <br>
</p>
<p><a href="https://godbolt.org/z/me3EQu" target="_blank">https://godbolt.org/z/me3EQu</a></p>
<p>and imagine you've never taken a compiler course, and never seen
the clang AST before.<br>
</p>
<p>I am not surprised experts miss those nodes though, and the AsIs
mode is essentially the "expert mode" while the default is
intended to be "easy mode".</p></div></blockquote><div>I think your goal is noble, and automatically skipping implicit nodes when looking for an explicit one makes a lot of sense, but I don't think the change as-is is a net positive for approachability by newcomers. The fact is that you've made all of the matchers that match semantics rather than syntax stop working reliably (but not stop working entirely, which, if anything, is even worse because the behavior appears random and inconsistent). Consider the example from Yitzhak Mandelbaum -- the issue there is actually not whether we have a template or not; rather, it's a difference between one- and two-argument constructors:</div><div><br></div><div>struct A {</div><div> A(int);</div><div> A(int, int);</div><div>};</div><div>A a(3);</div><div>A b(3, 4);</div><div><br></div><div>We produce uniform AST representations for these two cases: both are CXXConstructExprs. Here's the AST dump:</div><div><br></div><div>|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit<br>| `-CXXConstructExpr 0x638eff0 <col:40, col:43> 'A' 'void (int)'<br>| `-IntegerLiteral 0x638ec10 <col:42> 'int' 3<br>`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit<br> `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'<br> |-IntegerLiteral 0x638f098 <col:49> 'int' 3<br> `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4<br></div><div><br></div><div>But clang-query in its default mode instead dumps the example as</div><div><br></div><div>|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit<br>| `-IntegerLiteral 0x638ec10 <col:42> 'int' 3<br>`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit<br> `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'<br> |-IntegerLiteral 0x638f098 <col:49> 'int' 3<br> `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4<br></div><div><br></div><div>and a cxxConstructExpr() matcher matches the initialization of b but not that of a. clang-query is following the new rules correctly: in the above case, the single-argument construction is an implicit node because C++ non-explicit single argument constructors implement implicit conversions, so the initialization of 'a' is an implicit conversion whereas the initialization of 'b' is a construction.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div></div></div></div></blockquote></div></blockquote><div>I hope you can agree that this change in behavior, for an example such as this, is a regression for the ability for beginners to understand how matchers work and effectively write them!<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><blockquote type="cite"><div dir="ltr"><div class="gmail_quote"><div>Perhaps as an alternative, we should not skip implicit
nodes when trying to match a node that might be implicit
(analogous to how Type::getAs steps over sugar nodes except
when looking for that type sugar node), and instead of
hiding implicit nodes in the AST dump we should dim them out
but still show them?</div>
</div>
</div>
</blockquote>
<p><br>
</p>
<p>Color wouldn't be particularly distinctive, especially in godbolt
for example, but it also would mean that the confusing stuff is
still *there* in the output. It can be overwhelming when
absolutely everything in the output is new and confusing.</p>
<p>(I also note that newcomers to Matchers don't need to see the AST
at all if the clang-query shows them the matchers they can use
instead: <a href="http://ce.steveire.com/z/9EF8x0" target="_blank">http://ce.steveire.com/z/9EF8x0</a> see my other posts on the
topic for more)<br></p></div></blockquote><div>Perhaps we should consider the AST dump to be an expert-level feature and not try to optimize it for beginner usage, especially if we have better output methods. I'm not sure, though; I think telling beginners to look at the AST dump and match what they see there has been somewhat effective.<br></div><div><br></div><div>I think we should assume that we can ask the maintainers of compiler explorer to render the output nicely, if we find that's a problem. (They already apply some postprocessing to it, to remove pointer values and the like.) As an alternative to greying out implicit nodes, we could render them inside square brackets or something:</div><div><br></div><div>|-VarDecl 0x638eb60 <col:38, col:43> col:40 a 'A' callinit<br>| `-[CXXConstructExpr 0x638eff0 <col:40, col:43> 'A' 'void (int)']<br>| `-IntegerLiteral 0x638ec10 <col:42> 'int' 3<br>`-VarDecl 0x638f030 <col:45, col:53> col:47 b 'A' callinit<br> `-CXXConstructExpr 0x638f100 <col:47, col:53> 'A' 'void (int, int)'<br> |-IntegerLiteral 0x638f098 <col:49> 'int' 3<br> `-IntegerLiteral 0x638f0b8 <col:52> 'int' 4<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><p>
</p>
<p>Nevertheless, this change in the direction that I consider easier
hasn't been 100% popular among other clang developers (though I
wonder how much of that is due to being experts?). At this point,
if someone wishes to reverse the change of default I don't think I
would attempt to oppose it. It's not clear to me whether my vision
for making things easier for newcomers (which I set out in
EuroLLVM and which included ignoring these nodes) is shared. It
might be better at this point for others to decide.</p></div></blockquote><div>I think it's a good goal. But I think the implementation needs some refinement. Did you have any thoughts on my suggestion of how to fix this? (That is: try matching against implicit nodes, but skip them if they don't match.) I don't think you commented on it.</div></div></div>