<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Hi, </div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Gentle ping on this proposal. James has finished his internship, but I'm eager to see the work through. :)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Yitzhak</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 29, 2021 at 5:21 PM James King via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" id="gmail-m_607791738163287788gmail-docs-internal-guid-08f16711-7fff-a074-2bf4-ba5b4f3b75ad"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Hello Aaron,</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Thanks for the feedback! It seems that the goal and comments here can be broken down into two categories:</span></p><br><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Establishing some infrastructure to provide information about matchers</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt" role="presentation"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Creating tooling to access this information</span></p></li></ul><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Focusing on the first point, the decision regarding what to return from `getName` has many different feasible options. Potentially, we may want the function to return additional information about the matcher. In that sense, I agree that there may be a better name than `getName`. Additionally, instead of returning a string or an enum directly, perhaps it would be viable to return a “MatcherInfo” object (exposed through `getMatcherInfo`). That object could initially carry the name (exposed through `getName`), but we could later add support for an enum and other useful information (ex. interesting properties, child matchers, etc.).</span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">As for the second point, I agree that there are a lot of options to consider for presenting this information (integrating with clang-query, structuring output, etc). All the ideas you proposed seem to be viable - one of the benefits of breaking this problem down into the two above points is that once the infrastructure is established, all of these solutions can be explored. This should allow us to achieve the goal of supporting multiple debugging workflows. The `withTag` matcher, specifically, can be useful for cases such as analyzing unit tests which fail due to faulty matchers - wrapping the matcher in a `withTag` matcher would be a quick way to gain some insight into its behavior. Ultimately, the effort of providing better matcher introspection can be continued by expanding the functionality of the `withTag` matcher, or by working on different approaches that utilize `getMatcherInfo`. </span></p><br><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Thanks,</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:11pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">James</span></p></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <<a href="mailto:aaron@aaronballman.com" target="_blank">aaron@aaronballman.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thank you for proposing this! I think making AST matchers easier to<br>
use and debug is a fantastic goal. Some thoughts below.<br>
<br>
On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev<br>
<<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br>
><br>
> Objective<br>
><br>
> The goal for this proposal is to improve the experience of debugging Clang’s AST matchers and of understanding what they are doing.<br>
><br>
> Background<br>
><br>
> AST Matchers are a powerful tool that allows users to pick out nodes in the Clang AST, provided they match the specified constraints and criteria. Yet, creating a well-formed matcher can be difficult. Matchers can become complex when multiple layers of matchers are nested to create a larger compound matcher, which can be necessary to select parts of the AST that fulfill strict criteria. Since matchers only provide a boolean response as to whether they obtained any matches or did not obtain any matches, the user receives no explanation for when a matcher fails to produce any bindings. Even when a matcher produces bindings, performing introspection to debug false positive bindings can be tedious. When the result of a matcher is not expected, determining how the result was obtained is made difficult by this limited response. Without any other options to interact with matchers, other than trial-and-error based on binary feedback, this lack of transparency makes it difficult to introspect matcher behavior and identify problems (such as which part of a compound matcher is inconsistent with expectations). This often causes the process of debugging a faulty matcher to be slow, confusing, and frustrating.<br>
><br>
> Design ideas<br>
><br>
> Proposed Solution<br>
><br>
> The proposed solution is composed of two parts.<br>
><br>
> The first part (<a href="https://reviews.llvm.org/D113917" rel="noreferrer" target="_blank">https://reviews.llvm.org/D113917</a>) is to expose a getName method on the Matcher class. Currently, there is no easy way to access any form of identification information for any matcher. This method will act as a hook that allows for accessing the name of a matcher programmatically. While this method will be helpful for the implementation for the withTag matcher, it could also prove to be useful for other forms of matcher introspection/debugging.<br>
><br>
> The second part (<a href="https://reviews.llvm.org/D113943" rel="noreferrer" target="_blank">https://reviews.llvm.org/D113943</a>) is to create an introspection node matcher (named the withTag matcher). This matcher will act as a way to enable verbose logging for matchers, which introduces the possibility for a workflow similar to printf debugging. This matcher is templated - it should accept an inner matcher of any type, and itself be a matcher of that type. For example, if the withTag matcher has an inner matcher of type Matcher<TypeLocMatcher>, it will return a Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher will not affect the outcome of the inner matcher and should directly return whatever the inner matcher returns.<br>
><br>
> Before:<br>
><br>
> varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))<br>
><br>
><br>
> After:<br>
><br>
> withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))<br>
><br>
><br>
> This matcher will be able to provide information such as:<br>
><br>
> Matcher Type<br>
><br>
> Matcher Name<br>
><br>
> Matcher Success<br>
><br>
> Node Kind<br>
><br>
> Node Source<br>
><br>
> The matcher will provide output by writing to stderr.<br>
<br>
I'm not opposed to it, but making this user configurable seems like a<br>
better approach (not everyone wants to write to stderr or go through<br>
redirections to get the data into a file). Also, we may want to<br>
consider what formats we want to provide this information. You give an<br>
example below, but this sounds a lot like a case where we might want<br>
to provide structured output (e.g., JSON) so that other tools can<br>
consume it to provide a better user interface. (That seems like<br>
something that can be done in follow-ups rather than something<br>
required for the initial feature.)<br>
<br>
> Though the exact format of the output is subject to change, it could look like this, for the matcher withTag(varDecl(withTag(hasName("x")), hasTypeLoc(pointerTypeLoc()))):<br>
><br>
> ⭐ Attempting new match<br>
><br>
> Matcher Name: `VarDecl` Node<br>
><br>
> Node Kind: TypedefDecl<br>
><br>
> Node Value:<br>
><br>
> ```<br>
><br>
> typedef char *__builtin_va_list<br>
><br>
> ```<br>
><br>
> Node AST:<br>
><br>
> TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'<br>
><br>
> `-PointerType 0x47e6360 'char *'<br>
><br>
> `-BuiltinType 0x47e54d0 'char'<br>
><br>
> Result: Unsuccessful<br>
><br>
> ✔️ Concluding attempt<br>
><br>
><br>
> ⭐ Attempting new match<br>
><br>
> Matcher Name: `VarDecl` Node<br>
><br>
> Node Kind: VarDecl<br>
><br>
> Node Value:<br>
><br>
> ```<br>
><br>
> void *x<br>
><br>
> ```<br>
><br>
> Node AST:<br>
><br>
> VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'<br>
><br>
> ⭐ Attempting new match<br>
><br>
> Matcher Name: HasName<br>
><br>
> Node Kind: VarDecl<br>
><br>
> Node Value:<br>
><br>
> ```<br>
><br>
> void *x<br>
><br>
> ```<br>
><br>
> Node AST:<br>
><br>
> VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'<br>
><br>
> Result: Successful<br>
><br>
> ✔️ Concluding attempt<br>
><br>
><br>
> Result: Successful<br>
><br>
> ✔️ Concluding attempt<br>
><br>
><br>
><br>
> From this output, a user is able to see things such as information about each node the matcher attempts to match, information about which part of the compound matcher is being matched, and the success of each matcher.<br>
><br>
><br>
> Pros<br>
><br>
> The design revolves around a matcher, which is a familiar pattern/construct<br>
><br>
> The design has low overhead for using and removing - it requires wrapping one or more matchers with the withTag matcher.<br>
><br>
> The design allows for a great degree of control - one can pick out certain matchers within a compound matcher to analyze.<br>
><br>
> Cons<br>
><br>
> One withTag matcher will only be able to report information about the inner matcher, and not necessarily matchers within the inner matcher<br>
><br>
> However, it may be possible to provide a hook to access the inner matcher’s inner matcher<br>
<br>
Another con that I see is: this is likely to get very chatty when<br>
working on complex matchers. We might need some notion of a verbosity<br>
level for the information.<br>
<br>
> Printf debugging is just one method of diagnosing issues, and may not be everyone’s preferred approach to debugging. Alternative debugging approaches could include something like steveire’s debugger (<a href="https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/" rel="noreferrer" target="_blank">https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/</a>).<br>
<br>
FWIW, I loved a lot of the concepts Stephen was developing, so<br>
enabling this kind of technology would be fantastic. One thing I think<br>
is worth considering is that clang-query is currently the go-to tool<br>
for developing complex AST matchers because of the REPL nature of the<br>
tool. While printf-style debugging gets more information, I do wonder<br>
if we should be considering how to integrate this into clang-query<br>
more directly.<br>
<br>
> Questions<br>
><br>
> Currently, getName returns a string. Should we consider returning an enum, instead?<br>
<br>
We sometimes rename matchers, so I think using an enumeration may have<br>
some nice (or possibly annoying, depending on perspective) properties.<br>
If we rename something and getName() returns a string, it's more of a<br>
silent change; using an enumeration is more likely to produce a<br>
compile error if someone is relying on the name for some reason. I<br>
don't currently have a strong preference because I can see arguments<br>
either way.<br>
<br>
> One reason we have opted for a string, as of now, is because users may define their own private matchers which may not have a value in the enum.<br>
<br>
That's a pretty compelling reason to go with the string approach.<br>
<br>
> If getName were to return a string, what formats/conventions should we adhere to?<br>
><br>
> Currently, the returned strings are not named in a very formal approach. For example, most traversal and narrowing matchers have a getName that returns the name of the matcher exactly, whereas most node matchers have a getName that returns a string containing the name of the node (ie. VarDecl instead of varDecl)<br>
<br>
Naively, I think users are going to expect getName() to return the<br>
name of the matcher as it is written (that also makes it easier to<br>
visually distinguish between the matcher and the matched node).<br>
<br>
~Aaron<br>
<br>
><br>
> _______________________________________________<br>
> cfe-dev mailing list<br>
> <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote></div>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote></div>