[cfe-dev] [RFC] Improving Support for Debugging Matchers

James King via cfe-dev cfe-dev at lists.llvm.org
Mon Nov 29 14:20:48 PST 2021


Hello Aaron,

Thanks for the feedback! It seems that the goal and comments here can be
broken down into two categories:


   -

   Establishing some infrastructure to provide information about matchers
   -

   Creating tooling to access this information


Focusing on the first point, the decision regarding what to return from
`getName` has many different feasible options. Potentially, we may want the
function to return additional information about the matcher. In that sense,
I agree that there may be a better name than `getName`. Additionally,
instead of returning a string or an enum directly, perhaps it would be
viable to return a “MatcherInfo” object (exposed through `getMatcherInfo`).
That object could initially carry the name (exposed through `getName`), but
we could later add support for an enum and other useful information (ex.
interesting properties, child matchers, etc.).

As for the second point, I agree that there are a lot of options to
consider for presenting this information (integrating with clang-query,
structuring output, etc). All the ideas you proposed seem to be viable -
one of the benefits of breaking this problem down into the two above points
is that once the infrastructure is established, all of these solutions can
be explored. This should allow us to achieve the goal of supporting
multiple debugging workflows. The `withTag` matcher, specifically, can be
useful for cases such as analyzing unit tests which fail due to faulty
matchers - wrapping the matcher in a `withTag` matcher would be a quick way
to gain some insight into its behavior. Ultimately, the effort of providing
better matcher introspection can be continued by expanding the
functionality of the `withTag` matcher, or by working on different
approaches that utilize `getMatcherInfo`.

Thanks,

James

On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <aaron at aaronballman.com>
wrote:

> Thank you for proposing this! I think making AST matchers easier to
> use and debug is a fantastic goal. Some thoughts below.
>
> On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > Objective
> >
> > The goal for this proposal is to improve the experience of debugging
> Clang’s AST matchers and of understanding what they are doing.
> >
> > Background
> >
> > AST Matchers are a powerful tool that allows users to pick out nodes in
> the Clang AST, provided they match the specified constraints and criteria.
> Yet, creating a well-formed matcher can be difficult. Matchers can become
> complex when multiple layers of matchers are nested to create a larger
> compound matcher, which can be necessary to select parts of the AST that
> fulfill strict criteria. Since matchers only provide a boolean response as
> to whether they obtained any matches or did not obtain any matches, the
> user receives no explanation for when a matcher fails to produce any
> bindings. Even when a matcher produces bindings, performing introspection
> to debug false positive bindings can be tedious. When the result of a
> matcher is not expected, determining how the result was obtained is made
> difficult by this limited response. Without any other options to interact
> with matchers, other than trial-and-error based on binary feedback, this
> lack of transparency makes it difficult to introspect matcher behavior and
> identify problems (such as which part of a compound matcher is inconsistent
> with expectations). This often causes the process of debugging a faulty
> matcher to be slow, confusing, and frustrating.
> >
> > Design ideas
> >
> > Proposed Solution
> >
> > The proposed solution is composed of two parts.
> >
> > The first part (https://reviews.llvm.org/D113917) is to expose a
> getName method on the Matcher class. Currently, there is no easy way to
> access any form of identification information for any matcher. This method
> will act as a hook that allows for accessing the name of a matcher
> programmatically. While this method will be helpful for the implementation
> for the withTag matcher, it could also prove to be useful for other forms
> of matcher introspection/debugging.
> >
> > The second part (https://reviews.llvm.org/D113943) is to create an
> introspection node matcher (named the withTag matcher). This matcher will
> act as a way to enable verbose logging for matchers, which introduces the
> possibility for a workflow similar to printf debugging. This matcher is
> templated - it should accept an inner matcher of any type, and itself be a
> matcher of that type. For example, if the withTag matcher has an inner
> matcher of type Matcher<TypeLocMatcher>, it will return a
> Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher
> will not affect the outcome of the inner matcher and should directly return
> whatever the inner matcher returns.
> >
> > Before:
> >
> > varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
> >
> >
> > After:
> >
> > withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
> >
> >
> > This matcher will be able to provide information such as:
> >
> > Matcher Type
> >
> > Matcher Name
> >
> > Matcher Success
> >
> > Node Kind
> >
> > Node Source
> >
> > The matcher will provide output by writing to stderr.
>
> I'm not opposed to it, but making this user configurable seems like a
> better approach (not everyone wants to write to stderr or go through
> redirections to get the data into a file). Also, we may want to
> consider what formats we want to provide this information. You give an
> example below, but this sounds a lot like a case where we might want
> to provide structured output (e.g., JSON) so that other tools can
> consume it to provide a better user interface. (That seems like
> something that can be done in follow-ups rather than something
> required for the initial feature.)
>
> > Though the exact format of the output is subject to change, it could
> look like this, for the matcher withTag(varDecl(withTag(hasName("x")),
> hasTypeLoc(pointerTypeLoc()))):
> >
> > ⭐ Attempting new match
> >
> > Matcher Name: `VarDecl` Node
> >
> > Node Kind: TypedefDecl
> >
> > Node Value:
> >
> > ```
> >
> > typedef char *__builtin_va_list
> >
> > ```
> >
> > Node AST:
> >
> > TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit
> __builtin_va_list 'char *'
> >
> > `-PointerType 0x47e6360 'char *'
> >
> >   `-BuiltinType 0x47e54d0 'char'
> >
> > Result: Unsuccessful
> >
> > ✔️ Concluding attempt
> >
> >
> > ⭐ Attempting new match
> >
> > Matcher Name: `VarDecl` Node
> >
> > Node Kind: VarDecl
> >
> > Node Value:
> >
> > ```
> >
> > void *x
> >
> > ```
> >
> > Node AST:
> >
> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
> >
> > ⭐ Attempting new match
> >
> > Matcher Name: HasName
> >
> > Node Kind: VarDecl
> >
> > Node Value:
> >
> > ```
> >
> > void *x
> >
> > ```
> >
> > Node AST:
> >
> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
> >
> > Result: Successful
> >
> > ✔️ Concluding attempt
> >
> >
> > Result: Successful
> >
> > ✔️ Concluding attempt
> >
> >
> >
> > From this output, a user is able to see things such as information about
> each node the matcher attempts to match, information about which part of
> the compound matcher is being matched, and the success of each matcher.
> >
> >
> > Pros
> >
> > The design revolves around a matcher, which is a familiar
> pattern/construct
> >
> > The design has low overhead for using and removing - it requires
> wrapping one or more matchers with the withTag matcher.
> >
> > The design allows for a great degree of control - one can pick out
> certain matchers within a compound matcher to analyze.
> >
> > Cons
> >
> > One withTag matcher will only be able to report information about the
> inner matcher, and not necessarily matchers within the inner matcher
> >
> > However, it may be possible to provide a hook to access the inner
> matcher’s inner matcher
>
> Another con that I see is: this is likely to get very chatty when
> working on complex matchers. We might need some notion of a verbosity
> level for the information.
>
> > Printf debugging is just one method of diagnosing issues, and may not be
> everyone’s preferred approach to debugging. Alternative debugging
> approaches could include something like steveire’s debugger (
> https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/).
>
> FWIW, I loved a lot of the concepts Stephen was developing, so
> enabling this kind of technology would be fantastic. One thing I think
> is worth considering is that clang-query is currently the go-to tool
> for developing complex AST matchers because of the REPL nature of the
> tool. While printf-style debugging gets more information, I do wonder
> if we should be considering how to integrate this into clang-query
> more directly.
>
> > Questions
> >
> > Currently, getName returns a string. Should we consider returning an
> enum, instead?
>
> We sometimes rename matchers, so I think using an enumeration may have
> some nice (or possibly annoying, depending on perspective) properties.
> If we rename something and getName() returns a string, it's more of a
> silent change; using an enumeration is more likely to produce a
> compile error if someone is relying on the name for some reason. I
> don't currently have a strong preference because I can see arguments
> either way.
>
> > One reason we have opted for a string, as of now, is because users may
> define their own private matchers which may not have a value in the enum.
>
> That's a pretty compelling reason to go with the string approach.
>
> > If getName were to return a string, what formats/conventions should we
> adhere to?
> >
> > Currently, the returned strings are not named in a very formal approach.
> For example, most traversal and narrowing matchers have a getName that
> returns the name of the matcher exactly, whereas most node matchers have a
> getName that returns a string containing the name of the node (ie. VarDecl
> instead of varDecl)
>
> Naively, I think users are going to expect getName() to return the
> name of the matcher as it is written (that also makes it easier to
> visually distinguish between the matcher and the matched node).
>
> ~Aaron
>
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211129/2f3d635b/attachment-0001.html>


More information about the cfe-dev mailing list