[cfe-dev] [RFC] Improving Support for Debugging Matchers

Tue Dec 7 13:59:32 PST 2021

Hi,

Gentle ping on this proposal. James has finished his internship, but I'm
eager to see the work through. :)

Thanks,
Yitzhak

On Mon, Nov 29, 2021 at 5:21 PM James King via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello Aaron,
>
> Thanks for the feedback! It seems that the goal and comments here can be
> broken down into two categories:
>
>
>    -
>
>    Establishing some infrastructure to provide information about matchers
>    -
>
>    Creating tooling to access this information
>
>
> Focusing on the first point, the decision regarding what to return from
> `getName` has many different feasible options. Potentially, we may want the
> function to return additional information about the matcher. In that sense,
> I agree that there may be a better name than `getName`. Additionally,
> instead of returning a string or an enum directly, perhaps it would be
> viable to return a “MatcherInfo” object (exposed through `getMatcherInfo`).
> That object could initially carry the name (exposed through `getName`), but
> we could later add support for an enum and other useful information (ex.
> interesting properties, child matchers, etc.).
>
> As for the second point, I agree that there are a lot of options to
> consider for presenting this information (integrating with clang-query,
> structuring output, etc). All the ideas you proposed seem to be viable -
> one of the benefits of breaking this problem down into the two above points
> is that once the infrastructure is established, all of these solutions can
> be explored. This should allow us to achieve the goal of supporting
> multiple debugging workflows. The `withTag` matcher, specifically, can be
> useful for cases such as analyzing unit tests which fail due to faulty
> matchers - wrapping the matcher in a `withTag` matcher would be a quick way
> to gain some insight into its behavior. Ultimately, the effort of providing
> better matcher introspection can be continued by expanding the
> functionality of the `withTag` matcher, or by working on different
> approaches that utilize `getMatcherInfo`.
>
> Thanks,
>
> James
>
> On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <aaron at aaronballman.com>
> wrote:
>
>> Thank you for proposing this! I think making AST matchers easier to
>> use and debug is a fantastic goal. Some thoughts below.
>>
>> On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>> >
>> > Objective
>> >
>> > The goal for this proposal is to improve the experience of debugging
>> Clang’s AST matchers and of understanding what they are doing.
>> >
>> > Background
>> >
>> > AST Matchers are a powerful tool that allows users to pick out nodes in
>> the Clang AST, provided they match the specified constraints and criteria.
>> Yet, creating a well-formed matcher can be difficult. Matchers can become
>> complex when multiple layers of matchers are nested to create a larger
>> compound matcher, which can be necessary to select parts of the AST that
>> fulfill strict criteria. Since matchers only provide a boolean response as
>> to whether they obtained any matches or did not obtain any matches, the
>> user receives no explanation for when a matcher fails to produce any
>> bindings. Even when a matcher produces bindings, performing introspection
>> to debug false positive bindings can be tedious. When the result of a
>> matcher is not expected, determining how the result was obtained is made
>> difficult by this limited response. Without any other options to interact
>> with matchers, other than trial-and-error based on binary feedback, this
>> lack of transparency makes it difficult to introspect matcher behavior and
>> identify problems (such as which part of a compound matcher is inconsistent
>> with expectations). This often causes the process of debugging a faulty
>> matcher to be slow, confusing, and frustrating.
>> >
>> > Design ideas
>> >
>> > Proposed Solution
>> >
>> > The proposed solution is composed of two parts.
>> >
>> > The first part (https://reviews.llvm.org/D113917) is to expose a
>> getName method on the Matcher class. Currently, there is no easy way to
>> access any form of identification information for any matcher. This method
>> will act as a hook that allows for accessing the name of a matcher
>> programmatically. While this method will be helpful for the implementation
>> for the withTag matcher, it could also prove to be useful for other forms
>> of matcher introspection/debugging.
>> >
>> > The second part (https://reviews.llvm.org/D113943) is to create an
>> introspection node matcher (named the withTag matcher). This matcher will
>> act as a way to enable verbose logging for matchers, which introduces the
>> possibility for a workflow similar to printf debugging. This matcher is
>> templated - it should accept an inner matcher of any type, and itself be a
>> matcher of that type. For example, if the withTag matcher has an inner
>> matcher of type Matcher<TypeLocMatcher>, it will return a
>> Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher
>> will not affect the outcome of the inner matcher and should directly return
>> whatever the inner matcher returns.
>> >
>> > Before:
>> >
>> > varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
>> >
>> >
>> > After:
>> >
>> > withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
>> >
>> >
>> > This matcher will be able to provide information such as:
>> >
>> > Matcher Type
>> >
>> > Matcher Name
>> >
>> > Matcher Success
>> >
>> > Node Kind
>> >
>> > Node Source
>> >
>> > The matcher will provide output by writing to stderr.
>>
>> I'm not opposed to it, but making this user configurable seems like a
>> better approach (not everyone wants to write to stderr or go through
>> redirections to get the data into a file). Also, we may want to
>> consider what formats we want to provide this information. You give an
>> example below, but this sounds a lot like a case where we might want
>> to provide structured output (e.g., JSON) so that other tools can
>> consume it to provide a better user interface. (That seems like
>> something that can be done in follow-ups rather than something
>> required for the initial feature.)
>>
>> > Though the exact format of the output is subject to change, it could
>> look like this, for the matcher withTag(varDecl(withTag(hasName("x")),
>> hasTypeLoc(pointerTypeLoc()))):
>> >
>> > ⭐ Attempting new match
>> >
>> > Matcher Name: `VarDecl` Node
>> >
>> > Node Kind: TypedefDecl
>> >
>> > Node Value:
>> >
>> > ```
>> >
>> > typedef char *__builtin_va_list
>> >
>> > ```
>> >
>> > Node AST:
>> >
>> > TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit
>> __builtin_va_list 'char *'
>> >
>> > `-PointerType 0x47e6360 'char *'
>> >
>> >   `-BuiltinType 0x47e54d0 'char'
>> >
>> > Result: Unsuccessful
>> >
>> > ✔️ Concluding attempt
>> >
>> >
>> > ⭐ Attempting new match
>> >
>> > Matcher Name: `VarDecl` Node
>> >
>> > Node Kind: VarDecl
>> >
>> > Node Value:
>> >
>> > ```
>> >
>> > void *x
>> >
>> > ```
>> >
>> > Node AST:
>> >
>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>> >
>> > ⭐ Attempting new match
>> >
>> > Matcher Name: HasName
>> >
>> > Node Kind: VarDecl
>> >
>> > Node Value:
>> >
>> > ```
>> >
>> > void *x
>> >
>> > ```
>> >
>> > Node AST:
>> >
>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>> >
>> > Result: Successful
>> >
>> > ✔️ Concluding attempt
>> >
>> >
>> > Result: Successful
>> >
>> > ✔️ Concluding attempt
>> >
>> >
>> >
>> > From this output, a user is able to see things such as information
>> about each node the matcher attempts to match, information about which part
>> of the compound matcher is being matched, and the success of each matcher.
>> >
>> >
>> > Pros
>> >
>> > The design revolves around a matcher, which is a familiar
>> pattern/construct
>> >
>> > The design has low overhead for using and removing - it requires
>> wrapping one or more matchers with the withTag matcher.
>> >
>> > The design allows for a great degree of control - one can pick out
>> certain matchers within a compound matcher to analyze.
>> >
>> > Cons
>> >
>> > One withTag matcher will only be able to report information about the
>> inner matcher, and not necessarily matchers within the inner matcher
>> >
>> > However, it may be possible to provide a hook to access the inner
>> matcher’s inner matcher
>>
>> Another con that I see is: this is likely to get very chatty when
>> working on complex matchers. We might need some notion of a verbosity
>> level for the information.
>>
>> > Printf debugging is just one method of diagnosing issues, and may not
>> be everyone’s preferred approach to debugging. Alternative debugging
>> approaches could include something like steveire’s debugger (
>> https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/).
>>
>> FWIW, I loved a lot of the concepts Stephen was developing, so
>> enabling this kind of technology would be fantastic. One thing I think
>> is worth considering is that clang-query is currently the go-to tool
>> for developing complex AST matchers because of the REPL nature of the
>> tool. While printf-style debugging gets more information, I do wonder
>> if we should be considering how to integrate this into clang-query
>> more directly.
>>
>> > Questions
>> >
>> > Currently, getName returns a string. Should we consider returning an
>> enum, instead?
>>
>> We sometimes rename matchers, so I think using an enumeration may have
>> some nice (or possibly annoying, depending on perspective) properties.
>> If we rename something and getName() returns a string, it's more of a
>> silent change; using an enumeration is more likely to produce a
>> compile error if someone is relying on the name for some reason. I
>> don't currently have a strong preference because I can see arguments
>> either way.
>>
>> > One reason we have opted for a string, as of now, is because users may
>> define their own private matchers which may not have a value in the enum.
>>
>> That's a pretty compelling reason to go with the string approach.
>>
>> > If getName were to return a string, what formats/conventions should we
>> adhere to?
>> >
>> > Currently, the returned strings are not named in a very formal
>> approach. For example, most traversal and narrowing matchers have a getName
>> that returns the name of the matcher exactly, whereas most node matchers
>> have a getName that returns a string containing the name of the node (ie.
>> VarDecl instead of varDecl)
>>
>> Naively, I think users are going to expect getName() to return the
>> name of the matcher as it is written (that also makes it easier to
>> visually distinguish between the matcher and the matched node).
>>
>> ~Aaron
>>
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211207/9e081f71/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4000 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211207/9e081f71/attachment-0001.bin>