[cfe-dev] [RFC] Improving Support for Debugging Matchers

Tue Nov 23 10:21:30 PST 2021

+Stephen Kelly <steveire at gmail.com>

On Tue, Nov 23, 2021 at 1:20 PM James King <jcking1034 at gmail.com> wrote:

> Objective
>
> The goal for this proposal is to improve the experience of debugging
> Clang’s AST matchers and of understanding what they are doing.
>
> Background
>
> AST Matchers are a powerful tool that allows users to pick out nodes in
> the Clang AST, provided they match the specified constraints and criteria.
> Yet, creating a well-formed matcher can be difficult. Matchers can become
> complex when multiple layers of matchers are nested to create a larger
> compound matcher, which can be necessary to select parts of the AST that
> fulfill strict criteria. Since a matcher reports only a boolean result
> (whether or not it obtained any matches), the user receives no explanation
> when a matcher fails to produce any bindings. Even when a matcher produces
> bindings, introspecting it to debug false positives can be tedious. With no
> way to interact with matchers other than trial and error based on this
> binary feedback, it is difficult to determine how an unexpected result was
> obtained or to identify which part of a compound matcher is inconsistent
> with expectations. This often makes debugging a faulty matcher slow,
> confusing, and frustrating.
>
> Design ideas
>
> Proposed Solution
>
> The proposed solution is composed of two parts.
>
> The first part (https://reviews.llvm.org/D113917) is to expose a getName
> method on the Matcher class. Currently, there is no easy way to access any
> form of identification information for any matcher. This method will act as
> a hook that allows for accessing the name of a matcher programmatically.
> While this method will be helpful for the implementation of the withTag
> matcher, it could also prove to be useful for other forms of matcher
> introspection/debugging.
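
To make the getName idea concrete, here is a minimal sketch. It is a toy model and not Clang's Matcher class: ToyMatcher, makeHasName, and the use of plain std::string values as "nodes" are hypothetical stand-ins chosen so that the example compiles on its own (C++14 or later). It only shows what a name hook next to the boolean match result could look like.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Toy stand-in for a matcher over nodes of type NodeT. This is NOT the
// Clang Matcher class; it only illustrates a getName() hook.
template <typename NodeT>
class ToyMatcher {
public:
  ToyMatcher(std::string Name, std::function<bool(const NodeT &)> Pred)
      : Name(std::move(Name)), Pred(std::move(Pred)) {}

  // The only answer a caller gets without the hook: matched or not.
  bool matches(const NodeT &Node) const { return Pred(Node); }

  // The proposed kind of hook: programmatic access to the matcher's name.
  const std::string &getName() const { return Name; }

private:
  std::string Name;
  std::function<bool(const NodeT &)> Pred;
};

// Hypothetical narrowing matcher; strings stand in for declaration names.
inline ToyMatcher<std::string> makeHasName(std::string Expected) {
  return ToyMatcher<std::string>(
      "hasName", [Expected = std::move(Expected)](const std::string &Node) {
        return Node == Expected;
      });
}
```

With this toy, makeHasName("x").getName() yields "hasName", which is the kind of identification a logging wrapper could print.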
>
> The second part (https://reviews.llvm.org/D113943) is to create an
> introspection node matcher (named the withTag matcher). This matcher will
> act as a way to enable verbose logging for matchers, which introduces the
> possibility for a workflow similar to printf debugging. This matcher is
> templated - it should accept an inner matcher of any type, and itself be a
> matcher of that type. For example, if the withTag matcher has an inner
> matcher of type Matcher<TypeLoc>, it will return a Matcher<TypeLoc>. In
> terms of functionality, the withTag matcher
> will not affect the outcome of the inner matcher and should directly return
> whatever the inner matcher returns.
>
> Before:
>
> varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
>
> After:
>
> withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
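
The pass-through behaviour described above can be sketched with a toy model. This is not the implementation in D113943: ToyMatcher is a hypothetical stand-in for a Clang matcher, the withTag below is a free function written for illustration only, and the Log parameter is an assumption added so the output can be redirected (it defaults to stderr, as in the proposal).

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Toy stand-in for a matcher over nodes of type NodeT (not Clang's Matcher).
template <typename NodeT>
struct ToyMatcher {
  std::string Name;
  std::function<bool(const NodeT &)> Pred;
  bool matches(const NodeT &Node) const { return Pred(Node); }
};

// Wraps Inner in a matcher of the same type. The wrapper logs each attempt
// and then returns Inner's result unchanged, so wrapping never alters the
// outcome of the match.
template <typename NodeT>
ToyMatcher<NodeT> withTag(ToyMatcher<NodeT> Inner,
                          std::ostream &Log = std::cerr) {
  std::string Name = Inner.Name;
  return ToyMatcher<NodeT>{
      std::move(Name), [Inner = std::move(Inner), &Log](const NodeT &Node) {
        Log << "Attempting new match\n"
            << "Matcher Name: " << Inner.Name << "\n";
        bool Result = Inner.matches(Node);
        Log << "Result: " << (Result ? "Successful" : "Unsuccessful") << "\n"
            << "Concluding attempt\n";
        return Result;
      }};
}
```

Because the wrapper has the same type as the matcher it wraps, it can be added around or removed from any sub-matcher without changing the surrounding expression. Passing a std::ostringstream as Log makes the trace easy to inspect in a test.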
>
> This matcher will be able to provide information such as:
>
>    - Matcher Type
>    - Matcher Name
>    - Matcher Success
>    - Node Kind
>    - Node Source
>
> The matcher will provide output by writing to stderr. Though the exact
> format of the output is subject to change, it could look like this, for the
> matcher withTag(varDecl(withTag(hasName("x")),
> hasTypeLoc(pointerTypeLoc()))):
>
> ⭐ Attempting new match
> Matcher Name: `VarDecl` Node
> Node Kind: TypedefDecl
> Node Value:
> ```
> typedef char *__builtin_va_list
> ```
> Node AST:
> TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
> `-PointerType 0x47e6360 'char *'
>   `-BuiltinType 0x47e54d0 'char'
> Result: Unsuccessful
> ✔️ Concluding attempt
>
> ⭐ Attempting new match
> Matcher Name: `VarDecl` Node
> Node Kind: VarDecl
> Node Value:
> ```
> void *x
> ```
> Node AST:
> VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>
> ⭐ Attempting new match
> Matcher Name: HasName
> Node Kind: VarDecl
> Node Value:
> ```
> void *x
> ```
> Node AST:
> VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
> Result: Successful
> ✔️ Concluding attempt
>
> Result: Successful
> ✔️ Concluding attempt
>
>
> From this output, a user can see which node each matcher attempted to
> match, which part of the compound matcher was being evaluated, and whether
> each matcher succeeded.
>
> Pros
>
>    - The design revolves around a matcher, which is a familiar
>      pattern/construct.
>    - The design has low overhead for using and removing - it requires
>      wrapping one or more matchers with the withTag matcher.
>    - The design allows for a great degree of control - one can pick out
>      certain matchers within a compound matcher to analyze.
>
> Cons
>
>    - One withTag matcher will only be able to report information about the
>      inner matcher, and not necessarily matchers within the inner matcher.
>       - However, it may be possible to provide a hook to access the inner
>         matcher’s inner matcher.
>    - Printf debugging is just one method of diagnosing issues, and may not
>      be everyone’s preferred approach to debugging. Alternative debugging
>      approaches could include something like steveire’s debugger
>      (https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/).
>
> Questions
>
>    - Currently, getName returns a string. Should we consider returning an
>      enum instead?
>       - One reason we have opted for a string, for now, is that users may
>         define their own private matchers, which may not have a value in
>         the enum.
>    - If getName were to return a string, what formats/conventions should we
>      adhere to?
>       - Currently, the returned strings do not follow a formal convention.
>         For example, most traversal and narrowing matchers have a getName
>         that returns the name of the matcher exactly, whereas most node
>         matchers have a getName that returns a string containing the name
>         of the node (i.e., VarDecl instead of varDecl).
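
The string-versus-enum trade-off in the first question can be illustrated with a small sketch. All names here (ToyMatcherKind, toString, ToyPrivateMatcher, "isInMyProject") are hypothetical and exist only for this example; they are not part of Clang or of the patches under review.

```cpp
#include <cassert>
#include <string>

// Hypothetical closed enumeration of built-in matcher kinds.
enum class ToyMatcherKind { VarDecl, HasName, HasTypeLoc };

// Spelling for each enumerator. Every new kind needs a new enumerator and
// a new case here, so matchers defined outside this file cannot be named.
inline std::string toString(ToyMatcherKind Kind) {
  switch (Kind) {
  case ToyMatcherKind::VarDecl:
    return "VarDecl";
  case ToyMatcherKind::HasName:
    return "hasName";
  case ToyMatcherKind::HasTypeLoc:
    return "hasTypeLoc";
  }
  return "<unknown>";
}

// A string-returning hook lets a privately defined matcher name itself
// without touching the enum.
struct ToyPrivateMatcher {
  std::string getName() const { return "isInMyProject"; }
};
```

The sketch also mirrors the naming inconsistency raised in the second question: the node matcher is spelled "VarDecl" while the narrowing and traversal matchers use their exact matcher names.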
>
>