[cfe-dev] [RFC] Improving Support for Debugging Matchers

Tue Nov 23 10:19:44 PST 2021

Objective

The goal for this proposal is to improve the experience of debugging
Clang’s AST matchers and of understanding what they are doing.

Background

AST Matchers are a powerful tool that allows users to pick out nodes in the
Clang AST, provided they match the specified constraints and criteria. Yet,
creating a well-formed matcher can be difficult. Matchers can become
complex when multiple layers of matchers are nested to create a larger
compound matcher, which can be necessary to select parts of the AST that
fulfill strict criteria. Since a matcher reports only whether or not it
matched, the user receives no explanation when a matcher fails to produce
any bindings. Even when a matcher does produce bindings, tracking down the
cause of a false positive can be tedious. With nothing to go on but trial
and error against this binary feedback, it is hard to introspect matcher
behavior and identify problems (such as which part of a compound matcher is
inconsistent with expectations). This often makes debugging a faulty
matcher slow, confusing, and frustrating.

Design ideas

Proposed Solution

The proposed solution is composed of two parts.

The first part (https://reviews.llvm.org/D113917) is to expose a getName
method on the Matcher class. Currently, there is no easy way to access any
form of identification information for any matcher. This method will act as
a hook that allows for accessing the name of a matcher programmatically.
While this method will be helpful for the implementation of the withTag
matcher, it could also prove useful for other forms of matcher
introspection and debugging.
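
As a rough illustration of what a name hook makes possible, here is a minimal, self-contained sketch. It does not use Clang: ToyNode, ToyMatcher, isKind, firstFailing, and this toy hasName are hypothetical stand-ins invented for the example, and the real getName in D113917 may differ in signature and return type.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an AST node; NOT one of Clang's node classes.
struct ToyNode {
  std::string Kind; // e.g. "VarDecl"
  std::string Name; // e.g. "x"
};

// Toy matcher: a predicate plus a name that can be read back programmatically.
class ToyMatcher {
public:
  ToyMatcher(std::string MatcherName,
             std::function<bool(const ToyNode &)> Predicate)
      : ID(std::move(MatcherName)), Pred(std::move(Predicate)) {}

  // The hook discussed above: expose the matcher's name.
  const std::string &getName() const { return ID; }

  bool matches(const ToyNode &Node) const { return Pred(Node); }

private:
  std::string ID;
  std::function<bool(const ToyNode &)> Pred;
};

// Hypothetical toy version of the hasName narrowing matcher.
ToyMatcher hasName(std::string Expected) {
  return ToyMatcher("hasName",
                    [Expected = std::move(Expected)](const ToyNode &Node) {
                      return Node.Name == Expected;
                    });
}

// Hypothetical helper matcher that checks the node kind.
ToyMatcher isKind(std::string Expected) {
  return ToyMatcher("isKind",
                    [Expected = std::move(Expected)](const ToyNode &Node) {
                      return Node.Kind == Expected;
                    });
}

// Returns the name of the first matcher that rejects Node, or an empty
// string if every matcher accepts it. Only possible because of getName.
std::string firstFailing(const std::vector<ToyMatcher> &Matchers,
                         const ToyNode &Node) {
  for (const ToyMatcher &M : Matchers)
    if (!M.matches(Node))
      return M.getName();
  return "";
}
```

With matchers {isKind("VarDecl"), hasName("x")} and a VarDecl named y, firstFailing reports "hasName", which is the kind of feedback a plain boolean result cannot give.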

The second part (https://reviews.llvm.org/D113943) is to create an
introspection node matcher (named the withTag matcher). This matcher will
act as a way to enable verbose logging for matchers, which makes a
workflow similar to printf debugging possible. This matcher is templated:
it should accept an inner matcher of any type, and itself be a matcher of
that type. For example, if the withTag matcher has an inner matcher of
type Matcher<TypeLoc>, it will return a Matcher<TypeLoc>. In terms of
functionality, the withTag matcher will not affect the outcome of the
inner matcher and should directly return whatever the inner matcher
returns.

Before:

varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))

After:

withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
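
The two properties described above (same matcher type in and out, and an unchanged result) can be sketched with a toy template. This is not the implementation in D113943. ToyMatcher, ToyVarDecl, ToyTypeLoc, and the toy hasName, pointerTypeLoc, and withTag below are hypothetical. The optional output-stream parameter is also an assumption made so the log can be captured; the proposal itself only says the output goes to stderr.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Toy node types; stand-ins for Clang's VarDecl and TypeLoc, not the real ones.
struct ToyVarDecl { std::string Name; };
struct ToyTypeLoc { bool IsPointer; };

// Toy matcher over nodes of type NodeT (far simpler than Clang's Matcher<T>).
template <typename NodeT> class ToyMatcher {
public:
  ToyMatcher(std::string MatcherName,
             std::function<bool(const NodeT &)> Predicate)
      : ID(std::move(MatcherName)), Pred(std::move(Predicate)) {}
  const std::string &getName() const { return ID; }
  bool matches(const NodeT &Node) const { return Pred(Node); }

private:
  std::string ID;
  std::function<bool(const NodeT &)> Pred;
};

inline ToyMatcher<ToyVarDecl> hasName(std::string Expected) {
  return ToyMatcher<ToyVarDecl>(
      "hasName", [Expected = std::move(Expected)](const ToyVarDecl &D) {
        return D.Name == Expected;
      });
}

inline ToyMatcher<ToyTypeLoc> pointerTypeLoc() {
  return ToyMatcher<ToyTypeLoc>(
      "pointerTypeLoc", [](const ToyTypeLoc &TL) { return TL.IsPointer; });
}

// Wraps Inner in a matcher of the same node type. The wrapper logs the inner
// matcher's name and result, then returns that result unchanged.
// OS must outlive the returned matcher (it is captured by reference).
template <typename NodeT>
ToyMatcher<NodeT> withTag(ToyMatcher<NodeT> Inner,
                          std::ostream &OS = std::cerr) {
  std::string Name = Inner.getName();
  return ToyMatcher<NodeT>(
      std::move(Name), [Inner = std::move(Inner), &OS](const NodeT &Node) {
        const bool Result = Inner.matches(Node);
        OS << "Matcher Name: " << Inner.getName() << "\n"
           << "Result: " << (Result ? "Successful" : "Unsuccessful") << "\n";
        return Result;
      });
}
```

Here withTag(pointerTypeLoc()) has type ToyMatcher<ToyTypeLoc> and withTag(hasName("x")) has type ToyMatcher<ToyVarDecl>, so a tagged matcher can be dropped in wherever the untagged one was accepted. Passing a std::ostringstream as the second argument captures the log instead of writing to stderr.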

This matcher will be able to provide information such as:

   - Matcher Type
   - Matcher Name
   - Matcher Success
   - Node Kind
   - Node Source

The matcher will provide output by writing to stderr. Though the exact
format of the output is subject to change, it could look like this, for the
matcher withTag(varDecl(withTag(hasName("x")),
hasTypeLoc(pointerTypeLoc()))):

⭐ Attempting new match
Matcher Name: `VarDecl` Node
Node Kind: TypedefDecl
Node Value:
```
typedef char *__builtin_va_list
```
Node AST:
TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
`-PointerType 0x47e6360 'char *'
  `-BuiltinType 0x47e54d0 'char'
Result: Unsuccessful
✔️ Concluding attempt

⭐ Attempting new match
Matcher Name: `VarDecl` Node
Node Kind: VarDecl
Node Value:
```
void *x
```
Node AST:
VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
⭐ Attempting new match
Matcher Name: HasName
Node Kind: VarDecl
Node Value:
```
void *x
```
Node AST:
VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
Result: Successful
✔️ Concluding attempt
Result: Successful
✔️ Concluding attempt

From this output, a user is able to see things such as information about
each node the matcher attempts to match, information about which part of
the compound matcher is being matched, and the success of each matcher.
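
The nesting in the sample output (an inner attempt logged between the outer matcher's header and its result, and no inner attempt at all when the outer node kind does not match) can be reproduced with a small toy model. Everything below is a hypothetical sketch rather than the proposed implementation. ToyDecl, ToyMatcher, and these simplified hasName, varDecl, and withTag are invented for illustration, the log omits the node value and AST dump, and the stream parameter is an assumption added so the log can be inspected.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a declaration node; not Clang's Decl hierarchy.
struct ToyDecl {
  std::string Kind; // e.g. "VarDecl", "TypedefDecl"
  std::string Name;
};

// Toy matcher: a display name plus a predicate.
struct ToyMatcher {
  std::string Name;
  std::function<bool(const ToyDecl &)> Matches;
};

// Narrowing matcher: succeeds when the declaration has the expected name.
ToyMatcher hasName(std::string Expected) {
  return {"HasName", [Expected = std::move(Expected)](const ToyDecl &D) {
            return D.Name == Expected;
          }};
}

// Node matcher: checks the node kind first, then each inner matcher in order.
// Inner matchers never run if the kind check fails.
ToyMatcher varDecl(std::vector<ToyMatcher> Inner) {
  return {"`VarDecl` Node", [Inner = std::move(Inner)](const ToyDecl &D) {
            if (D.Kind != "VarDecl")
              return false;
            for (const ToyMatcher &M : Inner)
              if (!M.Matches(D))
                return false;
            return true;
          }};
}

// Logs one attempt around the inner matcher and passes its result through.
// OS must outlive the returned matcher (it is captured by reference).
ToyMatcher withTag(ToyMatcher Inner, std::ostream &OS = std::cerr) {
  std::string Name = Inner.Name;
  return {Name, [Inner = std::move(Inner), &OS](const ToyDecl &D) {
            OS << "Attempting new match\n"
               << "Matcher Name: " << Inner.Name << "\n"
               << "Node Kind: " << D.Kind << "\n";
            const bool Result = Inner.Matches(D);
            OS << "Result: " << (Result ? "Successful" : "Unsuccessful")
               << "\n"
               << "Concluding attempt\n";
            return Result;
          }};
}
```

Building withTag(varDecl({withTag(hasName("x"))})) and running it on a TypedefDecl logs only the outer attempt, while running it on a VarDecl named x logs the HasName attempt nested inside the outer one, in the same order as the sample above.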

Pros

   - The design revolves around a matcher, which is a familiar
     pattern/construct.
   - The design has low overhead for using and removing - it requires
     wrapping one or more matchers with the withTag matcher.
   - The design allows for a great degree of control - one can pick out
     certain matchers within a compound matcher to analyze.

Cons

   - One withTag matcher will only be able to report information about the
     inner matcher, and not necessarily matchers within the inner matcher.
      - However, it may be possible to provide a hook to access the inner
        matcher’s inner matcher.
   - Printf debugging is just one method of diagnosing issues, and may not be
     everyone’s preferred approach to debugging. Alternative debugging
     approaches could include something like steveire’s debugger (
     https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/).

Questions

   - Currently, getName returns a string. Should we consider returning an
     enum instead?
      - One reason we have opted for a string, for now, is that users may
        define their own private matchers, which may not have a value in
        the enum.
   - If getName were to return a string, what formats/conventions should we
     adhere to?
      - Currently, the returned strings do not follow a formal naming
        convention. For example, most traversal and narrowing matchers
        have a getName that returns the name of the matcher exactly,
        whereas most node matchers have a getName that returns a string
        containing the name of the node (i.e., VarDecl instead of varDecl).
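
To make the string-versus-enum trade-off from the first question concrete, here is a small hypothetical sketch. BuiltinMatcherKind, toString, ToyMatcherInterface, and both matcher classes are invented for illustration and do not correspond to types in Clang or in the patches.

```cpp
#include <cassert>
#include <string>

// Hypothetical closed set of identifiers for built-in matchers.
enum class BuiltinMatcherKind { HasName, VarDecl };

// Enum-based alternative: only matchers known when the enum was written
// can be represented.
inline std::string toString(BuiltinMatcherKind Kind) {
  switch (Kind) {
  case BuiltinMatcherKind::HasName:
    return "hasName";
  case BuiltinMatcherKind::VarDecl:
    return "VarDecl";
  }
  return "<unknown>";
}

// String-based approach: every matcher reports its own name.
class ToyMatcherInterface {
public:
  virtual ~ToyMatcherInterface() = default;
  virtual std::string getName() const = 0;
};

class HasNameMatcher : public ToyMatcherInterface {
public:
  std::string getName() const override { return "hasName"; }
};

// A matcher defined privately in user code. It has no BuiltinMatcherKind
// enumerator, yet it can still identify itself through a string.
class IsGeneratedDeclMatcher : public ToyMatcherInterface {
public:
  std::string getName() const override { return "isGeneratedDecl"; }
};
```

In this sketch the user-defined IsGeneratedDeclMatcher can report "isGeneratedDecl" without any change to BuiltinMatcherKind, which is the flexibility the string return type buys. The cost is that nothing enforces a naming convention, as the "hasName" versus "VarDecl" capitalization already shows.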