[cfe-dev] [RFC] Improving Support for Debugging Matchers

Wed Dec 8 01:01:21 PST 2021

Hi,

Very nice proposal! I am very new to AST matchers and clang in general, and
having gone through quite a few misunderstandings and debugging lately, I
would very much appreciate an improved debugging experience! (At the same
time, take my opinions with a grain of salt - I don't have a good overview
about the AST matcher design, yet)

   - I consider the printf-debugging provided by
   `withTag`/`withIntrospection` as useful. Even if there were other, more
   powerful debugging utilities, I would still consider it useful.
   - I would probably use `withTag` primarily from clang-query. Is
   https://reviews.llvm.org/D113943 sufficient for `withTag` to work also
   in clang-query? Or is there more plumbing required?
   - Output format:
      - To me, `Matcher Name: `VarDecl` Node` feels like unnecessary
      clutter. I am usually debugging a matcher which I just wrote a couple of
      minutes ago. I still remember which matcher I used
      - Instead, I would prefer a possibility to provide a custom name.
      Something like `withTag("myName", varDecl(hasName("x"),
      hasTypeLoc(typeLoc().bind("TL"))))`. This would allow me to
distinguish the
      output for, e.g., multiple matchers of the same type. (Effectively, this
      approach might make `getName` unnecessary?)
      - How would output for nested `withTag` look like? E.g. for
      `withTag(cxxMethodDecl(ofClass(hasName("C")),
      forEachOverridden(withTag(cxxMethodDecl()))))`? Would the match
attempts be
      nested similar to ``? Or will it all be flattened?

Cheers,
Adrian

On Tue, Dec 7, 2021 at 8:59 PM Yitzhak Mandelbaum via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi,
>
> Gentle ping on this proposal. James has finished his internship, but I'm
> eager to see the work through. :)
>
> Thanks,
> Yitzhak
>
> On Mon, Nov 29, 2021 at 5:21 PM James King via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hello Aaron,
>>
>> Thanks for the feedback! It seems that the goal and comments here can be
>> broken down into two categories:
>>
>>
>>    -
>>
>>    Establishing some infrastructure to provide information about matchers
>>    -
>>
>>    Creating tooling to access this information
>>
>>
>> Focusing on the first point, the decision regarding what to return from
>> `getName` has many different feasible options. Potentially, we may want the
>> function to return additional information about the matcher. In that sense,
>> I agree that there may be a better name than `getName`. Additionally,
>> instead of returning a string or an enum directly, perhaps it would be
>> viable to return a “MatcherInfo” object (exposed through `getMatcherInfo`).
>> That object could initially carry the name (exposed through `getName`), but
>> we could later add support for an enum and other useful information (ex.
>> interesting properties, child matchers, etc.).
>>
>> As for the second point, I agree that there are a lot of options to
>> consider for presenting this information (integrating with clang-query,
>> structuring output, etc). All the ideas you proposed seem to be viable -
>> one of the benefits of breaking this problem down into the two above points
>> is that once the infrastructure is established, all of these solutions can
>> be explored. This should allow us to achieve the goal of supporting
>> multiple debugging workflows. The `withTag` matcher, specifically, can be
>> useful for cases such as analyzing unit tests which fail due to faulty
>> matchers - wrapping the matcher in a `withTag` matcher would be a quick way
>> to gain some insight into its behavior. Ultimately, the effort of providing
>> better matcher introspection can be continued by expanding the
>> functionality of the `withTag` matcher, or by working on different
>> approaches that utilize `getMatcherInfo`.
>>
>> Thanks,
>>
>> James
>>
>> On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <aaron at aaronballman.com>
>> wrote:
>>
>>> Thank you for proposing this! I think making AST matchers easier to
>>> use and debug is a fantastic goal. Some thoughts below.
>>>
>>> On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>> >
>>> > Objective
>>> >
>>> > The goal for this proposal is to improve the experience of debugging
>>> Clang’s AST matchers and of understanding what they are doing.
>>> >
>>> > Background
>>> >
>>> > AST Matchers are a powerful tool that allows users to pick out nodes
>>> in the Clang AST, provided they match the specified constraints and
>>> criteria. Yet, creating a well-formed matcher can be difficult. Matchers
>>> can become complex when multiple layers of matchers are nested to create a
>>> larger compound matcher, which can be necessary to select parts of the AST
>>> that fulfill strict criteria. Since matchers only provide a boolean
>>> response as to whether they obtained any matches or did not obtain any
>>> matches, the user receives no explanation for when a matcher fails to
>>> produce any bindings. Even when a matcher produces bindings, performing
>>> introspection to debug false positive bindings can be tedious. When the
>>> result of a matcher is not expected, determining how the result was
>>> obtained is made difficult by this limited response. Without any other
>>> options to interact with matchers, other than trial-and-error based on
>>> binary feedback, this lack of transparency makes it difficult to introspect
>>> matcher behavior and identify problems (such as which part of a compound
>>> matcher is inconsistent with expectations). This often causes the process
>>> of debugging a faulty matcher to be slow, confusing, and frustrating.
>>> >
>>> > Design ideas
>>> >
>>> > Proposed Solution
>>> >
>>> > The proposed solution is composed of two parts.
>>> >
>>> > The first part (https://reviews.llvm.org/D113917) is to expose a
>>> getName method on the Matcher class. Currently, there is no easy way to
>>> access any form of identification information for any matcher. This method
>>> will act as a hook that allows for accessing the name of a matcher
>>> programmatically. While this method will be helpful for the implementation
>>> for the withTag matcher, it could also prove to be useful for other forms
>>> of matcher introspection/debugging.
>>> >
>>> > The second part (https://reviews.llvm.org/D113943) is to create an
>>> introspection node matcher (named the withTag matcher). This matcher will
>>> act as a way to enable verbose logging for matchers, which introduces the
>>> possibility for a workflow similar to printf debugging. This matcher is
>>> templated - it should accept an inner matcher of any type, and itself be a
>>> matcher of that type. For example, if the withTag matcher has an inner
>>> matcher of type Matcher<TypeLocMatcher>, it will return a
>>> Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher
>>> will not affect the outcome of the inner matcher and should directly return
>>> whatever the inner matcher returns.
>>> >
>>> > Before:
>>> >
>>> > varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
>>> >
>>> >
>>> > After:
>>> >
>>> > withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
>>> >
>>> >
>>> > This matcher will be able to provide information such as:
>>> >
>>> > Matcher Type
>>> >
>>> > Matcher Name
>>> >
>>> > Matcher Success
>>> >
>>> > Node Kind
>>> >
>>> > Node Source
>>> >
>>> > The matcher will provide output by writing to stderr.
>>>
>>> I'm not opposed to it, but making this user configurable seems like a
>>> better approach (not everyone wants to write to stderr or go through
>>> redirections to get the data into a file). Also, we may want to
>>> consider what formats we want to provide this information. You give an
>>> example below, but this sounds a lot like a case where we might want
>>> to provide structured output (e.g., JSON) so that other tools can
>>> consume it to provide a better user interface. (That seems like
>>> something that can be done in follow-ups rather than something
>>> required for the initial feature.)
>>>
>>> > Though the exact format of the output is subject to change, it could
>>> look like this, for the matcher withTag(varDecl(withTag(hasName("x")),
>>> hasTypeLoc(pointerTypeLoc()))):
>>> >
>>> > ⭐ Attempting new match
>>> >
>>> > Matcher Name: `VarDecl` Node
>>> >
>>> > Node Kind: TypedefDecl
>>> >
>>> > Node Value:
>>> >
>>> > ```
>>> >
>>> > typedef char *__builtin_va_list
>>> >
>>> > ```
>>> >
>>> > Node AST:
>>> >
>>> > TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit
>>> __builtin_va_list 'char *'
>>> >
>>> > `-PointerType 0x47e6360 'char *'
>>> >
>>> >   `-BuiltinType 0x47e54d0 'char'
>>> >
>>> > Result: Unsuccessful
>>> >
>>> > ✔️ Concluding attempt
>>> >
>>> >
>>> > ⭐ Attempting new match
>>> >
>>> > Matcher Name: `VarDecl` Node
>>> >
>>> > Node Kind: VarDecl
>>> >
>>> > Node Value:
>>> >
>>> > ```
>>> >
>>> > void *x
>>> >
>>> > ```
>>> >
>>> > Node AST:
>>> >
>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>> >
>>> > ⭐ Attempting new match
>>> >
>>> > Matcher Name: HasName
>>> >
>>> > Node Kind: VarDecl
>>> >
>>> > Node Value:
>>> >
>>> > ```
>>> >
>>> > void *x
>>> >
>>> > ```
>>> >
>>> > Node AST:
>>> >
>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>> >
>>> > Result: Successful
>>> >
>>> > ✔️ Concluding attempt
>>> >
>>> >
>>> > Result: Successful
>>> >
>>> > ✔️ Concluding attempt
>>> >
>>> >
>>> >
>>> > From this output, a user is able to see things such as information
>>> about each node the matcher attempts to match, information about which part
>>> of the compound matcher is being matched, and the success of each matcher.
>>> >
>>> >
>>> > Pros
>>> >
>>> > The design revolves around a matcher, which is a familiar
>>> pattern/construct
>>> >
>>> > The design has low overhead for using and removing - it requires
>>> wrapping one or more matchers with the withTag matcher.
>>> >
>>> > The design allows for a great degree of control - one can pick out
>>> certain matchers within a compound matcher to analyze.
>>> >
>>> > Cons
>>> >
>>> > One withTag matcher will only be able to report information about the
>>> inner matcher, and not necessarily matchers within the inner matcher
>>> >
>>> > However, it may be possible to provide a hook to access the inner
>>> matcher’s inner matcher
>>>
>>> Another con that I see is: this is likely to get very chatty when
>>> working on complex matchers. We might need some notion of a verbosity
>>> level for the information.
>>>
>>> > Printf debugging is just one method of diagnosing issues, and may not
>>> be everyone’s preferred approach to debugging. Alternative debugging
>>> approaches could include something like steveire’s debugger (
>>> https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/
>>> ).
>>>
>>> FWIW, I loved a lot of the concepts Stephen was developing, so
>>> enabling this kind of technology would be fantastic. One thing I think
>>> is worth considering is that clang-query is currently the go-to tool
>>> for developing complex AST matchers because of the REPL nature of the
>>> tool. While printf-style debugging gets more information, I do wonder
>>> if we should be considering how to integrate this into clang-query
>>> more directly.
>>>
>>> > Questions
>>> >
>>> > Currently, getName returns a string. Should we consider returning an
>>> enum, instead?
>>>
>>> We sometimes rename matchers, so I think using an enumeration may have
>>> some nice (or possibly annoying, depending on perspective) properties.
>>> If we rename something and getName() returns a string, it's more of a
>>> silent change; using an enumeration is more likely to produce a
>>> compile error if someone is relying on the name for some reason. I
>>> don't currently have a strong preference because I can see arguments
>>> either way.
>>>
>>> > One reason we have opted for a string, as of now, is because users may
>>> define their own private matchers which may not have a value in the enum.
>>>
>>> That's a pretty compelling reason to go with the string approach.
>>>
>>> > If getName were to return a string, what formats/conventions should we
>>> adhere to?
>>> >
>>> > Currently, the returned strings are not named in a very formal
>>> approach. For example, most traversal and narrowing matchers have a getName
>>> that returns the name of the matcher exactly, whereas most node matchers
>>> have a getName that returns a string containing the name of the node (ie.
>>> VarDecl instead of varDecl)
>>>
>>> Naively, I think users are going to expect getName() to return the
>>> name of the matcher as it is written (that also makes it easier to
>>> visually distinguish between the matcher and the matched node).
>>>
>>> ~Aaron
>>>
>>> >
>>> > _______________________________________________
>>> > cfe-dev mailing list
>>> > cfe-dev at lists.llvm.org
>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211208/f1d420de/attachment-0001.html>