[cfe-dev] [RFC] Improving Support for Debugging Matchers

Yitzhak Mandelbaum via cfe-dev cfe-dev at lists.llvm.org
Fri Dec 10 05:46:23 PST 2021


Agreed -- visual nesting would be nice. However, that seems to require
support within the context (like `traverse`) which we haven't planned for
yet.

On Thu, Dec 9, 2021 at 6:09 PM Adrian Vogelsgesang <
avogelsgesang at salesforce.com> wrote:

> > Can you clarify your question about nesting? I'm not sure what you mean
> by "similar to ``".  Are you referring to visual indentation?
>
> I just realized that your initial example already answered my question.
> Yes, I was referring to visual indentation.
> E.g., I would
> > withTag(cxxMethodDecl(ofClass(hasName("C")),
> forEachOverridden(withTag(cxxMethodDecl()))))
> prefer to produce the output
> > ⭐ Attempting new match (outer `cxxMethodDecl`)
> > ⭐⭐ Attempting new match (inner `cxxMethodDecl`)
> > ⭐✔️ Concluding match (inner `cxxMethodDecl`)
> > ⭐⭐ Attempting new match (inner `cxxMethodDecl`)
> > ⭐✔️ Concluding match (inner `cxxMethodDecl`)
> > ✔️ Concluding attempt (outer `cxxMethodDecl`)
> instead of
> > ⭐ Attempting new match (outer `cxxMethodDecl`)
> > ⭐ Attempting new match (inner `cxxMethodDecl`)
> > ✔️ Concluding match (inner `cxxMethodDecl`)
> > ⭐ Attempting new match (inner `cxxMethodDecl`)
> > ✔️ Concluding match (inner `cxxMethodDecl`)
> > ✔️ Concluding attempt (outer `cxxMethodDecl`
> as it would allow me to see the nesting depth more easily.
> Your initial example shows that the proposed patch would print the 2nd
> variant, i.e. without indentation
>
> On Wed, Dec 8, 2021 at 2:19 PM Yitzhak Mandelbaum <yitzhakm at google.com>
> wrote:
>
>> Adrian,
>>
>> I'm glad to hear this sounds useful!
>>
>> Regarding the name: we think it's important for the case of nested
>> annotations. I agree that if there's just one, you don't need the name, but
>> it's helpful once you have multiple logging matchers. That said, the
>> source of names can be flexible. We actually started w/ custom names
>> (because that's easier, avoiding any need for infrastructure changes), but
>> moved to auto-naming because it reduced the clutter in the matcher (and
>> user burden). We're also considering a modal version of `withTag` just like
>> `traverse`, where surrounding a given matcher turns on logging for it and
>> its descendants. In that case, auto-naming becomes even more important.
>>
>> It seems reasonable to allow the user to choose between all the naming
>> options we've discussed: auto-name, custom name and no-name. The question
>> will only be what the default choice is.
>>
>> Can you clarify your question about nesting? I'm not sure what you mean
>> by "similar to ``".  Are you referring to visual indentation?
>>
>> On Wed, Dec 8, 2021 at 4:01 AM Adrian Vogelsgesang <
>> avogelsgesang at salesforce.com> wrote:
>>
>>> Hi,
>>>
>>> Very nice proposal! I am very new to AST matchers and clang in general,
>>> and having gone through quite a few misunderstandings and debugging lately,
>>> I would very much appreciate an improved debugging experience! (At the same
>>> time, take my opinions with a grain of salt - I don't have a good overview
>>> about the AST matcher design, yet)
>>>
>>>    - I consider the printf-debugging provided by
>>>    `withTag`/`withIntrospection` as useful. Even if there were other, more
>>>    powerful debugging utilities, I would still consider it useful.
>>>    - I would probably use `withTag` primarily from clang-query. Is
>>>    https://reviews.llvm.org/D113943 sufficient for `withTag` to work
>>>    also in clang-query? Or is there more plumbing required?
>>>    - Output format:
>>>       - To me, `Matcher Name: `VarDecl` Node` feels like unnecessary
>>>       clutter. I am usually debugging a matcher which I just wrote a couple of
>>>       minutes ago. I still remember which matcher I used
>>>       - Instead, I would prefer a possibility to provide a custom name.
>>>       Something like `withTag("myName", varDecl(hasName("x"),
>>>       hasTypeLoc(typeLoc().bind("TL"))))`. This would allow me to distinguish the
>>>       output for, e.g., multiple matchers of the same type. (Effectively, this
>>>       approach might make `getName` unnecessary?)
>>>       - How would output for nested `withTag` look like? E.g. for
>>>       `withTag(cxxMethodDecl(ofClass(hasName("C")),
>>>       forEachOverridden(withTag(cxxMethodDecl()))))`? Would the match attempts be
>>>       nested similar to ``? Or will it all be flattened?
>>>
>>> Cheers,
>>> Adrian
>>>
>>> On Tue, Dec 7, 2021 at 8:59 PM Yitzhak Mandelbaum via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Gentle ping on this proposal. James has finished his internship, but
>>>> I'm eager to see the work through. :)
>>>>
>>>> Thanks,
>>>> Yitzhak
>>>>
>>>> On Mon, Nov 29, 2021 at 5:21 PM James King via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hello Aaron,
>>>>>
>>>>> Thanks for the feedback! It seems that the goal and comments here can
>>>>> be broken down into two categories:
>>>>>
>>>>>
>>>>>    -
>>>>>
>>>>>    Establishing some infrastructure to provide information about
>>>>>    matchers
>>>>>    -
>>>>>
>>>>>    Creating tooling to access this information
>>>>>
>>>>>
>>>>> Focusing on the first point, the decision regarding what to return
>>>>> from `getName` has many different feasible options. Potentially, we may
>>>>> want the function to return additional information about the matcher. In
>>>>> that sense, I agree that there may be a better name than `getName`.
>>>>> Additionally, instead of returning a string or an enum directly, perhaps it
>>>>> would be viable to return a “MatcherInfo” object (exposed through
>>>>> `getMatcherInfo`). That object could initially carry the name (exposed
>>>>> through `getName`), but we could later add support for an enum and other
>>>>> useful information (ex. interesting properties, child matchers, etc.).
>>>>>
>>>>> As for the second point, I agree that there are a lot of options to
>>>>> consider for presenting this information (integrating with clang-query,
>>>>> structuring output, etc). All the ideas you proposed seem to be viable -
>>>>> one of the benefits of breaking this problem down into the two above points
>>>>> is that once the infrastructure is established, all of these solutions can
>>>>> be explored. This should allow us to achieve the goal of supporting
>>>>> multiple debugging workflows. The `withTag` matcher, specifically, can be
>>>>> useful for cases such as analyzing unit tests which fail due to faulty
>>>>> matchers - wrapping the matcher in a `withTag` matcher would be a quick way
>>>>> to gain some insight into its behavior. Ultimately, the effort of providing
>>>>> better matcher introspection can be continued by expanding the
>>>>> functionality of the `withTag` matcher, or by working on different
>>>>> approaches that utilize `getMatcherInfo`.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> James
>>>>>
>>>>> On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <aaron at aaronballman.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you for proposing this! I think making AST matchers easier to
>>>>>> use and debug is a fantastic goal. Some thoughts below.
>>>>>>
>>>>>> On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev
>>>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>>> >
>>>>>> > Objective
>>>>>> >
>>>>>> > The goal for this proposal is to improve the experience of
>>>>>> debugging Clang’s AST matchers and of understanding what they are doing.
>>>>>> >
>>>>>> > Background
>>>>>> >
>>>>>> > AST Matchers are a powerful tool that allows users to pick out
>>>>>> nodes in the Clang AST, provided they match the specified constraints and
>>>>>> criteria. Yet, creating a well-formed matcher can be difficult. Matchers
>>>>>> can become complex when multiple layers of matchers are nested to create a
>>>>>> larger compound matcher, which can be necessary to select parts of the AST
>>>>>> that fulfill strict criteria. Since matchers only provide a boolean
>>>>>> response as to whether they obtained any matches or did not obtain any
>>>>>> matches, the user receives no explanation for when a matcher fails to
>>>>>> produce any bindings. Even when a matcher produces bindings, performing
>>>>>> introspection to debug false positive bindings can be tedious. When the
>>>>>> result of a matcher is not expected, determining how the result was
>>>>>> obtained is made difficult by this limited response. Without any other
>>>>>> options to interact with matchers, other than trial-and-error based on
>>>>>> binary feedback, this lack of transparency makes it difficult to introspect
>>>>>> matcher behavior and identify problems (such as which part of a compound
>>>>>> matcher is inconsistent with expectations). This often causes the process
>>>>>> of debugging a faulty matcher to be slow, confusing, and frustrating.
>>>>>> >
>>>>>> > Design ideas
>>>>>> >
>>>>>> > Proposed Solution
>>>>>> >
>>>>>> > The proposed solution is composed of two parts.
>>>>>> >
>>>>>> > The first part (https://reviews.llvm.org/D113917) is to expose a
>>>>>> getName method on the Matcher class. Currently, there is no easy way to
>>>>>> access any form of identification information for any matcher. This method
>>>>>> will act as a hook that allows for accessing the name of a matcher
>>>>>> programmatically. While this method will be helpful for the implementation
>>>>>> for the withTag matcher, it could also prove to be useful for other forms
>>>>>> of matcher introspection/debugging.
>>>>>> >
>>>>>> > The second part (https://reviews.llvm.org/D113943) is to create an
>>>>>> introspection node matcher (named the withTag matcher). This matcher will
>>>>>> act as a way to enable verbose logging for matchers, which introduces the
>>>>>> possibility for a workflow similar to printf debugging. This matcher is
>>>>>> templated - it should accept an inner matcher of any type, and itself be a
>>>>>> matcher of that type. For example, if the withTag matcher has an inner
>>>>>> matcher of type Matcher<TypeLocMatcher>, it will return a
>>>>>> Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher
>>>>>> will not affect the outcome of the inner matcher and should directly return
>>>>>> whatever the inner matcher returns.
>>>>>> >
>>>>>> > Before:
>>>>>> >
>>>>>> > varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
>>>>>> >
>>>>>> >
>>>>>> > After:
>>>>>> >
>>>>>> > withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
>>>>>> >
>>>>>> >
>>>>>> > This matcher will be able to provide information such as:
>>>>>> >
>>>>>> > Matcher Type
>>>>>> >
>>>>>> > Matcher Name
>>>>>> >
>>>>>> > Matcher Success
>>>>>> >
>>>>>> > Node Kind
>>>>>> >
>>>>>> > Node Source
>>>>>> >
>>>>>> > The matcher will provide output by writing to stderr.
>>>>>>
>>>>>> I'm not opposed to it, but making this user configurable seems like a
>>>>>> better approach (not everyone wants to write to stderr or go through
>>>>>> redirections to get the data into a file). Also, we may want to
>>>>>> consider what formats we want to provide this information. You give an
>>>>>> example below, but this sounds a lot like a case where we might want
>>>>>> to provide structured output (e.g., JSON) so that other tools can
>>>>>> consume it to provide a better user interface. (That seems like
>>>>>> something that can be done in follow-ups rather than something
>>>>>> required for the initial feature.)
>>>>>>
>>>>>> > Though the exact format of the output is subject to change, it
>>>>>> could look like this, for the matcher
>>>>>> withTag(varDecl(withTag(hasName("x")), hasTypeLoc(pointerTypeLoc()))):
>>>>>> >
>>>>>> > ⭐ Attempting new match
>>>>>> >
>>>>>> > Matcher Name: `VarDecl` Node
>>>>>> >
>>>>>> > Node Kind: TypedefDecl
>>>>>> >
>>>>>> > Node Value:
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > typedef char *__builtin_va_list
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > Node AST:
>>>>>> >
>>>>>> > TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit
>>>>>> __builtin_va_list 'char *'
>>>>>> >
>>>>>> > `-PointerType 0x47e6360 'char *'
>>>>>> >
>>>>>> >   `-BuiltinType 0x47e54d0 'char'
>>>>>> >
>>>>>> > Result: Unsuccessful
>>>>>> >
>>>>>> > ✔️ Concluding attempt
>>>>>> >
>>>>>> >
>>>>>> > ⭐ Attempting new match
>>>>>> >
>>>>>> > Matcher Name: `VarDecl` Node
>>>>>> >
>>>>>> > Node Kind: VarDecl
>>>>>> >
>>>>>> > Node Value:
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > void *x
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > Node AST:
>>>>>> >
>>>>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>>>>> >
>>>>>> > ⭐ Attempting new match
>>>>>> >
>>>>>> > Matcher Name: HasName
>>>>>> >
>>>>>> > Node Kind: VarDecl
>>>>>> >
>>>>>> > Node Value:
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > void *x
>>>>>> >
>>>>>> > ```
>>>>>> >
>>>>>> > Node AST:
>>>>>> >
>>>>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>>>>> >
>>>>>> > Result: Successful
>>>>>> >
>>>>>> > ✔️ Concluding attempt
>>>>>> >
>>>>>> >
>>>>>> > Result: Successful
>>>>>> >
>>>>>> > ✔️ Concluding attempt
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > From this output, a user is able to see things such as information
>>>>>> about each node the matcher attempts to match, information about which part
>>>>>> of the compound matcher is being matched, and the success of each matcher.
>>>>>> >
>>>>>> >
>>>>>> > Pros
>>>>>> >
>>>>>> > The design revolves around a matcher, which is a familiar
>>>>>> pattern/construct
>>>>>> >
>>>>>> > The design has low overhead for using and removing - it requires
>>>>>> wrapping one or more matchers with the withTag matcher.
>>>>>> >
>>>>>> > The design allows for a great degree of control - one can pick out
>>>>>> certain matchers within a compound matcher to analyze.
>>>>>> >
>>>>>> > Cons
>>>>>> >
>>>>>> > One withTag matcher will only be able to report information about
>>>>>> the inner matcher, and not necessarily matchers within the inner matcher
>>>>>> >
>>>>>> > However, it may be possible to provide a hook to access the inner
>>>>>> matcher’s inner matcher
>>>>>>
>>>>>> Another con that I see is: this is likely to get very chatty when
>>>>>> working on complex matchers. We might need some notion of a verbosity
>>>>>> level for the information.
>>>>>>
>>>>>> > Printf debugging is just one method of diagnosing issues, and may
>>>>>> not be everyone’s preferred approach to debugging. Alternative debugging
>>>>>> approaches could include something like steveire’s debugger (
>>>>>> https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/
>>>>>> ).
>>>>>>
>>>>>> FWIW, I loved a lot of the concepts Stephen was developing, so
>>>>>> enabling this kind of technology would be fantastic. One thing I think
>>>>>> is worth considering is that clang-query is currently the go-to tool
>>>>>> for developing complex AST matchers because of the REPL nature of the
>>>>>> tool. While printf-style debugging gets more information, I do wonder
>>>>>> if we should be considering how to integrate this into clang-query
>>>>>> more directly.
>>>>>>
>>>>>> > Questions
>>>>>> >
>>>>>> > Currently, getName returns a string. Should we consider returning
>>>>>> an enum, instead?
>>>>>>
>>>>>> We sometimes rename matchers, so I think using an enumeration may have
>>>>>> some nice (or possibly annoying, depending on perspective) properties.
>>>>>> If we rename something and getName() returns a string, it's more of a
>>>>>> silent change; using an enumeration is more likely to produce a
>>>>>> compile error if someone is relying on the name for some reason. I
>>>>>> don't currently have a strong preference because I can see arguments
>>>>>> either way.
>>>>>>
>>>>>> > One reason we have opted for a string, as of now, is because users
>>>>>> may define their own private matchers which may not have a value in the
>>>>>> enum.
>>>>>>
>>>>>> That's a pretty compelling reason to go with the string approach.
>>>>>>
>>>>>> > If getName were to return a string, what formats/conventions should
>>>>>> we adhere to?
>>>>>> >
>>>>>> > Currently, the returned strings are not named in a very formal
>>>>>> approach. For example, most traversal and narrowing matchers have a getName
>>>>>> that returns the name of the matcher exactly, whereas most node matchers
>>>>>> have a getName that returns a string containing the name of the node (ie.
>>>>>> VarDecl instead of varDecl)
>>>>>>
>>>>>> Naively, I think users are going to expect getName() to return the
>>>>>> name of the matcher as it is written (that also makes it easier to
>>>>>> visually distinguish between the matcher and the matched node).
>>>>>>
>>>>>> ~Aaron
>>>>>>
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > cfe-dev mailing list
>>>>>> > cfe-dev at lists.llvm.org
>>>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211210/769de179/attachment-0001.html>


More information about the cfe-dev mailing list