[cfe-dev] [RFC] Improving Support for Debugging Matchers

Thu Dec 9 15:09:06 PST 2021

> Can you clarify your question about nesting? I'm not sure what you mean
by "similar to ``".  Are you referring to visual indentation?

I just realized that your initial example already answered my question.
Yes, I was referring to visual indentation.
E.g., I would
> withTag(cxxMethodDecl(ofClass(hasName("C")),
forEachOverridden(withTag(cxxMethodDecl()))))
prefer to produce the output
> ⭐ Attempting new match (outer `cxxMethodDecl`)
> ⭐⭐ Attempting new match (inner `cxxMethodDecl`)
> ⭐✔️ Concluding match (inner `cxxMethodDecl`)
> ⭐⭐ Attempting new match (inner `cxxMethodDecl`)
> ⭐✔️ Concluding match (inner `cxxMethodDecl`)
> ✔️ Concluding attempt (outer `cxxMethodDecl`)
instead of
> ⭐ Attempting new match (outer `cxxMethodDecl`)
> ⭐ Attempting new match (inner `cxxMethodDecl`)
> ✔️ Concluding match (inner `cxxMethodDecl`)
> ⭐ Attempting new match (inner `cxxMethodDecl`)
> ✔️ Concluding match (inner `cxxMethodDecl`)
> ✔️ Concluding attempt (outer `cxxMethodDecl`
as it would allow me to see the nesting depth more easily.
Your initial example shows that the proposed patch would print the 2nd
variant, i.e. without indentation

On Wed, Dec 8, 2021 at 2:19 PM Yitzhak Mandelbaum <yitzhakm at google.com>
wrote:

> Adrian,
>
> I'm glad to hear this sounds useful!
>
> Regarding the name: we think it's important for the case of nested
> annotations. I agree that if there's just one, you don't need the name, but
> it's helpful once you have multiple logging matchers. That said, the
> source of names can be flexible. We actually started w/ custom names
> (because that's easier, avoiding any need for infrastructure changes), but
> moved to auto-naming because it reduced the clutter in the matcher (and
> user burden). We're also considering a modal version of `withTag` just like
> `traverse`, where surrounding a given matcher turns on logging for it and
> its descendants. In that case, auto-naming becomes even more important.
>
> It seems reasonable to allow the user to choose between all the naming
> options we've discussed: auto-name, custom name and no-name. The question
> will only be what the default choice is.
>
> Can you clarify your question about nesting? I'm not sure what you mean by
> "similar to ``".  Are you referring to visual indentation?
>
> On Wed, Dec 8, 2021 at 4:01 AM Adrian Vogelsgesang <
> avogelsgesang at salesforce.com> wrote:
>
>> Hi,
>>
>> Very nice proposal! I am very new to AST matchers and clang in general,
>> and having gone through quite a few misunderstandings and debugging lately,
>> I would very much appreciate an improved debugging experience! (At the same
>> time, take my opinions with a grain of salt - I don't have a good overview
>> about the AST matcher design, yet)
>>
>>    - I consider the printf-debugging provided by
>>    `withTag`/`withIntrospection` as useful. Even if there were other, more
>>    powerful debugging utilities, I would still consider it useful.
>>    - I would probably use `withTag` primarily from clang-query. Is
>>    https://reviews.llvm.org/D113943 sufficient for `withTag` to work
>>    also in clang-query? Or is there more plumbing required?
>>    - Output format:
>>       - To me, `Matcher Name: `VarDecl` Node` feels like unnecessary
>>       clutter. I am usually debugging a matcher which I just wrote a couple of
>>       minutes ago. I still remember which matcher I used
>>       - Instead, I would prefer a possibility to provide a custom name.
>>       Something like `withTag("myName", varDecl(hasName("x"),
>>       hasTypeLoc(typeLoc().bind("TL"))))`. This would allow me to distinguish the
>>       output for, e.g., multiple matchers of the same type. (Effectively, this
>>       approach might make `getName` unnecessary?)
>>       - How would output for nested `withTag` look like? E.g. for
>>       `withTag(cxxMethodDecl(ofClass(hasName("C")),
>>       forEachOverridden(withTag(cxxMethodDecl()))))`? Would the match attempts be
>>       nested similar to ``? Or will it all be flattened?
>>
>> Cheers,
>> Adrian
>>
>> On Tue, Dec 7, 2021 at 8:59 PM Yitzhak Mandelbaum via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> Gentle ping on this proposal. James has finished his internship, but I'm
>>> eager to see the work through. :)
>>>
>>> Thanks,
>>> Yitzhak
>>>
>>> On Mon, Nov 29, 2021 at 5:21 PM James King via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hello Aaron,
>>>>
>>>> Thanks for the feedback! It seems that the goal and comments here can
>>>> be broken down into two categories:
>>>>
>>>>
>>>>    -
>>>>
>>>>    Establishing some infrastructure to provide information about
>>>>    matchers
>>>>    -
>>>>
>>>>    Creating tooling to access this information
>>>>
>>>>
>>>> Focusing on the first point, the decision regarding what to return from
>>>> `getName` has many different feasible options. Potentially, we may want the
>>>> function to return additional information about the matcher. In that sense,
>>>> I agree that there may be a better name than `getName`. Additionally,
>>>> instead of returning a string or an enum directly, perhaps it would be
>>>> viable to return a “MatcherInfo” object (exposed through `getMatcherInfo`).
>>>> That object could initially carry the name (exposed through `getName`), but
>>>> we could later add support for an enum and other useful information (ex.
>>>> interesting properties, child matchers, etc.).
>>>>
>>>> As for the second point, I agree that there are a lot of options to
>>>> consider for presenting this information (integrating with clang-query,
>>>> structuring output, etc). All the ideas you proposed seem to be viable -
>>>> one of the benefits of breaking this problem down into the two above points
>>>> is that once the infrastructure is established, all of these solutions can
>>>> be explored. This should allow us to achieve the goal of supporting
>>>> multiple debugging workflows. The `withTag` matcher, specifically, can be
>>>> useful for cases such as analyzing unit tests which fail due to faulty
>>>> matchers - wrapping the matcher in a `withTag` matcher would be a quick way
>>>> to gain some insight into its behavior. Ultimately, the effort of providing
>>>> better matcher introspection can be continued by expanding the
>>>> functionality of the `withTag` matcher, or by working on different
>>>> approaches that utilize `getMatcherInfo`.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>>
>>>> On Mon, Nov 29, 2021 at 10:18 AM Aaron Ballman <aaron at aaronballman.com>
>>>> wrote:
>>>>
>>>>> Thank you for proposing this! I think making AST matchers easier to
>>>>> use and debug is a fantastic goal. Some thoughts below.
>>>>>
>>>>> On Tue, Nov 23, 2021 at 1:20 PM James King via cfe-dev
>>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>> >
>>>>> > Objective
>>>>> >
>>>>> > The goal for this proposal is to improve the experience of debugging
>>>>> Clang’s AST matchers and of understanding what they are doing.
>>>>> >
>>>>> > Background
>>>>> >
>>>>> > AST Matchers are a powerful tool that allows users to pick out nodes
>>>>> in the Clang AST, provided they match the specified constraints and
>>>>> criteria. Yet, creating a well-formed matcher can be difficult. Matchers
>>>>> can become complex when multiple layers of matchers are nested to create a
>>>>> larger compound matcher, which can be necessary to select parts of the AST
>>>>> that fulfill strict criteria. Since matchers only provide a boolean
>>>>> response as to whether they obtained any matches or did not obtain any
>>>>> matches, the user receives no explanation for when a matcher fails to
>>>>> produce any bindings. Even when a matcher produces bindings, performing
>>>>> introspection to debug false positive bindings can be tedious. When the
>>>>> result of a matcher is not expected, determining how the result was
>>>>> obtained is made difficult by this limited response. Without any other
>>>>> options to interact with matchers, other than trial-and-error based on
>>>>> binary feedback, this lack of transparency makes it difficult to introspect
>>>>> matcher behavior and identify problems (such as which part of a compound
>>>>> matcher is inconsistent with expectations). This often causes the process
>>>>> of debugging a faulty matcher to be slow, confusing, and frustrating.
>>>>> >
>>>>> > Design ideas
>>>>> >
>>>>> > Proposed Solution
>>>>> >
>>>>> > The proposed solution is composed of two parts.
>>>>> >
>>>>> > The first part (https://reviews.llvm.org/D113917) is to expose a
>>>>> getName method on the Matcher class. Currently, there is no easy way to
>>>>> access any form of identification information for any matcher. This method
>>>>> will act as a hook that allows for accessing the name of a matcher
>>>>> programmatically. While this method will be helpful for the implementation
>>>>> for the withTag matcher, it could also prove to be useful for other forms
>>>>> of matcher introspection/debugging.
>>>>> >
>>>>> > The second part (https://reviews.llvm.org/D113943) is to create an
>>>>> introspection node matcher (named the withTag matcher). This matcher will
>>>>> act as a way to enable verbose logging for matchers, which introduces the
>>>>> possibility for a workflow similar to printf debugging. This matcher is
>>>>> templated - it should accept an inner matcher of any type, and itself be a
>>>>> matcher of that type. For example, if the withTag matcher has an inner
>>>>> matcher of type Matcher<TypeLocMatcher>, it will return a
>>>>> Matcher<TypeLocMatcher>. In terms of functionality, the withTag matcher
>>>>> will not affect the outcome of the inner matcher and should directly return
>>>>> whatever the inner matcher returns.
>>>>> >
>>>>> > Before:
>>>>> >
>>>>> > varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL")))
>>>>> >
>>>>> >
>>>>> > After:
>>>>> >
>>>>> > withTag(varDecl(hasName("x"), hasTypeLoc(typeLoc().bind("TL"))))
>>>>> >
>>>>> >
>>>>> > This matcher will be able to provide information such as:
>>>>> >
>>>>> > Matcher Type
>>>>> >
>>>>> > Matcher Name
>>>>> >
>>>>> > Matcher Success
>>>>> >
>>>>> > Node Kind
>>>>> >
>>>>> > Node Source
>>>>> >
>>>>> > The matcher will provide output by writing to stderr.
>>>>>
>>>>> I'm not opposed to it, but making this user configurable seems like a
>>>>> better approach (not everyone wants to write to stderr or go through
>>>>> redirections to get the data into a file). Also, we may want to
>>>>> consider what formats we want to provide this information. You give an
>>>>> example below, but this sounds a lot like a case where we might want
>>>>> to provide structured output (e.g., JSON) so that other tools can
>>>>> consume it to provide a better user interface. (That seems like
>>>>> something that can be done in follow-ups rather than something
>>>>> required for the initial feature.)
>>>>>
>>>>> > Though the exact format of the output is subject to change, it could
>>>>> look like this, for the matcher withTag(varDecl(withTag(hasName("x")),
>>>>> hasTypeLoc(pointerTypeLoc()))):
>>>>> >
>>>>> > ⭐ Attempting new match
>>>>> >
>>>>> > Matcher Name: `VarDecl` Node
>>>>> >
>>>>> > Node Kind: TypedefDecl
>>>>> >
>>>>> > Node Value:
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > typedef char *__builtin_va_list
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > Node AST:
>>>>> >
>>>>> > TypedefDecl 0x479af10 <<invalid sloc>> <invalid sloc> implicit
>>>>> __builtin_va_list 'char *'
>>>>> >
>>>>> > `-PointerType 0x47e6360 'char *'
>>>>> >
>>>>> >   `-BuiltinType 0x47e54d0 'char'
>>>>> >
>>>>> > Result: Unsuccessful
>>>>> >
>>>>> > ✔️ Concluding attempt
>>>>> >
>>>>> >
>>>>> > ⭐ Attempting new match
>>>>> >
>>>>> > Matcher Name: `VarDecl` Node
>>>>> >
>>>>> > Node Kind: VarDecl
>>>>> >
>>>>> > Node Value:
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > void *x
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > Node AST:
>>>>> >
>>>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>>>> >
>>>>> > ⭐ Attempting new match
>>>>> >
>>>>> > Matcher Name: HasName
>>>>> >
>>>>> > Node Kind: VarDecl
>>>>> >
>>>>> > Node Value:
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > void *x
>>>>> >
>>>>> > ```
>>>>> >
>>>>> > Node AST:
>>>>> >
>>>>> > VarDecl 0x479af80 <input.cc:1:1, col:7> col:7 x 'void *'
>>>>> >
>>>>> > Result: Successful
>>>>> >
>>>>> > ✔️ Concluding attempt
>>>>> >
>>>>> >
>>>>> > Result: Successful
>>>>> >
>>>>> > ✔️ Concluding attempt
>>>>> >
>>>>> >
>>>>> >
>>>>> > From this output, a user is able to see things such as information
>>>>> about each node the matcher attempts to match, information about which part
>>>>> of the compound matcher is being matched, and the success of each matcher.
>>>>> >
>>>>> >
>>>>> > Pros
>>>>> >
>>>>> > The design revolves around a matcher, which is a familiar
>>>>> pattern/construct
>>>>> >
>>>>> > The design has low overhead for using and removing - it requires
>>>>> wrapping one or more matchers with the withTag matcher.
>>>>> >
>>>>> > The design allows for a great degree of control - one can pick out
>>>>> certain matchers within a compound matcher to analyze.
>>>>> >
>>>>> > Cons
>>>>> >
>>>>> > One withTag matcher will only be able to report information about
>>>>> the inner matcher, and not necessarily matchers within the inner matcher
>>>>> >
>>>>> > However, it may be possible to provide a hook to access the inner
>>>>> matcher’s inner matcher
>>>>>
>>>>> Another con that I see is: this is likely to get very chatty when
>>>>> working on complex matchers. We might need some notion of a verbosity
>>>>> level for the information.
>>>>>
>>>>> > Printf debugging is just one method of diagnosing issues, and may
>>>>> not be everyone’s preferred approach to debugging. Alternative debugging
>>>>> approaches could include something like steveire’s debugger (
>>>>> https://steveire.wordpress.com/2019/04/16/debugging-clang-ast-matchers/
>>>>> ).
>>>>>
>>>>> FWIW, I loved a lot of the concepts Stephen was developing, so
>>>>> enabling this kind of technology would be fantastic. One thing I think
>>>>> is worth considering is that clang-query is currently the go-to tool
>>>>> for developing complex AST matchers because of the REPL nature of the
>>>>> tool. While printf-style debugging gets more information, I do wonder
>>>>> if we should be considering how to integrate this into clang-query
>>>>> more directly.
>>>>>
>>>>> > Questions
>>>>> >
>>>>> > Currently, getName returns a string. Should we consider returning an
>>>>> enum, instead?
>>>>>
>>>>> We sometimes rename matchers, so I think using an enumeration may have
>>>>> some nice (or possibly annoying, depending on perspective) properties.
>>>>> If we rename something and getName() returns a string, it's more of a
>>>>> silent change; using an enumeration is more likely to produce a
>>>>> compile error if someone is relying on the name for some reason. I
>>>>> don't currently have a strong preference because I can see arguments
>>>>> either way.
>>>>>
>>>>> > One reason we have opted for a string, as of now, is because users
>>>>> may define their own private matchers which may not have a value in the
>>>>> enum.
>>>>>
>>>>> That's a pretty compelling reason to go with the string approach.
>>>>>
>>>>> > If getName were to return a string, what formats/conventions should
>>>>> we adhere to?
>>>>> >
>>>>> > Currently, the returned strings are not named in a very formal
>>>>> approach. For example, most traversal and narrowing matchers have a getName
>>>>> that returns the name of the matcher exactly, whereas most node matchers
>>>>> have a getName that returns a string containing the name of the node (ie.
>>>>> VarDecl instead of varDecl)
>>>>>
>>>>> Naively, I think users are going to expect getName() to return the
>>>>> name of the matcher as it is written (that also makes it easier to
>>>>> visually distinguish between the matcher and the matched node).
>>>>>
>>>>> ~Aaron
>>>>>
>>>>> >
>>>>> > _______________________________________________
>>>>> > cfe-dev mailing list
>>>>> > cfe-dev at lists.llvm.org
>>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20211209/b0bb57a5/attachment-0001.html>