[cfe-dev] [RFC] Dynamic AST Matcher library

Samuel Benzaquen sbenza at google.com
Wed Apr 17 13:33:43 PDT 2013

Pinging this thread. There is still interest on this on our side.
I'll edit below with some comments.
Thanks, Sam

On Wed, Dec 5, 2012 at 12:09 PM, Samuel Benzaquen <sbenza at google.com> wrote:

> Hello all,
> We have been working on a layer on top of the AST Matcher library to
> support creating arbitrary matchers at runtime.
> I have a proof of concept working on the tooling branch and wanted to send
> a proposal to make it a real thing.
> The proposal doc is on google docs at:
> https://docs.google.com/document/d/1H3queuLjEYWdgpHzo7Mj3ReBXJdpSI8vcH9Y2p_4Y-8/edit
> Inlined copy below for those that prefer it this way. Feel free to comment
> on either.
> _Sam
> * Design: Dynamic AST MatchersThis document contains a design proposal
> for a dynamic AST matcher layer that supports creating arbitrary matchers
> at runtime, which will allow for generic tools to be developed.
> ContextThe existing AST matcher framework provides a very powerful and
> compact way to traverse the AST and find specific nodes by providing
> complex predicates.
> However, this framework requires that all predicates to be used at runtime
> are fully defined at compile time. The creation of these predicates can be
> a big challenge in some cases, and this limitation forces a ‘recompile’
> step on this creative loop.
> Also, without being able to specify matchers at runtime we can’t support
> more generic search use cases. For example, an editor-based search tool
> that supports an AST matcher predicates.
> Goals
>    - Runtime parsing of matcher expressions, including literals and
>    predicates. Feedback for syntax/semantic errors in the expressions
> *

The parser is just a tool to glue together the UI with the matching system.
It is not a goal on itself.
The simplified syntax (something in the line of S-expressions) keeps the
parser simple. Also, it allows for other features to be built on top of the
matcher generation, like:
 - Inject Id() nodes automatically to extract more info about the matches.
A web interface could use this information to color code the pieces, or add
 - Extract known submatchers that can facilitate indexing. Eg. any
hasName() submatcher could be found and used to drop compilation units
right away if they don't contain anything that has that name.

>    - Create matchers at runtime by name, with dynamically created
>    arguments if required by the matcher
>    - Query command line tool that runs the matcher on a specific
>    compilation unit and reports all matches
> *

This command line tool is both an example of how to use the system and a
nice to have utility.

Another tool that could be created on top, but is not part of this first
proposal, would be a server that can load/parse/index/cache code and can
respond to Matcher queries.

> *
> Non-goals
>    - Dynamic refactoring tools. They might be implemented using dynamic
>    matchers, but it is out of the scope for this library
>    - Full C++ expression support for predicates. Values are going to be
>    limited to literals (eg. 1, “foo”). Expressions like 1+2 will not be
>    parsed. Some useful constants (like INT_MAX) might be available
> *

As explained above, having a full C++ syntax would difficult the usage of
the matcher expressions. Also, it might require JIT'n of the expression to
actually support any C++ expression, which would make it a security issue.

> *Code locationThe main output of the project will be a library to be used
> by AST matcher related clang-tools, and should live close to the already
> existing AST matcher library, but as a separate library
> Matcher implementationOne aspect that simplifies the implementation of
> dynamic matchers is to have a uniform generic signature for creating them.
> Otherwise, whatever code that creates them will require explicit knowledge
> of each matcher, or at least of each signature.
> The generic signature is of the form VariantType (*)(const
> std::vector<VariantType>& args). A variant type instance can contain
> numbers, booleans, strings and Matchers, as well as error related
> information for return values.
> Instead of reimplementing all matchers to have a generic signature, we
> implement marshaller functions that provide the generic signature on top of
> the regular static one. The marshallers verify that the argument count and
> types are correct and calls the underlying constructor function.
> RegistryAt initialization of the framework, all known matchers are added
> into a registry. This registry provides a way to create matchers at runtime
> by name.
> This registry is the only component that would require maintenance when
> the ASTMatcher library is updated. Any new matcher, and new overloads for
> existing matchers, would have to be added to the registry. In most cases, a
> single line of code per matcher does the job.
> Matcher expression parsingA simple matcher expression parser, combined
> with the matcher registry, allows clients to go from a user provided string
> to a Matcher<T>. The output of the parser is a variant type also, which
> allows it to return Matchers as well as syntax and semantic errors.
> The parser interface provides a way for the client to inject and/or modify
> the matcher tree as it is being created. For example, a client could insert
> Id() matcher around specific nodes without having to do it in the string
> form.*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130417/0ea22ff5/attachment.html>

More information about the cfe-dev mailing list