[cfe-dev] Patch: AST Matcher Framwork and an example tool

Tue Jun 14 22:10:56 PDT 2011

On Tue, Jun 14, 2011 at 9:46 PM, Chris Lattner <clattner at apple.com> wrote:

>
> On Jun 2, 2011, at 10:32 AM, Manuel Klimek wrote:
>
> +cfe-dev, -cfe-commits
>
> As requested by Chris I have reverted the Tooling stuff, and am bumping
> this thread on cfe-dev.
>
>
> Hi Manuel,
>
> Sorry for the delay, I've been swamped with WWDC and trying to catch up on
> a huge backlog caused by it.
>

I figured :) That's why I waited with pinging until post-WWDC...

> Please find the original mail proposing the patch sent to cfe-commits on
> May 23rd inlined below.
>
> References to the original threads:
> Proposal:
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042163.html
> Commit:
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110530/042375.html
>
> Patches produced by the example tool sent out (has not got feedback yet):
> Clang:
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110523/042212.html
> LLVM:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110523/121387.html
>
> More context on why this is useful: when writing code transformation tools
> to do large scale changes on libraries with clang, we noticed that a lot of
> the code we wrote was "unrolled" matching of tree structures on the AST,
> which required a lot of dynamic casting and deeply nested control flow
> structures.
>
>
> Ok, I agree that our current infrastructure for pattern matching ASTs is...
> ehh... "suboptimal" ;-).  I'm really thrilled you guys are working on
> improving this.
>
> The future work (as outlined to some extend in the original email) is to
> build up code transformation tools (including the standard refactoring tools
> we know from other languages) and libraries to make it easy for developers
> to contribute both generally useful tools and make it easy to write one-off
> tools for cleanup transformations on larger code bases.
>
>
> This goal is really really fantastic, something that I've been hoping that
> Clang would grow into from the very beginning.
>
> The library we implemented based on those findings allows a very
> declarative way of expressing interesting parts of the AST that you want to
> work on, with a callback mechanism to have an imperative part that does the
> actual transformation. The way this library uses templates to build up a
> reverse hierarchy of the clang AST allows us to expand the library of
> matchers without needing to manually copy the structure of the AST while
> still providing full compile time type checking.
> We have used this library for many internal transformations (one more
> broadly applicable of them being included as an example tool), and Nico may
> be able to explain more how he's using the infrastructure for Chromium.
>
>
> My primary concern with this work is that it is built as a series of
> *compile-time* code constructs.  This means that someone working on a
> rewriter needs to rebuild the clang executable to do this.  Instead of doing
> this as a compile-time construct, have you considered building this as an
> extensible pattern matching tool where the pattern and replacement can be
> specified at runtime?  I'm envisioning something like a "regular expression
> for an AST".  You don't need to rebuild grep every time you want to search
> for something new in text.
>
> Even in the case where compiled-in code is required, having a more dynamic
> matcher would greatly simplify the code doing the matching.  Have you
> considered this approach?
>

Yes, we've considered this approach early on. We looked into both some Java
 and C-based solutions (see for example http://nighthacks.com/roller/jag/;
for Java there are really really bad examples that match Java to XML and do
xpath queries). The problem is that building that pattern matcher language
would not be straight-forward (simply writing C++ with a few globs would not
be enough for our current use cases, since C++ has a lot of implicit stuff
going on which we want to match, and just creating an arbitrary new language
doesn't necessarily seem better than the in-language DSL).

Considering that we want to eventually get a dynamic pattern matching
language, but we also want to get it right, we are currently spending our
time on the in-language DSL, and especially for the large scale stuff the
developers we work with need surprisingly little help (the included example
for replacing reduncant c_str() calls was created by a contributor who's not
worked on the implementation of the matcher library).

And when we get to the higher-level refactoring tools, the dynamic aspect
will be parameters to the refactoring, so the non-dynamic nature of the AST
matchers does not matter for that case.

When we look at the actual transformations, being in C++ again provides the
benefit that we can just work with the AST nodes we matched instead of
having to define some new way of dynamically specifying the transformations
- and re-binding the AST in a dynamic language is definitely out-of-scope
for us...

In the end, I agree that the vision to have a really nice dynamic
description of the matches is the ultimate goal, but for us this is
currently still a few quarters out. The C++ code provides really useful
abstractions to quickly describe matches and transformations on the AST,
with little code (as we can use C++ to provide the type safety and thus the
error checking on the AST nodes). The cost of a link-step while writing the
tools has so far not been a big obstacle, especially considering that our
main target users currently are a) C++ experts doing large scale code
transformations and b) writing refactoring tools that end users can use
without any knowledge about the AST.

Cheers,
/Manuel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110614/b0314a3c/attachment.html>