[cfe-dev] new design of RecursiveASTVisitor
Zhanyong Wan (λx.x x)
wan at google.com
Tue Jun 8 14:44:30 PDT 2010
Hi, all,
Craig and I implemented most of the new design. I uploaded the patch
to http://codereview.appspot.com/1607042 for easy reviewing - you can
download it from
http://codereview.appspot.com/download/issue1607042_1.diff
In the implementation we made a slight change to the proposed design:
instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
to have separate TraverseStmt, TraverseType, and TraverseDecl
functions. We think that makes the code clearer.
Chandler plan to review it after 3pm today. Doug, would you have some
time to look at this today? Thanks!
On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <wan at google.com> wrote:
> Hi,
>
> While working on improving the RecursiveASTVisitor class, some of us saw an
> opportunity for a different design that's cleaner and easier to understand
> and use. Here's the proposal. Comments and suggestions welcome!
>
> Clang AST Visitor Design
>
> This document discusses the design of a customizable visitor class for the
> AST generated by the Clang C++ parser.
>
> Background
>
> Clang comes with two AST visitor classes: ASTVisitor and
> RecursiveASTVisitor. The ideas behind the two are similar. Both are
> implemented using the CRTP pattern. The former is considered an internal
> class and not for use in tools developed on top of Clang. The latter is
> supposed to be a public API.
>
> At this time, neither class is complete (i.e. they'll miss some AST nodes)
> and there are known bugs (e.g. some AST nodes may be visited more than
> once). We are working on improving RecursiveASTVisitor. Our experience shows
> that its design has much space for improvement. Therefore it would be a
> worthwhile exercise to see what we can come up with if we do it from
> scratch, without being constrained or influenced by the current
> RecursiveASTVisitor design.
>
> Concepts and Terminology
>
> Muddy concepts lead to poor, confusing designs. Before we explore the design
> space, it's important to clarify some key concepts.
>
> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
> other nodes as its children.
> There a three categories of AST nodes in Clang: statements,
> declarations/definitions, and types. They are derived from class Stmt, Decl,
> and TypeLoc, respectively. In general, a node may have children of any or
> all of the three categories. The class hierarchies for Stmt, Decl, and
> TypeLoc can be found here.
> The AST visitor has three distinct tasks:
>
> traverse the entire AST (i.e. go to every node),
> at each node, walk up the class hierarchy, starting with the dynamic type of
> the node,
> for a given (node, type) combination (where 'type' is some base class of the
> dynamic type of 'node'), call a user-overridable function to actually
> "visit" the node.
>
> Design Goals
>
> easy-to-understand API
> performs a depth-first, preorder traversal on the AST (no child of a node
> can be visited until all work on the node has been done)
> complete coverage: each node in the AST must be visited
> no duplication: a node cannot be visited more than once
> common use cases should be easy
> fast
>
> Our Design
>
> The distinction between the AST visitor's three tasks (see the "Concepts and
> Terminology" section above) is important for understanding how the AST
> visitor works and using it effectively. Therefore we use naming conventions
> to highlight the role of a method of the AST visitor:
>
> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
> responsibility to ensure the correct type).
> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
> x) work similarly.
> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
> the direct parent class of Foo (unless Foo has no parent), and then calls
> VisitFoo(x) (see the next bullet).
> VisitFoo(Foo* x) does task #3.
>
> Here's some pesudo code to show what these methods' implementation looks
> like:
>
> // A function returns false if it wants the traversal to abort (note that
> // the current RecursiveASTVisitor class returns true for abortion - we
> // feel that 'false' is much more intuitive).
>
> // Therefore, each caller needs to check the return code and propagate
> // it up appropriately. Such plumbing code is omitted here for clarify.
>
>
> // The entry point for visiting an AST rooted at x.
> bool Traverse(Decl* x) {
>
> switch(x->getKind()) {
>
> case kFoo:
> TraverseFoo((Foo*)x);
>
> break;
> case kBar:
> TraverseBar((Bar*)x);
>
> break;
> ...
> }
> }
>
>
> bool TraverseFoo(Foo* x) {
>
> WalkUpFromFoo(x);
> for each child node n of x {
>
> Traverse(n)
> }
> }
>
>
> bool WalkUpFromFoo(Foo* x) {
>
> WalkUpFromBar(x); // Bar is Foo's parent.
>
> VisitFoo(x);
> }
>
> bool WalkUpFromDecl(Decl* x) {
>
> // Decl has no parent, so no more walking up.
> VisitDecl(x);
>
> }
>
> // All Visit*() methods do nothing by default.
> bool VisitFoo(Foo* x) { return true; }
>
> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
> they are the "skeleton" of the AST visitor and do the mundane plumbing that
> a normal user is not interested in. We recommend that only people truly
> understand how the traversal works can override them.
>
> A user is expected to override the Visit*() methods as needed. The default
> implementation of them does nothing.
>
> Returning a bool to indicate whether the traversal needs to be aborted
> requires much tedious, error-prone plumbing in the implementation.
> Exceptions are the most natural solution here, but unfortunately they are
> banned from Clang code.
>
> One advantage of this design over the existing RecursiveASTVisitor is that
> when a user overrides VisitFoo(), he only needs to do what he's interested
> in, without worrying about calling VisitFoo() or any other methods in the
> base AST visitor class (so less chance to shoot his own feet).
>
> With this design, all Visit*() methods on an AST ndoe are called before any
> Visit*() method is called on a child node of it. So the trace of the
> Visit*() calls is easy to understand and reason about.
>
> One thing people might not like about the design is that it uses more
> methods. I actually think it's a good thing that we have separate families
> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
> logic (traversal, walking up the type hierarchy, and the custom visiting
> logic) in the same Visit*() method makes the implementation more tricky and
> error-prone - that's probably why there was so much confusion when we were
> trying to improve RecursiveASTVisitor.
>
> To ensure performance, we would use CRTP as the existing AST visitors do.
> That means two things:
>
> No need for virtual functions.
> The entire code is templated s.t. the compiler can easily inline trivial
> functions and do cross-function optimization.
>
> Use Cases
>
> Let's see how some typical use cases can be implemented using our AST
> visitor.
>
> Dump the AST
>
> Only 3 methods need to be overridden:
>
> VisitDecl(Decl* x) {
>
> print information about x;
> }
>
> VisitStmt(Stmt* x) {
>
> print information about x;
> }
>
> VisitTypeLoc(TypeLoc* x) {
>
> print information about x;
> }
>
> Search for an AST Node of Type Foo
>
> The tool would override VisitFoo(Foo* x) to return false if x is the node
> it's looking for.
>
> Find the Parents of an AST Node
>
> The tool would maintain a stack of the nodes that lead to the current node.
> It can build the stack by overriding exactly 3 methods (plumbing of the
> return code omitted):
>
> Traverse(Decl* d) {
>
> Push(d);
> Base::Traverse(d);
>
> Pop();
> }
>
> Traverse(Stmt* s) {
>
> Push(s);
> Base::Traverse(s);
>
> Pop();
> }
>
> Traverse(TypeLoc* t) {
>
> Push(t);
> Base::Traverse(t);
>
> Pop();
> }
>
> --
> Zhanyong
>
>
--
Zhanyong
More information about the cfe-dev
mailing list