[cfe-dev] new design of RecursiveASTVisitor
Douglas Gregor
dgregor at apple.com
Tue Jun 8 17:33:16 PDT 2010
On Jun 8, 2010, at 2:44 PM, Zhanyong Wan (λx.x x) wrote:
> Hi, all,
>
> Craig and I implemented most of the new design. I uploaded the patch
> to http://codereview.appspot.com/1607042 for easy reviewing - you can
> download it from
> http://codereview.appspot.com/download/issue1607042_1.diff
>
> In the implementation we made a slight change to the proposed design:
> instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
> to have separate TraverseStmt, TraverseType, and TraverseDecl
> functions. We think that makes the code clearer.
>
> Chandler plan to review it after 3pm today. Doug, would you have some
> time to look at this today? Thanks!
I've added my comments up on codereview.appspot.com. Looks good!
- Doug
> On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <wan at google.com> wrote:
>> Hi,
>>
>> While working on improving the RecursiveASTVisitor class, some of us saw an
>> opportunity for a different design that's cleaner and easier to understand
>> and use. Here's the proposal. Comments and suggestions welcome!
>>
>> Clang AST Visitor Design
>>
>> This document discusses the design of a customizable visitor class for the
>> AST generated by the Clang C++ parser.
>>
>> Background
>>
>> Clang comes with two AST visitor classes: ASTVisitor and
>> RecursiveASTVisitor. The ideas behind the two are similar. Both are
>> implemented using the CRTP pattern. The former is considered an internal
>> class and not for use in tools developed on top of Clang. The latter is
>> supposed to be a public API.
>>
>> At this time, neither class is complete (i.e. they'll miss some AST nodes)
>> and there are known bugs (e.g. some AST nodes may be visited more than
>> once). We are working on improving RecursiveASTVisitor. Our experience shows
>> that its design has much space for improvement. Therefore it would be a
>> worthwhile exercise to see what we can come up with if we do it from
>> scratch, without being constrained or influenced by the current
>> RecursiveASTVisitor design.
>>
>> Concepts and Terminology
>>
>> Muddy concepts lead to poor, confusing designs. Before we explore the design
>> space, it's important to clarify some key concepts.
>>
>> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
>> other nodes as its children.
>> There a three categories of AST nodes in Clang: statements,
>> declarations/definitions, and types. They are derived from class Stmt, Decl,
>> and TypeLoc, respectively. In general, a node may have children of any or
>> all of the three categories. The class hierarchies for Stmt, Decl, and
>> TypeLoc can be found here.
>> The AST visitor has three distinct tasks:
>>
>> traverse the entire AST (i.e. go to every node),
>> at each node, walk up the class hierarchy, starting with the dynamic type of
>> the node,
>> for a given (node, type) combination (where 'type' is some base class of the
>> dynamic type of 'node'), call a user-overridable function to actually
>> "visit" the node.
>>
>> Design Goals
>>
>> easy-to-understand API
>> performs a depth-first, preorder traversal on the AST (no child of a node
>> can be visited until all work on the node has been done)
>> complete coverage: each node in the AST must be visited
>> no duplication: a node cannot be visited more than once
>> common use cases should be easy
>> fast
>>
>> Our Design
>>
>> The distinction between the AST visitor's three tasks (see the "Concepts and
>> Terminology" section above) is important for understanding how the AST
>> visitor works and using it effectively. Therefore we use naming conventions
>> to highlight the role of a method of the AST visitor:
>>
>> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
>> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
>> responsibility to ensure the correct type).
>> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
>> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
>> x) work similarly.
>> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
>> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
>> the direct parent class of Foo (unless Foo has no parent), and then calls
>> VisitFoo(x) (see the next bullet).
>> VisitFoo(Foo* x) does task #3.
>>
>> Here's some pesudo code to show what these methods' implementation looks
>> like:
>>
>> // A function returns false if it wants the traversal to abort (note that
>> // the current RecursiveASTVisitor class returns true for abortion - we
>> // feel that 'false' is much more intuitive).
>>
>> // Therefore, each caller needs to check the return code and propagate
>> // it up appropriately. Such plumbing code is omitted here for clarify.
>>
>>
>> // The entry point for visiting an AST rooted at x.
>> bool Traverse(Decl* x) {
>>
>> switch(x->getKind()) {
>>
>> case kFoo:
>> TraverseFoo((Foo*)x);
>>
>> break;
>> case kBar:
>> TraverseBar((Bar*)x);
>>
>> break;
>> ...
>> }
>> }
>>
>>
>> bool TraverseFoo(Foo* x) {
>>
>> WalkUpFromFoo(x);
>> for each child node n of x {
>>
>> Traverse(n)
>> }
>> }
>>
>>
>> bool WalkUpFromFoo(Foo* x) {
>>
>> WalkUpFromBar(x); // Bar is Foo's parent.
>>
>> VisitFoo(x);
>> }
>>
>> bool WalkUpFromDecl(Decl* x) {
>>
>> // Decl has no parent, so no more walking up.
>> VisitDecl(x);
>>
>> }
>>
>> // All Visit*() methods do nothing by default.
>> bool VisitFoo(Foo* x) { return true; }
>>
>> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
>> they are the "skeleton" of the AST visitor and do the mundane plumbing that
>> a normal user is not interested in. We recommend that only people truly
>> understand how the traversal works can override them.
>>
>> A user is expected to override the Visit*() methods as needed. The default
>> implementation of them does nothing.
>>
>> Returning a bool to indicate whether the traversal needs to be aborted
>> requires much tedious, error-prone plumbing in the implementation.
>> Exceptions are the most natural solution here, but unfortunately they are
>> banned from Clang code.
>>
>> One advantage of this design over the existing RecursiveASTVisitor is that
>> when a user overrides VisitFoo(), he only needs to do what he's interested
>> in, without worrying about calling VisitFoo() or any other methods in the
>> base AST visitor class (so less chance to shoot his own feet).
>>
>> With this design, all Visit*() methods on an AST ndoe are called before any
>> Visit*() method is called on a child node of it. So the trace of the
>> Visit*() calls is easy to understand and reason about.
>>
>> One thing people might not like about the design is that it uses more
>> methods. I actually think it's a good thing that we have separate families
>> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
>> logic (traversal, walking up the type hierarchy, and the custom visiting
>> logic) in the same Visit*() method makes the implementation more tricky and
>> error-prone - that's probably why there was so much confusion when we were
>> trying to improve RecursiveASTVisitor.
>>
>> To ensure performance, we would use CRTP as the existing AST visitors do.
>> That means two things:
>>
>> No need for virtual functions.
>> The entire code is templated s.t. the compiler can easily inline trivial
>> functions and do cross-function optimization.
>>
>> Use Cases
>>
>> Let's see how some typical use cases can be implemented using our AST
>> visitor.
>>
>> Dump the AST
>>
>> Only 3 methods need to be overridden:
>>
>> VisitDecl(Decl* x) {
>>
>> print information about x;
>> }
>>
>> VisitStmt(Stmt* x) {
>>
>> print information about x;
>> }
>>
>> VisitTypeLoc(TypeLoc* x) {
>>
>> print information about x;
>> }
>>
>> Search for an AST Node of Type Foo
>>
>> The tool would override VisitFoo(Foo* x) to return false if x is the node
>> it's looking for.
>>
>> Find the Parents of an AST Node
>>
>> The tool would maintain a stack of the nodes that lead to the current node.
>> It can build the stack by overriding exactly 3 methods (plumbing of the
>> return code omitted):
>>
>> Traverse(Decl* d) {
>>
>> Push(d);
>> Base::Traverse(d);
>>
>> Pop();
>> }
>>
>> Traverse(Stmt* s) {
>>
>> Push(s);
>> Base::Traverse(s);
>>
>> Pop();
>> }
>>
>> Traverse(TypeLoc* t) {
>>
>> Push(t);
>> Base::Traverse(t);
>>
>> Pop();
>> }
>>
>> --
>> Zhanyong
>>
>>
>
>
>
> --
> Zhanyong
More information about the cfe-dev
mailing list