[cfe-dev] new design of RecursiveASTVisitor

Douglas Gregor dgregor at apple.com
Tue Jun 8 17:33:16 PDT 2010


On Jun 8, 2010, at 2:44 PM, Zhanyong Wan (λx.x x) wrote:

> Hi, all,
> 
> Craig and I implemented most of the new design.  I uploaded the patch
> to http://codereview.appspot.com/1607042 for easy reviewing - you can
> download it from
> http://codereview.appspot.com/download/issue1607042_1.diff
> 
> In the implementation we made a slight change to the proposed design:
> instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
> to have separate TraverseStmt, TraverseType, and TraverseDecl
> functions.  We think that makes the code clearer.
> 
> Chandler plan to review it after 3pm today.  Doug, would you have some
> time to look at this today?  Thanks!

I've added my comments up on codereview.appspot.com. Looks good!

	- Doug

> On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <wan at google.com> wrote:
>> Hi,
>> 
>> While working on improving the RecursiveASTVisitor class, some of us saw an
>> opportunity for a different design that's cleaner and easier to understand
>> and use.  Here's the proposal.  Comments and suggestions welcome!
>> 
>> Clang AST Visitor Design
>> 
>> This document discusses the design of a customizable visitor class for the
>> AST generated by the Clang C++ parser.
>> 
>> Background
>> 
>> Clang comes with two AST visitor classes: ASTVisitor and
>> RecursiveASTVisitor. The ideas behind the two are similar. Both are
>> implemented using the CRTP pattern. The former is considered an internal
>> class and not for use in tools developed on top of Clang. The latter is
>> supposed to be a public API.
>> 
>> At this time, neither class is complete (i.e. they'll miss some AST nodes)
>> and there are known bugs (e.g. some AST nodes may be visited more than
>> once). We are working on improving RecursiveASTVisitor. Our experience shows
>> that its design has much space for improvement. Therefore it would be a
>> worthwhile exercise to see what we can come up with if we do it from
>> scratch, without being constrained or influenced by the current
>> RecursiveASTVisitor design.
>> 
>> Concepts and Terminology
>> 
>> Muddy concepts lead to poor, confusing designs. Before we explore the design
>> space, it's important to clarify some key concepts.
>> 
>> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
>> other nodes as its children.
>> There a three categories of AST nodes in Clang: statements,
>> declarations/definitions, and types. They are derived from class Stmt, Decl,
>> and TypeLoc, respectively. In general, a node may have children of any or
>> all of the three categories. The class hierarchies for Stmt, Decl, and
>> TypeLoc can be found here.
>> The AST visitor has three distinct tasks:
>> 
>> traverse the entire AST (i.e. go to every node),
>> at each node, walk up the class hierarchy, starting with the dynamic type of
>> the node,
>> for a given (node, type) combination (where 'type' is some base class of the
>> dynamic type of 'node'), call a user-overridable function to actually
>> "visit" the node.
>> 
>> Design Goals
>> 
>> easy-to-understand API
>> performs a depth-first, preorder traversal on the AST (no child of a node
>> can be visited until all work on the node has been done)
>> complete coverage: each node in the AST must be visited
>> no duplication: a node cannot be visited more than once
>> common use cases should be easy
>> fast
>> 
>> Our Design
>> 
>> The distinction between the AST visitor's three tasks (see the "Concepts and
>> Terminology" section above) is important for understanding how the AST
>> visitor works and using it effectively. Therefore we use naming conventions
>> to highlight the role of a method of the AST visitor:
>> 
>> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
>> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
>> responsibility to ensure the correct type).
>> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
>> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
>> x) work similarly.
>> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
>> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
>> the direct parent class of Foo (unless Foo has no parent), and then calls
>> VisitFoo(x) (see the next bullet).
>> VisitFoo(Foo* x) does task #3.
>> 
>> Here's some pesudo code to show what these methods' implementation looks
>> like:
>> 
>> // A function returns false if it wants the traversal to abort (note that
>> // the current RecursiveASTVisitor class returns true for abortion - we
>> // feel that 'false' is much more intuitive).
>> 
>> // Therefore, each caller needs to check the return code and propagate
>> // it up appropriately.  Such plumbing code is omitted here for clarify.
>> 
>> 
>> // The entry point for visiting an AST rooted at x.
>> bool Traverse(Decl* x) {
>> 
>>   switch(x->getKind()) {
>> 
>>   case kFoo:
>>     TraverseFoo((Foo*)x);
>> 
>>     break;
>>   case kBar:
>>     TraverseBar((Bar*)x);
>> 
>>     break;
>>   ...
>>   }
>> }
>> 
>> 
>> bool TraverseFoo(Foo* x) {
>> 
>>   WalkUpFromFoo(x);
>>   for each child node n of x {
>> 
>>     Traverse(n)
>>   }
>> }
>> 
>> 
>> bool WalkUpFromFoo(Foo* x) {
>> 
>>   WalkUpFromBar(x);  // Bar is Foo's parent.
>> 
>>   VisitFoo(x);
>> }
>> 
>> bool WalkUpFromDecl(Decl* x) {
>> 
>>   // Decl has no parent, so no more walking up.
>>   VisitDecl(x);
>> 
>> }
>> 
>> // All Visit*() methods do nothing by default.
>> bool VisitFoo(Foo* x) { return true; }
>> 
>> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
>> they are the "skeleton" of the AST visitor and do the mundane plumbing that
>> a normal user is not interested in. We recommend that only people truly
>> understand how the traversal works can override them.
>> 
>> A user is expected to override the Visit*() methods as needed. The default
>> implementation of them does nothing.
>> 
>> Returning a bool to indicate whether the traversal needs to be aborted
>> requires much tedious, error-prone plumbing in the implementation.
>> Exceptions are the most natural solution here, but unfortunately they are
>> banned from Clang code.
>> 
>> One advantage of this design over the existing RecursiveASTVisitor is that
>> when a user overrides VisitFoo(), he only needs to do what he's interested
>> in, without worrying about calling VisitFoo() or any other methods in the
>> base AST visitor class (so less chance to shoot his own feet).
>> 
>> With this design, all Visit*() methods on an AST ndoe are called before any
>> Visit*() method is called on a child node of it.  So the trace of the
>> Visit*() calls is easy to understand and reason about.
>> 
>> One thing people might not like about the design is that it uses more
>> methods. I actually think it's a good thing that we have separate families
>> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
>> logic (traversal, walking up the type hierarchy, and the custom visiting
>> logic) in the same Visit*() method makes the implementation more tricky and
>> error-prone - that's probably why there was so much confusion when we were
>> trying to improve RecursiveASTVisitor.
>> 
>> To ensure performance, we would use CRTP as the existing AST visitors do.
>> That means two things:
>> 
>> No need for virtual functions.
>> The entire code is templated s.t. the compiler can easily inline trivial
>> functions and do cross-function optimization.
>> 
>> Use Cases
>> 
>> Let's see how some typical use cases can be implemented using our AST
>> visitor.
>> 
>> Dump the AST
>> 
>> Only 3 methods need to be overridden:
>> 
>> VisitDecl(Decl* x) {
>> 
>>   print information about x;
>> }
>> 
>> VisitStmt(Stmt* x) {
>> 
>>   print information about x;
>> }
>> 
>> VisitTypeLoc(TypeLoc* x) {
>> 
>>   print information about x;
>> }
>> 
>> Search for an AST Node of Type Foo
>> 
>> The tool would override VisitFoo(Foo* x) to return false if x is the node
>> it's looking for.
>> 
>> Find the Parents of an AST Node
>> 
>> The tool would maintain a stack of the nodes that lead to the current node.
>> It can build the stack by overriding exactly 3 methods (plumbing of the
>> return code omitted):
>> 
>> Traverse(Decl* d) {
>> 
>>   Push(d);
>>   Base::Traverse(d);
>> 
>>   Pop();
>> }
>> 
>> Traverse(Stmt* s) {
>> 
>>   Push(s);
>>   Base::Traverse(s);
>> 
>>   Pop();
>> }
>> 
>> Traverse(TypeLoc* t) {
>> 
>>   Push(t);
>>   Base::Traverse(t);
>> 
>>   Pop();
>> }
>> 
>> --
>> Zhanyong
>> 
>> 
> 
> 
> 
> -- 
> Zhanyong





More information about the cfe-dev mailing list