[cfe-dev] new design of RecursiveASTVisitor

Zhanyong Wan (λx.x x) wan at google.com
Wed Jun 9 00:32:01 PDT 2010


Thanks for the excellent comments, Doug and Chandler!  I've uploaded a
new patch to the codereview app.

On Tue, Jun 8, 2010 at 5:33 PM, Douglas Gregor <dgregor at apple.com> wrote:
>
> On Jun 8, 2010, at 2:44 PM, Zhanyong Wan (λx.x x) wrote:
>
>> Hi, all,
>>
>> Craig and I implemented most of the new design.  I uploaded the patch
>> to http://codereview.appspot.com/1607042 for easy reviewing - you can
>> download it from
>> http://codereview.appspot.com/download/issue1607042_1.diff
>>
>> In the implementation we made a slight change to the proposed design:
>> instead of overloading Traverse() for Stmt, Type, and Decl, we prefer
>> to have separate TraverseStmt, TraverseType, and TraverseDecl
>> functions.  We think that makes the code clearer.
>>
>> Chandler plan to review it after 3pm today.  Doug, would you have some
>> time to look at this today?  Thanks!
>
> I've added my comments up on codereview.appspot.com. Looks good!
>
>        - Doug
>
>> On Fri, Jun 4, 2010 at 4:35 PM, Zhanyong Wan (λx.x x) <wan at google.com> wrote:
>>> Hi,
>>>
>>> While working on improving the RecursiveASTVisitor class, some of us saw an
>>> opportunity for a different design that's cleaner and easier to understand
>>> and use.  Here's the proposal.  Comments and suggestions welcome!
>>>
>>> Clang AST Visitor Design
>>>
>>> This document discusses the design of a customizable visitor class for the
>>> AST generated by the Clang C++ parser.
>>>
>>> Background
>>>
>>> Clang comes with two AST visitor classes: ASTVisitor and
>>> RecursiveASTVisitor. The ideas behind the two are similar. Both are
>>> implemented using the CRTP pattern. The former is considered an internal
>>> class and not for use in tools developed on top of Clang. The latter is
>>> supposed to be a public API.
>>>
>>> At this time, neither class is complete (i.e. they'll miss some AST nodes)
>>> and there are known bugs (e.g. some AST nodes may be visited more than
>>> once). We are working on improving RecursiveASTVisitor. Our experience shows
>>> that its design has much space for improvement. Therefore it would be a
>>> worthwhile exercise to see what we can come up with if we do it from
>>> scratch, without being constrained or influenced by the current
>>> RecursiveASTVisitor design.
>>>
>>> Concepts and Terminology
>>>
>>> Muddy concepts lead to poor, confusing designs. Before we explore the design
>>> space, it's important to clarify some key concepts.
>>>
>>> An AST is a tree-shaped data structure. A node in an AST can have 0 or many
>>> other nodes as its children.
>>> There a three categories of AST nodes in Clang: statements,
>>> declarations/definitions, and types. They are derived from class Stmt, Decl,
>>> and TypeLoc, respectively. In general, a node may have children of any or
>>> all of the three categories. The class hierarchies for Stmt, Decl, and
>>> TypeLoc can be found here.
>>> The AST visitor has three distinct tasks:
>>>
>>> traverse the entire AST (i.e. go to every node),
>>> at each node, walk up the class hierarchy, starting with the dynamic type of
>>> the node,
>>> for a given (node, type) combination (where 'type' is some base class of the
>>> dynamic type of 'node'), call a user-overridable function to actually
>>> "visit" the node.
>>>
>>> Design Goals
>>>
>>> easy-to-understand API
>>> performs a depth-first, preorder traversal on the AST (no child of a node
>>> can be visited until all work on the node has been done)
>>> complete coverage: each node in the AST must be visited
>>> no duplication: a node cannot be visited more than once
>>> common use cases should be easy
>>> fast
>>>
>>> Our Design
>>>
>>> The distinction between the AST visitor's three tasks (see the "Concepts and
>>> Terminology" section above) is important for understanding how the AST
>>> visitor works and using it effectively. Therefore we use naming conventions
>>> to highlight the role of a method of the AST visitor:
>>>
>>> TraverseFoo(Foo* x) does task #1. It traveses (i.e. recursively visits) the
>>> sub-tree rooted at x, where Foo is *x's dynamic type (it's the caller's
>>> responsibility to ensure the correct type).
>>> Traverse(Decl* x) simply dispatches (i.e. forwards) to TraverseFoo(Foo* x)
>>> where Foo is the dynamic type of *x. Traverse(Stmt* x) and Traverse(TypeLoc*
>>> x) work similarly.
>>> WalkUpFromFoo(Foo* x) (better name welcome) does task #2. It does not try to
>>> visit any child of x. Instead, it first calls WalkUpFromBar(x) where Bar is
>>> the direct parent class of Foo (unless Foo has no parent), and then calls
>>> VisitFoo(x) (see the next bullet).
>>> VisitFoo(Foo* x) does task #3.
>>>
>>> Here's some pesudo code to show what these methods' implementation looks
>>> like:
>>>
>>> // A function returns false if it wants the traversal to abort (note that
>>> // the current RecursiveASTVisitor class returns true for abortion - we
>>> // feel that 'false' is much more intuitive).
>>>
>>> // Therefore, each caller needs to check the return code and propagate
>>> // it up appropriately.  Such plumbing code is omitted here for clarify.
>>>
>>>
>>> // The entry point for visiting an AST rooted at x.
>>> bool Traverse(Decl* x) {
>>>
>>>   switch(x->getKind()) {
>>>
>>>   case kFoo:
>>>     TraverseFoo((Foo*)x);
>>>
>>>     break;
>>>   case kBar:
>>>     TraverseBar((Bar*)x);
>>>
>>>     break;
>>>   ...
>>>   }
>>> }
>>>
>>>
>>> bool TraverseFoo(Foo* x) {
>>>
>>>   WalkUpFromFoo(x);
>>>   for each child node n of x {
>>>
>>>     Traverse(n)
>>>   }
>>> }
>>>
>>>
>>> bool WalkUpFromFoo(Foo* x) {
>>>
>>>   WalkUpFromBar(x);  // Bar is Foo's parent.
>>>
>>>   VisitFoo(x);
>>> }
>>>
>>> bool WalkUpFromDecl(Decl* x) {
>>>
>>>   // Decl has no parent, so no more walking up.
>>>   VisitDecl(x);
>>>
>>> }
>>>
>>> // All Visit*() methods do nothing by default.
>>> bool VisitFoo(Foo* x) { return true; }
>>>
>>> Usually, a user shouldn't override a Traverse*() or WalkUpFrom*() function -
>>> they are the "skeleton" of the AST visitor and do the mundane plumbing that
>>> a normal user is not interested in. We recommend that only people truly
>>> understand how the traversal works can override them.
>>>
>>> A user is expected to override the Visit*() methods as needed. The default
>>> implementation of them does nothing.
>>>
>>> Returning a bool to indicate whether the traversal needs to be aborted
>>> requires much tedious, error-prone plumbing in the implementation.
>>> Exceptions are the most natural solution here, but unfortunately they are
>>> banned from Clang code.
>>>
>>> One advantage of this design over the existing RecursiveASTVisitor is that
>>> when a user overrides VisitFoo(), he only needs to do what he's interested
>>> in, without worrying about calling VisitFoo() or any other methods in the
>>> base AST visitor class (so less chance to shoot his own feet).
>>>
>>> With this design, all Visit*() methods on an AST ndoe are called before any
>>> Visit*() method is called on a child node of it.  So the trace of the
>>> Visit*() calls is easy to understand and reason about.
>>>
>>> One thing people might not like about the design is that it uses more
>>> methods. I actually think it's a good thing that we have separate families
>>> of methods for separate roles. RecursiveASTVisitor's approach of lumping all
>>> logic (traversal, walking up the type hierarchy, and the custom visiting
>>> logic) in the same Visit*() method makes the implementation more tricky and
>>> error-prone - that's probably why there was so much confusion when we were
>>> trying to improve RecursiveASTVisitor.
>>>
>>> To ensure performance, we would use CRTP as the existing AST visitors do.
>>> That means two things:
>>>
>>> No need for virtual functions.
>>> The entire code is templated s.t. the compiler can easily inline trivial
>>> functions and do cross-function optimization.
>>>
>>> Use Cases
>>>
>>> Let's see how some typical use cases can be implemented using our AST
>>> visitor.
>>>
>>> Dump the AST
>>>
>>> Only 3 methods need to be overridden:
>>>
>>> VisitDecl(Decl* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> VisitStmt(Stmt* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> VisitTypeLoc(TypeLoc* x) {
>>>
>>>   print information about x;
>>> }
>>>
>>> Search for an AST Node of Type Foo
>>>
>>> The tool would override VisitFoo(Foo* x) to return false if x is the node
>>> it's looking for.
>>>
>>> Find the Parents of an AST Node
>>>
>>> The tool would maintain a stack of the nodes that lead to the current node.
>>> It can build the stack by overriding exactly 3 methods (plumbing of the
>>> return code omitted):
>>>
>>> Traverse(Decl* d) {
>>>
>>>   Push(d);
>>>   Base::Traverse(d);
>>>
>>>   Pop();
>>> }
>>>
>>> Traverse(Stmt* s) {
>>>
>>>   Push(s);
>>>   Base::Traverse(s);
>>>
>>>   Pop();
>>> }
>>>
>>> Traverse(TypeLoc* t) {
>>>
>>>   Push(t);
>>>   Base::Traverse(t);
>>>
>>>   Pop();
>>> }
>>>
>>> --
>>> Zhanyong
>>>
>>>
>>
>>
>>
>> --
>> Zhanyong
>
>



-- 
Zhanyong




More information about the cfe-dev mailing list