Hi,<br><br>While working on improving the RecursiveASTVisitor class, some of us saw an opportunity for a different design that's cleaner and easier to understand and use.  Here's the proposal.  Comments and suggestions welcome!<br>

<h1>Clang AST Visitor Design </h1>
<p>
This document discusses the design of a customizable visitor class for
the AST generated by the Clang C++ parser.
</p><p>
</p><h2><a name="Background"> </a> Background </h2>
<p>
Clang comes with two AST visitor classes:
ASTVisitor
and
RecursiveASTVisitor.
The ideas behind the two are similar.  Both are implemented using the
CRTP pattern.  The former is considered an internal class and not for
use in tools developed on top of Clang.  The latter is supposed to be
a public API.
</p><p>
At this time, neither class is complete (i.e. they'll miss some AST
nodes) and there are known bugs (e.g. some AST nodes may be visited
more than once).  We are working on improving <code>RecursiveASTVisitor</code>.
Our experience shows that its design has much space for improvement.  Therefore it
would be a worthwhile exercise to see what we can come up with if we
do it from scratch, without being constrained or influenced by the
current <code>RecursiveASTVisitor</code> design.
</p><p>
</p><h2><a name="Concepts_and_Terminology"> </a> Concepts and Terminology </h2>
<p>
Muddy concepts lead to poor, confusing designs.  Before we explore the
design space, it's important to clarify some key concepts.
</p><p>
</p><ul><li> An <strong>AST</strong> is a tree-shaped data structure.  A node in an AST can
     have 0 or many other nodes as its children.
</li><li> There a <em>three categories</em> of AST nodes in Clang: statements,
     declarations/definitions, and types.  They are derived from class
     Stmt,
     Decl,
     and
     TypeLoc,
     respectively.  In general, a node may have children of any or all
     of the three categories.  The class hierarchies for <code>Stmt</code>,
     <code>Decl</code>, and <code>TypeLoc</code> can be found
     <a href="http://clang.llvm.org/doxygen/hierarchy.html" target="_top">here</a>.
</li><li> The AST visitor has <em>three</em> distinct tasks:
<ol><li> traverse the entire AST (i.e. go to every node),
</li><li> at each node, walk up the class hierarchy, starting with the
         dynamic type of the node,
</li><li> for a given (node, type) combination (where 'type' is some
         base class of the dynamic type of 'node'), call a
         user-overridable function to actually "visit" the node.
</li></ol>
</li></ul>
<p>
</p><h2><a name="Design_Goals"> </a> Design Goals </h2>
<p>
</p><ul><li> easy-to-understand API
</li><li> performs a <a href="http://www.google.com/url?sa=D&q=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FTree_traversal%23Depth-first_Traversal" target="_top">depth-first</a>, preorder traversal on the AST (no child of a node can be visited until all work on the node has been done)
</li><li> complete coverage: each node in the AST must be visited
</li><li> no duplication: a node cannot be visited more than once
</li><li> common use cases should be easy
</li><li> fast
</li></ul>
<p>
</p><h2><a name="Our_Design"> </a> Our Design </h2>
<p>
The distinction between the AST visitor's three tasks (see the
"Concepts and Terminology" section above) is important for
understanding how the AST visitor works and using it effectively.
Therefore we use naming conventions to highlight the role of a method
of the AST visitor:
</p><p>
</p><ul><li> <code>TraverseFoo(Foo* x)</code> does task #1.  It traveses
     (i.e. recursively visits) the sub-tree rooted at <code>x</code>, where <code>Foo</code>
     is <code>*x</code>'s dynamic type (it's the caller's responsibility to
     ensure the correct type).
</li><li> <code>Traverse(Decl* x)</code> simply dispatches (i.e. forwards) to
     <code>TraverseFoo(Foo* x)</code> where <code>Foo</code> is the dynamic type of <code>*x</code>.
     <code>Traverse(Stmt* x)</code> and <code>Traverse(TypeLoc* x)</code> work similarly.
</li><li> <code>WalkUpFromFoo(Foo* x)</code> (better name welcome) does task #2.  It
     does not try to visit any child of <code>x</code>.  Instead, it first calls
     <code>WalkUpFromBar(x)</code> where <code>Bar</code> is the direct parent class of <code>Foo</code>
     (unless <code>Foo</code> has no parent), and then calls <code>VisitFoo(x)</code> (see the
     next bullet).
</li><li> <code>VisitFoo(Foo* x)</code> does task #3.
</li></ul>
<p>
Here's some pesudo code to show what these methods' implementation
looks like:
</p><p>
</p><pre class="prettyprint"><span class="com">// A function returns false if it wants the traversal to abort (note that<br>// the current RecursiveASTVisitor class returns true for abortion - we<br>// feel that 'false' is much more intuitive).</span><span class="pln"><br>

</span><span class="com">// Therefore, each caller needs to check the return code and propagate</span><span class="pln"><br></span><span class="com">// it up appropriately.  Such plumbing code is omitted here for clarify.</span><span class="pln"><br>

<br></span><span class="com">// The entry point for visiting an AST rooted at x.</span><span class="pln"><br></span><span class="kwd">bool</span><span class="pln"> </span><span class="typ">Traverse</span><span class="pun">(</span><span class="typ">Decl</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="kwd">switch</span><span class="pun">(</span><span class="pln">x</span><span class="pun">-</span>><span class="pln">getKind</span><span class="pun">())</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="kwd">case</span><span class="pln"> kFoo</span><span class="pun">:</span><span class="pln"><br>    </span><span class="typ">TraverseFoo</span><span class="pun">((</span><span class="typ">Foo</span><span class="pun">*)</span><span class="pln">x</span><span class="pun">);</span><span class="pln"><br>

    </span><span class="kwd">break</span><span class="pun">;</span><span class="pln"><br>  </span><span class="kwd">case</span><span class="pln"> kBar</span><span class="pun">:</span><span class="pln"><br>    </span><span class="typ">TraverseBar</span><span class="pun">((</span><span class="typ">Bar</span><span class="pun">*)</span><span class="pln">x</span><span class="pun">);</span><span class="pln"><br>

    </span><span class="kwd">break</span><span class="pun">;</span><span class="pln"><br>  </span><span class="pun">...</span><span class="pln"><br>  </span><span class="pun">}</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br>

<br></span><span class="kwd">bool</span><span class="pln"> </span><span class="typ">TraverseFoo</span><span class="pun">(</span><span class="typ">Foo</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="typ">WalkUpFromFoo</span><span class="pun">(</span><span class="pln">x</span><span class="pun">);</span><span class="pln"><br>  </span><span class="kwd">for</span><span class="pln"> each child node n of x </span><span class="pun">{</span><span class="pln"><br>

    </span><span class="typ">Traverse</span><span class="pun">(</span><span class="pln">n</span><span class="pun">)</span><span class="pln"><br>  </span><span class="pun">}</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br>

<br></span><span class="kwd">bool</span><span class="pln"> </span><span class="typ">WalkUpFromFoo</span><span class="pun">(</span><span class="typ">Foo</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="typ">WalkUpFromBar</span><span class="pun">(</span><span class="pln">x</span><span class="pun">);</span><span class="pln">  </span><span class="com">// Bar is Foo's parent.</span><span class="pln"><br>

  </span><span class="typ">VisitFoo</span><span class="pun">(</span><span class="pln">x</span><span class="pun">);</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br><br></span><span class="kwd">bool</span><span class="pln"> </span><span class="typ">WalkUpFromDecl</span><span class="pun">(</span><span class="typ">Decl</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="com">// Decl has no parent, so no more walking up.</span><span class="pln"><br>  </span><span class="typ">VisitDecl</span><span class="pun">(</span><span class="pln">x</span><span class="pun">);</span><span class="pln"><br>

</span><span class="pun">}</span><span class="pln"><br><br></span><span class="com">// All Visit*() methods do nothing by default.</span><span class="pln"><br></span><span class="kwd">bool</span><span class="pln"> </span><span class="typ">VisitFoo</span><span class="pun">(</span><span class="typ">Foo</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"> </span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">true</span><span class="pun">;</span><span class="pln"> </span><span class="pun">}</span></pre>


<p>
Usually, a user shouldn't override a <code>Traverse*()</code> or <code>WalkUpFrom*()</code>
function - they are the "skeleton" of the AST visitor and do the
mundane plumbing that a normal user is not interested in.  We
recommend that only people truly understand how the traversal works
can override them.
</p><p>
A user is expected to override the <code>Visit*()</code> methods as needed.  The
default implementation of them does nothing.
</p><p>
Returning a <code>bool</code> to indicate whether the traversal needs to be
aborted requires much tedious, error-prone plumbing in the
implementation.  Exceptions are the most natural solution here, but
unfortunately they are banned from Clang code.
</p><p>One advantage of this design over the existing <code>RecursiveASTVisitor</code>  is that when a user overrides <code>VisitFoo()</code>, he only
needs to do what he's interested in, without worrying about calling
<code>VisitFoo()</code> or any other methods in the base AST visitor class (so
less chance to shoot his own feet).</p><p>With this design, all Visit*() methods on an AST ndoe are called before any Visit*() method is called on a child node of it.  So the trace of the Visit*() calls is easy to understand and reason about.<br>

</p><p>
One thing people might not like about the design is that it uses more
methods.  I actually think it's a good thing that we have separate
families of methods for separate roles.  <code>RecursiveASTVisitor</code>'s
approach of lumping all logic (traversal, walking up the type
hierarchy, and the custom visiting logic) in the same <code>Visit*()</code>
method makes the implementation more tricky and error-prone - that's
probably why there was so much confusion when we were trying to
improve <code>RecursiveASTVisitor</code>.
</p><p>
To ensure performance, we would use CRTP as the existing AST visitors
do.  That means two things:
</p><p>
</p><ol><li> No need for virtual functions.
</li><li> The entire code is templated s.t. the compiler can easily inline
      trivial functions and do cross-function optimization.
</li></ol>
<p>
</p><h2><a name="Use_Cases"> </a> Use Cases </h2>
<p>
Let's see how some typical use cases can be implemented using our AST
visitor.
</p><p>
</p><h3><a name="Dump_the_AST"> </a> Dump the AST </h3>
<p>
Only 3 methods need to be overridden:
</p><p>
</p><pre class="prettyprint"><span class="typ">VisitDecl</span><span class="pun">(</span><span class="typ">Decl</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="kwd">print</span><span class="pln"> information about x</span><span class="pun">;</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br><br></span><span class="typ">VisitStmt</span><span class="pun">(</span><span class="typ">Stmt</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="kwd">print</span><span class="pln"> information about x</span><span class="pun">;</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br><br></span><span class="typ">VisitTypeLoc</span><span class="pun">(</span><span class="typ">TypeLoc</span><span class="pun">*</span><span class="pln"> x</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="kwd">print</span><span class="pln"> information about x</span><span class="pun">;</span><span class="pln"><br></span><span class="pun">}</span></pre>
<p>
</p><h3>Search for an AST Node of Type Foo </h3>
<p>
The tool would override <code>VisitFoo(Foo* x)</code> to return false if x is the
node it's looking for.
</p><p>
</p><h3><a name="Find_the_Parents_of_an_AST_Node"> </a> Find the Parents of an AST Node </h3>
<p>
The tool would maintain a stack of the nodes that lead to the current
node.  It can build the stack by overriding exactly 3 methods
(plumbing of the return code omitted):
</p><p>
</p><pre class="prettyprint"><span class="typ">Traverse</span><span class="pun">(</span><span class="typ">Decl</span><span class="pun">*</span><span class="pln"> d</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="typ">Push</span><span class="pun">(</span><span class="pln">d</span><span class="pun">);</span><span class="pln"><br>  </span><span class="typ">Base</span><span class="pun">::</span><span class="typ">Traverse</span><span class="pun">(</span><span class="pln">d</span><span class="pun">);</span><span class="pln"><br>

  </span><span class="typ">Pop</span><span class="pun">();</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br><br></span><span class="typ">Traverse</span><span class="pun">(</span><span class="typ">Stmt</span><span class="pun">*</span><span class="pln"> s</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="typ">Push</span><span class="pun">(</span><span class="pln">s</span><span class="pun">);</span><span class="pln"><br>  </span><span class="typ">Base</span><span class="pun">::</span><span class="typ">Traverse</span><span class="pun">(</span><span class="pln">s</span><span class="pun">);</span><span class="pln"><br>

  </span><span class="typ">Pop</span><span class="pun">();</span><span class="pln"><br></span><span class="pun">}</span><span class="pln"><br><br></span><span class="typ">Traverse</span><span class="pun">(</span><span class="typ">TypeLoc</span><span class="pun">*</span><span class="pln"> t</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln"><br>

  </span><span class="typ">Push</span><span class="pun">(</span><span class="pln">t</span><span class="pun">);</span><span class="pln"><br>  </span><span class="typ">Base</span><span class="pun">::</span><span class="typ">Traverse</span><span class="pun">(</span><span class="pln">t</span><span class="pun">);</span><span class="pln"><br>

  </span><span class="typ">Pop</span><span class="pun">();</span><span class="pln"><br></span><span class="pun">}</span></pre><br>-- <br>Zhanyong<br><br>