[cfe-dev] About AST rewriting / manipulation

Fri Jan 23 09:09:06 PST 2009

Hello Simone,

On Jan 23, 2009, at 8:42 AM, Simone Pellegrini wrote:
> I am trying to use Clang as a source-to-source compiler. Through the API
> I've found a way to rewrite the syntax tree back into source code, and
> that's not difficult. However, before writing the syntax tree back, I
> would like to manipulate it in order to apply some code
> transformations.
>
> For example, I would like to rewrite something like f(a, b) into
> g(b, a, c).

Okay.

> Now I guess I should create the AST nodes I need (building a new
> CallExpr object and so on) and then substitute the new g(...) for the
> old f(...). However, the Clang API for creating AST nodes is quite
> complex to use; it's really too demanding.

Interesting. I guess the demanding part of the API is that you need to
be careful to ensure that you build semantically correct ASTs.
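To make "semantically correct" concrete, here is a toy C++ sketch of the
f(a, b) -> g(b, a, c) rewrite. The Expr, DeclRef and Call types, the
FunctionTable map, and rewriteFToG are all hypothetical stand-ins, not
Clang's CallExpr or Sema API. The sketch only illustrates the kind of
checks (is g declared, does it take three arguments, is c in scope) that
must hold before the new node can replace the old one.

```cpp
// Toy model only: Expr, DeclRef and Call are hypothetical stand-ins,
// NOT Clang's AST classes (clang::CallExpr etc.).
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Expr {
  virtual ~Expr() = default;
  virtual std::string print() const = 0;
};

// A reference to a named variable, e.g. `a`.
struct DeclRef : Expr {
  std::string name;
  explicit DeclRef(std::string n) : name(std::move(n)) {}
  std::string print() const override { return name; }
};

// A call `callee(args...)`.
struct Call : Expr {
  std::string callee;
  std::vector<std::shared_ptr<Expr>> args;
  std::string print() const override {
    std::string s = callee + "(";
    for (std::size_t i = 0; i < args.size(); ++i) {
      if (i != 0) s += ", ";
      s += args[i]->print();
    }
    return s + ")";
  }
};

// Hypothetical view of what is in scope: function name -> parameter count.
using FunctionTable = std::map<std::string, std::size_t>;

// Rewrite f(a, b) into g(b, a, c), first checking what semantic analysis
// would check: g is declared and takes three arguments, and c is in scope.
std::shared_ptr<Call> rewriteFToG(const Call& old, const FunctionTable& funcs,
                                  const std::set<std::string>& vars) {
  if (old.callee != "f" || old.args.size() != 2)
    throw std::invalid_argument("not a call of the form f(a, b)");
  auto g = funcs.find("g");
  if (g == funcs.end() || g->second != 3)
    throw std::runtime_error("g is undeclared or does not take 3 arguments");
  if (vars.count("c") == 0)
    throw std::runtime_error("c is not declared in this scope");
  auto call = std::make_shared<Call>();
  call->callee = "g";
  // Reuse the existing argument subtrees in swapped order, then add c.
  call->args = {old.args[1], old.args[0], std::make_shared<DeclRef>("c")};
  return call;
}

// Usage: build f(a, b), rewrite it, and return the printed result.
std::string demo() {
  Call f;
  f.callee = "f";
  f.args = {std::make_shared<DeclRef>("a"), std::make_shared<DeclRef>("b")};
  FunctionTable funcs{{"f", 2}, {"g", 3}};
  std::set<std::string> vars{"a", "b", "c"};
  return rewriteFToG(f, funcs, vars)->print();
}
```

In a real compiler the "what is in scope" tables come from semantic
analysis rather than being passed in by hand, which is exactly the part
that makes building nodes by hand demanding.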

> Now, I think it would be very nice if I could write the statement I
> want to substitute (or add) as a string and then use the Clang parser
> to create the syntax tree for that piece of code, in a way that lets it
> be plugged easily into the old main syntax tree (of course, the new
> parser instance would have to be invoked in the context of the
> surrounding code...). Is it possible to get this kind of behavior?

I believe it is possible to extend Clang to do this, but there is no
API to do so right now. The parser can be handed a set of tokens and
told to "go parse these" by calling into the appropriate parse
function; we do this to implement some C++ semantics, such as inline
definitions of member functions, whose bodies are parsed only after the
enclosing class is complete.
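A toy C++ sketch of why replaying cached tokens helps with inline member
functions: in `class C { int get() { return value; } int value; };` the
body of get refers to a member declared after it. Everything here (Token
as a plain string, resolves, parseToyClass) is hypothetical and far
simpler than Clang's parser; it only models "stash the tokens now, parse
them once the class is complete".

```cpp
// Toy model only: Token, resolves and parseToyClass are hypothetical and far
// simpler than Clang's Parser; they model "cache the tokens now, parse later".
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Token = std::string;

// True if every identifier in `body` names a member in `members`.
// Toy rule: "return" and ";" are the only non-identifier tokens.
bool resolves(const std::vector<Token>& body,
              const std::set<std::string>& members) {
  for (const Token& t : body) {
    if (t == "return" || t == ";") continue;
    if (members.count(t) == 0) return false;
  }
  return true;
}

struct ParseResult {
  bool eagerOk;     // body parsed at the point where it appears
  bool deferredOk;  // body parsed from cached tokens once the class is complete
};

// Models parsing:  class C { int get() { return value; } int value; };
ParseResult parseToyClass() {
  std::set<std::string> members;
  std::vector<std::vector<Token>> cachedBodies;

  members.insert("get");
  const std::vector<Token> getBody = {"return", "value", ";"};
  // Parsing the body right now fails: `value` is not declared yet.
  bool eagerOk = resolves(getBody, members);
  // So instead, stash the body's tokens for later.
  cachedBodies.push_back(getBody);

  members.insert("value");

  // Class is complete: hand each cached token stream back to the "parser".
  bool deferredOk = true;
  for (const auto& body : cachedBodies)
    deferredOk = deferredOk && resolves(body, members);
  return {eagerOk, deferredOk};
}
```

Note that this only works because the parser's state when the tokens are
replayed (the completed class) is known; a string spliced in at an
arbitrary AST node has no such state available.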

However, the hard part, which nobody has even thought about how to
implement, is that you would need to be able to take an AST node and
instruct the parser *and semantic analysis* to set their internal state
to the point where that AST node was parsed. That means reconstructing
the scope stack, the information about which identifiers bind to which
declarations, and so on. Not all of this information is present in the
AST, so this is a major undertaking.
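A minimal sketch, assuming a hypothetical ScopeStack rather than Clang's
real Scope/Sema machinery, of why that state matters: the same snippet
"x" binds to a different declaration depending on where in the program it
is parsed, so a string spliced into the AST cannot be analyzed without
the scope stack as it stood at that point.

```cpp
// Toy model only: Decl and ScopeStack are hypothetical, not Clang's
// Scope/Sema classes. Requires C++17 for std::optional.
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Decl {
  std::string name;
  std::string type;
  int id;  // distinguishes declarations that share a name
};

class ScopeStack {
 public:
  void push() { scopes_.emplace_back(); }
  void pop() { scopes_.pop_back(); }
  void declare(const Decl& d) { scopes_.back()[d.name] = d; }

  // Walk from the innermost scope outward, so an inner declaration
  // shadows an outer one with the same name.
  std::optional<Decl> lookup(const std::string& name) const {
    for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return found->second;
    }
    return std::nullopt;
  }

 private:
  std::vector<std::map<std::string, Decl>> scopes_;
};

// Models:   int x;  void h() { { double x; /* point A */ } /* point B */ }
// Returns the id of the declaration `name` binds to at point A and point B.
std::pair<int, int> bindingAtTwoPoints(const std::string& name) {
  ScopeStack s;
  s.push();                       // file scope
  s.declare({"x", "int", 1});
  s.push();                       // body of h
  s.push();                       // inner block
  s.declare({"x", "double", 2});
  int atA = s.lookup(name)->id;   // snippet parsed here sees the double
  s.pop();                        // leave the inner block
  int atB = s.lookup(name)->id;   // snippet parsed here sees the int
  return {atA, atB};
}
```

A real front end tracks much more than name bindings (types, templates,
access control, and so on), and some of it is discarded once parsing
moves on, which is why rebuilding it from the AST alone is hard.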

	- Doug