[cfe-dev] Should we build semantically invalid nodes?

Sun Oct 26 11:39:26 PDT 2008

On Oct 26, 2008, at 10:23 AM, Argiris Kirtzidis wrote:
>> Another set of clients are things like "indexers" that want to find
>> all the function definitions and global variables so you can "click
>> on a function and jump to its definition".  For this sort of use, a
>> simple actions module plugging into the parser is just fine.
>>
>
> Hmm.. I don't quite understand how this can be simple. Are you
> talking about only building declarations and not expressions?
> Say that you want all references to a global variable in the
> program: how are you going to find them without building a full AST
> of the program, including the expressions?

Typically an IDE isn't interested in full def/use chains: it just
wants to know where definitions are.  If you click on (e.g.) a class
name, it wants to let you jump to its definition.  Good IDEs would also
use simple syntactic information about (e.g.) namespaces.  While it's
true that you have to do full AST building to handle the fully general
cases, the tradeoff is that the index can't be kept up to date as
quickly and it takes a lot more memory.  Because of this, most IDEs
use a fuzzy parse that does not do AST building or type checking to
get this info.
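
A minimal sketch of the kind of fuzzy parse meant here, for illustration
only.  The names `Definition` and `fuzzyIndex` are hypothetical and are
not part of clang.  The sketch assumes the input has no comments, string
literals or preprocessor directives, and it only recognizes class, struct
and namespace definitions by looking at three adjacent tokens.  It builds
no AST and does no type checking.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One index entry: what was defined, its name, and where.
struct Definition {
  std::string kind;  // "class", "struct" or "namespace"
  std::string name;
  unsigned line;     // 1-based line of the name
};

static bool isIdentStart(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical fuzzy indexer: splits the source into identifiers and
// single-character punctuation, then pattern-matches on the token stream.
std::vector<Definition> fuzzyIndex(const std::string &src) {
  struct Token { std::string text; unsigned line; };
  std::vector<Token> toks;
  unsigned line = 1;
  for (std::size_t i = 0; i < src.size();) {
    char c = src[i];
    if (c == '\n') { ++line; ++i; }
    else if (std::isspace(static_cast<unsigned char>(c))) { ++i; }
    else if (isIdentStart(c)) {
      std::size_t start = i;
      while (i < src.size() && isIdentChar(src[i])) ++i;
      toks.push_back({src.substr(start, i - start), line});
    } else {
      toks.push_back({std::string(1, c), line});
      ++i;
    }
  }

  std::vector<Definition> defs;
  for (std::size_t i = 0; i + 2 < toks.size(); ++i) {
    const std::string &kw = toks[i].text;
    bool isTag = kw == "class" || kw == "struct";
    if (!isTag && kw != "namespace") continue;
    const std::string &name = toks[i + 1].text;
    assert(!name.empty() && "the tokenizer never produces empty tokens");
    if (!isIdentStart(name[0])) continue;
    // "class Foo {" or "class Foo : Base {" is a definition;
    // "class Foo;" and "template <class T>" are not.
    const std::string &next = toks[i + 2].text;
    if (next == "{" || (isTag && next == ":"))
      defs.push_back({kw, name, toks[i + 1].line});
  }
  return defs;
}
```

Since this is fuzzy by design, it will get some inputs wrong (for
example, it reports "enum class E : int {" as a class).  That is the
tradeoff described above: less precision in exchange for an index that
is cheap to rebuild.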

>> What sort of clients would benefit substantially from a broken and
>> partially formed AST?
>
> There's a difference between a program with broken "syntax" (the  
> Parser doesn't accept it), and broken "semantics" (the Sema rejects  
> it).

Sure, the difference is the difference between an AST and a parse tree.

> "reinterpret_cast<int>(x)" is correct syntax but with broken  
> semantics.
> There's a lot of benefit in getting an AST which is the
> representation of the syntax of the program:
> "reinterpret_cast<int>(x)" conveys the information that a
> reinterpret_cast is using the 'x' variable at this source location.

What is the benefit?  You've stated that there is a lot of benefit but  
haven't given an example :)

>> If we really wanted this sort of thing, it seems like it would be
>> cleanest to do what Steve said: define a new actions module that
>> just builds an AST (which can even use the same or an extended set
>> of nodes as Sema) but doesn't do any real checks, doesn't assign
>> types, etc.  At this point, you have more of a parse tree than an AST.
>
> This will be a maintenance burden; I'm pretty sure such an action
> module will eventually bitrot and become irrelevant since all the
> focus will be on the Sema AST.

You're right: one example is the '-parse-print-callbacks' option, which
was out of date almost as soon as it was started :).  However, if
there were a well-maintained client, this wouldn't happen.
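
For concreteness, a rough sketch of the "builds nodes but does no real
checks" actions module discussed above.  Everything in it (`Node`,
`Actions`, `TreeBuilder`, `actOnName`, `actOnCast`) is a hypothetical name
made up for this illustration and is not clang's actual interface.  It
assumes the parser calls back once per syntactic construct, and it records
what was written without assigning types or validating anything.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical untyped node: records what was written and where.
struct Node {
  std::string kind;      // e.g. "name" or "reinterpret_cast"
  std::string spelling;  // identifier, or the target type as written
  unsigned line = 0;
  std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical callback interface that a parser would drive.
struct Actions {
  virtual ~Actions() = default;
  virtual std::unique_ptr<Node> actOnName(const std::string &id,
                                          unsigned line) = 0;
  virtual std::unique_ptr<Node> actOnCast(const std::string &castKind,
                                          const std::string &typeAsWritten,
                                          std::unique_ptr<Node> operand,
                                          unsigned line) = 0;
};

// Builds a node for anything syntactically well formed.  No lookup,
// no types, no diagnostics: the result is closer to a parse tree.
struct TreeBuilder : Actions {
  std::unique_ptr<Node> actOnName(const std::string &id,
                                  unsigned line) override {
    auto n = std::make_unique<Node>();
    n->kind = "name";
    n->spelling = id;
    n->line = line;
    return n;
  }

  std::unique_ptr<Node> actOnCast(const std::string &castKind,
                                  const std::string &typeAsWritten,
                                  std::unique_ptr<Node> operand,
                                  unsigned line) override {
    assert(operand && "the parser must supply an operand");
    auto n = std::make_unique<Node>();
    n->kind = castKind;
    n->spelling = typeAsWritten;  // not checked against the operand's type
    n->line = line;
    n->children.push_back(std::move(operand));
    return n;
  }
};
```

With this, "reinterpret_cast<int>(x)" yields a cast node whose child is
the name 'x', even though a semantic checker would reject the cast.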

> The current AST has lots of syntactic information (apart from the  
> missing "TypeSpecifier" node), there's no need for another one.
> If it's possible to combine an ASTBuilder action with the Sema action
> like I suggest here:
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-October/003125.html
> it will result in an ASTBuilder that produces the syntactic AST, and
> a Sema that uses it, emits the necessary diagnostics, and possibly
> rejects invalid nodes. It may even help in the maintainability
> department.

I'm still struggling to figure out what problem you're trying to solve.

-Chris