[cfe-dev] Should we build semantically invalid nodes?

Sat Oct 25 11:06:25 PDT 2008

On Oct 25, 2008, at 10:35 AM, steve naroff wrote:
> On Oct 25, 2008, at 11:06 AM, Sebastian Redl wrote:
>> Argiris Kirtzidis wrote:
>>> Hey Steve, breaking up Sema is a ridiculously awesome idea!
>>>
>>> Here's a thought I'd like to throw around..
>>> Could there be something like "composable Actions" ? The ASTBuilder
>>> would build the AST while Sema would do semantic checks and reject
>>> invalid nodes.
>>> This will cleanly separate the semantic checks from the AST
>>> building and, as you said, will make the code more maintainable.
>>>
>> Unless the complexity of creating and maintaining that separation
>> exceeds that of having the merged code.
>>
>
> I totally agree. I think this would be a fairly disruptive change. We
> would need a fairly compelling reason to tackle it.

I agree.

> At this point, it's just "food for thought". It is true that Sema has
> grown considerably and it would be nice to benefit from some of its
> functionality without having to take it all.

I think this is something of a dangerous path.  Semantic analysis is  
complex and intertwined enough (even in C, but particularly in C++)  
that adding abstractions should only be done really carefully.

It seems to me that it comes down to the clients that are the ultimate  
consumers of this information.  Since Sema is perfectly fine for  
correct code, let's ignore all clients that require well-formed code  
(e.g. codegen, refactoring, etc) and those that aren't harmed by  
requiring it (static analysis).  These clients are incidentally the  
ones that are doing "deep analysis" of the AST and really benefit from  
having a lot of invariants in the AST that absolutely must be true for  
sanity.  Lack of these invariants would require sprinkling their  
(incredibly non-trivial) code with lots of special cases and hacks  
that I'd really like to avoid.

Another set of clients are things like "indexers" that want to find  
all the function definitions and global variables so you can "click on  
a function and jump to its definition".  For this sort of use, a  
simple actions module plugging into the parser is just fine.
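
To make the indexer case concrete, here is a minimal sketch of such an
actions module.  Everything in it is hypothetical: ParserActions,
IndexerActions, SourceLoc and the callback names are invented for
illustration and are not the real Action interface.  The assumption is
only that the parser invokes a callback whenever it sees a function
definition or a global variable.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical source location; not Clang's SourceLocation.
struct SourceLoc {
  std::string file;
  unsigned line;
};

// Hypothetical callback interface that a parser would invoke.
// The default implementations ignore every event.
class ParserActions {
public:
  virtual ~ParserActions() = default;
  virtual void onFunctionDefinition(const std::string & /*name*/,
                                    const SourceLoc & /*loc*/) {}
  virtual void onGlobalVariable(const std::string & /*name*/,
                                const SourceLoc & /*loc*/) {}
};

// An indexer only records where names are defined.  It builds no AST
// and performs no semantic checks at all.
class IndexerActions : public ParserActions {
  std::map<std::string, SourceLoc> definitions_;

public:
  void onFunctionDefinition(const std::string &name,
                            const SourceLoc &loc) override {
    definitions_[name] = loc;
  }
  void onGlobalVariable(const std::string &name,
                        const SourceLoc &loc) override {
    definitions_[name] = loc;
  }

  // Returns the recorded location, or nullptr if the name is unknown.
  const SourceLoc *lookup(const std::string &name) const {
    auto it = definitions_.find(name);
    return it == definitions_.end() ? nullptr : &it->second;
  }
};
```

A "jump to definition" feature would then just call lookup() with the
name under the cursor.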

What sort of clients would benefit substantially from a broken and  
partially formed AST?  If we really wanted this sort of thing, it  
seems like it would be cleanest to do what Steve said: define a new  
actions module that just builds an AST (which can even use the same or  
an extended set of nodes as Sema) but doesn't do any real checks,  
doesn't assign types, etc.  At this point, you have more of a parse  
tree than an AST.  I could imagine that something like this would be  
useful, but can't think of any specific clients.
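
As a rough sketch of what that might look like (all names here are
made up for illustration, and the node type is far simpler than
anything Sema uses), an actions module of this kind would accept
whatever the parser hands it and never reject or type anything:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical untyped node: it records shape only, with no type and
// no validity guarantees.
struct Node {
  std::string kind; // e.g. "IntLiteral", "StringLiteral", "BinaryOp"
  std::string text; // literal spelling or operator
  std::unique_ptr<Node> lhs;
  std::unique_ptr<Node> rhs;
};
using NodePtr = std::unique_ptr<Node>;

// Hypothetical actions module that builds a tree and does no checks.
struct TreeBuilderActions {
  NodePtr leaf(const std::string &kind, const std::string &text) {
    NodePtr n(new Node());
    n->kind = kind;
    n->text = text;
    return n;
  }

  // Never fails: operands are attached without looking at them.
  NodePtr binary(const std::string &op, NodePtr lhs, NodePtr rhs) {
    NodePtr n(new Node());
    n->kind = "BinaryOp";
    n->text = op;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
  }
};

// Renders the tree back to text, fully parenthesized.
std::string dump(const Node &n) {
  if (!n.lhs && !n.rhs)
    return n.text;
  std::string out = "(";
  if (n.lhs)
    out += dump(*n.lhs);
  out += " " + n.text + " ";
  if (n.rhs)
    out += dump(*n.rhs);
  return out + ")";
}
```

With this builder, an ill-formed expression such as 2 * "hi" still
produces a tree, which is exactly why clients could not rely on any
invariants about it.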

To be clear, I want to separate out two notions from this.  First, we  
don't need anything like this to get loc info for types.  That is a  
straightforward extension of what we have, and making Sema  
(optionally) do it would be easy and non-invasive.  Second, this whole  
discussion started with a discussion of error recovery.  While related  
to the above, I still really really think that *Sema* shouldn't return  
invalid nodes and should only "correct" them in really obvious cases.   
Sema is extremely complex, and having it not be able to depend on the  
invariants we have on AST nodes would be very bad, as Doug pointed out.
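
For contrast, here is a toy sketch of the behavior I am arguing for.
The names (CheckedActions, Expr, Type) are hypothetical and this is
not how Sema is actually written; it only illustrates the idea that a
failed check yields an error result (a null pointer here) rather than
an invalid node, so that every node that exists satisfies the
invariants.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Type { Int, Pointer };

// Invariant: every Expr that exists has a valid type.
struct Expr {
  Type type;
  std::string text;
};
using ExprPtr = std::unique_ptr<Expr>;

// Hypothetical checking actions: a null ExprPtr means "error".
struct CheckedActions {
  std::vector<std::string> diagnostics;

  ExprPtr intLiteral(const std::string &text) {
    return ExprPtr(new Expr{Type::Int, text});
  }
  ExprPtr stringLiteral(const std::string &text) {
    return ExprPtr(new Expr{Type::Pointer, text});
  }

  ExprPtr multiply(ExprPtr lhs, ExprPtr rhs) {
    // An operand already failed: propagate the error without
    // emitting a second diagnostic.
    if (!lhs || !rhs)
      return nullptr;
    // Reject the expression instead of building an invalid node.
    if (lhs->type != Type::Int || rhs->type != Type::Int) {
      diagnostics.push_back("invalid operands to binary *");
      return nullptr;
    }
    return ExprPtr(
        new Expr{Type::Int, "(" + lhs->text + " * " + rhs->text + ")"});
  }
};
```

Clients that receive a non-null result never need to ask whether its
type is meaningful, and a single bad subexpression produces a single
diagnostic.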

-Chris