[cfe-dev] Should we build semantically invalid nodes?

Sat Oct 25 12:53:51 PDT 2008

Great points. As you say, it comes down to the ultimate consumers of  
the information.

From my perspective, defining a new actions module will allow us to  
gain some experience with alternate representations. Once we have such  
experience, we can decide if it makes sense to fold it into Sema in  
some fashion.
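
To make the idea a bit more concrete, here is a rough sketch of what a small, Sema-free actions module could look like, using the "indexer" case as the example. Everything in it is hypothetical: `ParserActions`, `SourceLoc`, `IndexerActions`, and the callback names are invented for illustration and are not the Action interface we have in the tree today.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parser callback interface.
// This is NOT the real Action class; all names here are assumptions.
struct SourceLoc {
  unsigned Line;
  unsigned Col;
};

class ParserActions {
public:
  virtual ~ParserActions() = default;
  // Invoked when the parser has seen a function definition.
  virtual void ActOnFunctionDef(const std::string &, SourceLoc) {}
  // Invoked when the parser has seen a global variable declaration.
  virtual void ActOnGlobalVar(const std::string &, SourceLoc) {}
};

// One recorded declaration: its name, where it is, and what kind it is.
struct IndexEntry {
  std::string Name;
  SourceLoc Loc;
  bool IsFunction;
};

// An indexer-style client: it only records names and locations.
// It performs no semantic checks, assigns no types, and builds no AST.
class IndexerActions : public ParserActions {
public:
  void ActOnFunctionDef(const std::string &Name, SourceLoc Loc) override {
    Entries.push_back({Name, Loc, true});
  }
  void ActOnGlobalVar(const std::string &Name, SourceLoc Loc) override {
    Entries.push_back({Name, Loc, false});
  }

  // Returns the first entry with the given name, or nullptr if none exists.
  const IndexEntry *lookup(const std::string &Name) const {
    for (const IndexEntry &E : Entries)
      if (E.Name == Name)
        return &E;
    return nullptr;
  }

  std::vector<IndexEntry> Entries;
};
```

A parser driving this module would call `ActOnFunctionDef` and `ActOnGlobalVar` as it goes, and a "jump to definition" client would then call `lookup` with the name under the cursor. A module that builds a parse tree instead would have the same shape, with the callbacks allocating nodes rather than index entries.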

I agree that getting loc info is a straight-forward extension to what  
we have. I didn't mean to suggest otherwise.

snaroff

On Oct 25, 2008, at 2:06 PM, Chris Lattner wrote:

> On Oct 25, 2008, at 10:35 AM, steve naroff wrote:
>> On Oct 25, 2008, at 11:06 AM, Sebastian Redl wrote:
>>> Argiris Kirtzidis wrote:
>>>> Hey Steve, breaking up Sema is a ridiculously awesome idea!
>>>>
>>>> Here's a thought I'd like to throw around..
>>>> Could there be something like "composable Actions" ? The ASTBuilder
>>>> would build the AST while Sema would do semantic checks and reject
>>>> invalid nodes.
>>>> This will cleanly separate the semantic checks from the AST
>>>> building and, as you said, will make the code more maintainable.
>>>>
>>> Unless the complexity of creating and maintaining that separation
>>> exceeds that of having the merged code.
>>>
>>
>> I totally agree. I think this would be a fairly disruptive change. We
>> would need a fairly compelling reason to tackle it.
>
> I agree.
>
>> At this point, it's just "food for thought". It is true that Sema has
>> grown considerably and it would be nice to benefit from some of its
>> functionality without having to take it all.
>
> I think this is something of a dangerous path.  Semantic analysis is  
> complex and intertwined enough (even in C, but particularly in C++)  
> that adding abstractions should only be done really carefully.
>
> It seems to me that it comes down to the clients that are the  
> ultimate consumers of this information.  Since Sema is perfectly  
> fine for correct code, let's ignore all clients that require  
> well-formed code (e.g. codegen, refactoring, etc.) and those that aren't  
> harmed by requiring it (static analysis).  These clients are  
> incidentally the ones that are doing "deep analysis" of the AST and  
> really benefit from having a lot of invariants in the AST that  
> absolutely must be true for sanity.  Lack of these invariants would  
> require sprinkling their (incredibly non-trivial) code with lots of  
> special cases and hacks that I'd really like to avoid.
>
> Another set of clients are things like "indexers" that want to find  
> all the function definitions and global variables so you can "click  
> on a function and jump to its definition".  For this sort of use, a  
> simple actions module plugging into the parser is just fine.
>
> What sort of clients would benefit substantially from a broken and  
> partially formed AST?  If we really wanted this sort of thing, it  
> seems like it would be cleanest to do what Steve said: define a new  
> actions module that just builds an AST (which can even use the same  
> or an extended set of nodes as Sema) but doesn't do any real checks,  
> doesn't assign types, etc.  At this point, you have more parse tree  
> than an AST.  I could imagine that something like this would be  
> useful, but can't think of any specific clients.
>
>
> To be clear, I want to separate out two notions from this.  First,  
> we don't need anything like this to get loc info for types.  That is  
> a straight-forward extension over what we have, and making sema  
> (optionally) do it would be easy and non-invasive.  Second, this  
> whole discussion started with a discussion of error recovery.  While  
> related to the above, I still really really think that *Sema*  
> shouldn't return invalid nodes and should only "correct" them in  
> really obvious cases.  Sema is extremely complex, and having it not  
> be able to depend on the invariants we have on AST nodes would be  
> very bad, as Doug pointed out.
>
> -Chris