[cfe-dev] Should we build semantically invalid nodes?

Thu Oct 23 11:49:04 PDT 2008

Doug Gregor wrote:
>
>> Sema is *huge* and the alternative Action option is not realistic (another
>> Action that deals with templates? ;), so Sema is what almost all clients
>> will use.
>> There will be clients that care only about the syntax tree; Sema is fully
>> capable of servicing them too.
>>     
>
> Okay, that's my option (1), then. We don't really want to cater to
> clients trying to work on ill-formed code, do we?
>   

I think you underestimate the importance of being able to get a syntax 
tree as complete as possible even on ill-formed code. It allows Clang to 
be used effectively for a variety of purposes that we may not even 
currently imagine.
For example, IDEs have to assist programmers while they are writing their 
code, so the vast majority of the time they have to work on ill-formed code.

>   
>> I don't see what is so bad about separating syntax from semantics.
>> -An expression node is produced for a syntactic construct.
>> -This expression node already conveys useful information about the program
>> structure.
>> -Semantic checks are done to it and diagnostics are emitted.
>> -Now why should we discard the syntactic information? This is a concrete
>> expression with an actual type (even if it got its type illegally according
>> to the language rules), so it won't lead to crashes, just to possibly more
>> diagnostics.
>>     
>
> Oh, it will lead to crashes, because it opens up the semantic analysis
> to inputs that can never make sense. An expression of function type
> that isn't a reference to a function declaration? Nonsense, but it
> could happen if we allow reinterpret_cast<int(void)>(blah) to get its
> own AST node (even if it is marked as invalid). We can pretend that
> we're good enough to build a compiler that's robust against all of
> these bogus inputs, but it's just not going to happen. It's far better
> to only build semantically well-formed AST nodes.
>   

Hmm, we don't have any concrete examples, but I'm not sure that it's so 
bad. The vast majority of semantic checks rely on the types of the 
expressions.
And to clarify, I am definitely not advocating that Sema start checking 
an "invalid expr flag"; if the expression is so bad that it cannot be 
handled normally, it should indeed be disallowed from the start.
(I mentioned the "invalid expr flag" from the start as a nice-to-have 
flag for interested consumers, not for Sema's use.)
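
To make the idea a bit more concrete, here is a minimal sketch of what I 
have in mind. It is a toy AST, not Clang's actual classes: Expr, 
actOnCast and countInvalid are all hypothetical names, and the "check" 
is just a stand-in for a real semantic rule. The point is only that the 
node is diagnosed and kept, and that the flag is read by the consumer, 
not by the semantic action.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST node. NOT Clang's real Expr; every name here is hypothetical.
struct Expr {
    std::string spelling;   // source text of the construct
    std::string type;       // type assigned during semantic analysis
    bool invalid = false;   // set when a semantic check failed on this node
    std::vector<std::unique_ptr<Expr>> children;
};

// Hypothetical semantic action for a cast: diagnose, but keep the node.
std::unique_ptr<Expr> actOnCast(const std::string &targetType,
                                std::unique_ptr<Expr> operand,
                                std::vector<std::string> &diags) {
    auto cast = std::make_unique<Expr>();
    cast->spelling = "cast<" + targetType + ">(" + operand->spelling + ")";
    cast->type = targetType;

    // Stand-in for a real rule: a cast to a function type is ill-formed.
    if (targetType.find('(') != std::string::npos) {
        diags.push_back("error: cannot cast to function type '" +
                        targetType + "'");
        cast->invalid = true;   // mark the node instead of discarding it
    }

    cast->children.push_back(std::move(operand));
    return cast;
}

// A consumer (e.g. an IDE) walks the tree and counts the invalid nodes.
int countInvalid(const Expr &e) {
    int n = e.invalid ? 1 : 0;
    for (const auto &child : e.children)
        n += countInvalid(*child);
    return n;
}
```

A consumer that only cares about program structure can ignore the flag 
entirely, while one that wants to avoid acting on bogus nodes can check 
it. Whether the semantic analysis itself stays robust with such a node 
in the tree is exactly the open question Doug raises; the sketch does 
not settle that.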

There is certainly some middle ground to tread here; not every semantic 
check will wreak havoc if it lets the expression through. I think it's a 
worthwhile goal to strive for an AST that is as complete as possible. It 
will be better in the long run for interesting uses of Clang beyond 
compiling.

-Argiris