[cfe-dev] Should we build semantically invalid nodes?

Sun Oct 26 11:31:47 PDT 2008

On Oct 26, 2008, at 11:15 AM, Sebastian Redl wrote:

> Argiris Kirtzidis wrote:
>> Chris Lattner wrote:
>>
>>> It seems to me that it comes down to the clients that are the
>>> ultimate consumers of this information.  Since Sema is perfectly
>>> fine for correct code, let's ignore all clients that require
>>> well-formed code (e.g. codegen, refactoring, etc.)
>>>
>>
>> Refactoring, as I see it, doesn't require well-formed code, e.g.  
>> "rename this parameter name" doesn't particularly care about only  
>> the well-formed uses, it just wants to find all the appearances of  
>> the parameter in the function, even if the parameter is used in an  
>> invalid reinterpret_cast.
>>
> Refactoring invalid code is extremely dangerous. If there are errors
> in the code, then how can the refactoring tool possibly preserve the
> semantics? The semantics aren't even well-defined. I'm pretty sure
> the Eclipse Java refactoring support requires valid code.

Yes, and it depends on just how broken it is.  If the parser skipped a  
bunch of tokens to do error recovery, you aren't guaranteed to have  
all instances of the identifier.

On top of that, refactoring isn't just about renaming things... it has  
to do a lot of verification to make sure that the transformation is  
safe.  For example, in:

int G;
int foo(int H) {
   return G+H;
}

A refactoring tool is supposed to refuse to rename G to H, because the
renamed global would be shadowed by the parameter H in foo (and thus
change semantics).  Doing these sorts of checks requires "deep"
analysis that traverses and reasons about the AST.  If the AST is
broken, these analyses have to tolerate that, which isn't worth it.
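
To make the shadowing check concrete, here is a rough sketch of the
kind of test such a tool might run.  It is only a toy model, not
clang's API: ToyFunction, contains, and rename_global_is_safe are
hypothetical names, and it assumes each function has already been
reduced to a flat list of the names it declares and the globals it
uses, which is exactly the information you can't trust when the AST is
broken.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of one function (not a clang data structure). */
struct ToyFunction {
    const char *const *locals;      /* parameters and local variables */
    size_t num_locals;
    const char *const *global_uses; /* globals referenced in the body */
    size_t num_global_uses;
};

/* Linear search for name in an array of n strings. */
static bool contains(const char *const *names, size_t n, const char *name) {
    for (size_t i = 0; i < n; ++i)
        if (strcmp(names[i], name) == 0)
            return true;
    return false;
}

/* Returns true if renaming the global old_name to new_name cannot be
 * captured by a parameter or local of fn. */
bool rename_global_is_safe(const struct ToyFunction *fn,
                           const char *old_name, const char *new_name) {
    /* A function that never mentions the global is unaffected. */
    if (!contains(fn->global_uses, fn->num_global_uses, old_name))
        return true;
    /* Otherwise a local with the new name would shadow the global. */
    return !contains(fn->locals, fn->num_locals, new_name);
}
```

For foo above, locals would be {"H"} and global_uses would be {"G"},
so renaming G to H is rejected while renaming G to some unused name is
allowed.  A real tool would also have to handle nested scopes and
every other function that uses G.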

-Chris