[cfe-dev] Should we build semantically invalid nodes?

Sun Oct 26 18:44:25 PDT 2008

On Oct 26, 2008, at 3:33 PM, Argiris Kirtzidis wrote:
>> Unless the refactoring engine were maximally mature already, I'd  
>> almost always (and this is just my personal preference) put that  
>> energy into B.
>
> Come on, a refactoring engine that can't find all the references  
> unless the code compiles ? The java guys will laugh at us :-)

Java people already laugh at the state of tools for C, we're used to  
it. :)

>>> It's one thing to say that "the parse tree is important for this  
>>> stuff and we'll get to it in the future", and another that "I  
>>> don't think having the parse tree makes a difference to anything,  
>>> but if we find some use for it in the future we'll consider it".
>>> So which exactly is it ?
>>
>> For me, the latter.
>
> Given that "find all references of this specific variable named  
> 'foo' here, with the added bonus of working on code that doesn't  
> compile" is the specific, non-theoretical problem, what better way  
> is there to solve it than the parse tree ?

I don't buy it.  If you try to handle error cases, how far do you go?   
What if the code doesn't parse?  Are you suggesting we have multiple
levels of error cases and only some of them are handled and some
aren't?  How do we determine what is a 'fatal' error vs not?  How do  
we know what sort of errors each client can tolerate?

If you want to define a clean interface and have a specific client in  
mind, that's fine.  It seems to me that you're very focused on one  
perceived need, but I don't see a clean interface here.  When the code  
correctly parses, we can make strong guarantees about what the trees  
mean, and we know that nothing got dropped on the floor.

If the code didn't parse, we can't guarantee anything (because  
skipping can jump over anything).  If it parsed but didn't type check,  
then the clients could not depend on types at all.  If it parsed but
did not pass various semantic checks (e.g. an invalid operation in a case
value), clients can't depend on sane exprs being in various places.  If sema
starts building ASTs that are invalid, then it would have to handle  
them, which I strongly think is a bad idea for all the reasons already  
discussed.

Finally, if code was erroneous, then you really don't know enough to  
refactor safely.  For example:

int G;
void foo() {
    <<mumble>>

    print(G);
}

If "mumble" was a broken definition of a local G that shadows the
global, then you really don't want to rename the G used inside foo.
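
To make the two readings of that example concrete, here is a minimal
sketch in C.  The function names, the initial values, and the choice
of `int G = 2;` as the intended contents of <<mumble>> are all
assumptions for illustration, and the functions return G instead of
calling print() so the binding can be checked:

```c
#include <assert.h>

int G = 1;  /* file-scope G, as in the example above (value assumed) */

/* Reading 1: <<mumble>> declares nothing named G, so the G used
   below is the file-scope one.  Renaming the global must touch it. */
int foo_without_shadow(void) {
    return G;
}

/* Reading 2: <<mumble>> was meant to be a local definition of G
   (assumed here to be `int G = 2;`), so the G used below is the
   local.  Renaming the global must NOT touch it. */
int foo_with_shadow(void) {
    int G = 2;  /* shadows the file-scope G */
    return G;
}

/* Hypothetical helper: confirms the two readings bind differently. */
void check_binding(void) {
    assert(foo_without_shadow() == 1);
    assert(foo_with_shadow() == 2);
}
```

When the declaration is broken, a tool cannot tell which of the two
functions it is looking at, so it cannot know whether the use of G
belongs to the rename.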

I strongly believe that trying to refactor code that doesn't work is a  
bad idea.  There are specific things that can be done by (e.g.)  
matching braces that can make sense on invalid code (for example,  
Xcode has a textual "rename in scope") but I don't consider those to  
really be refactoring.  If/when we have refactoring and this becomes a  
blocker, we should figure out what the right approach is then IMO.

-Chris