[cfe-dev] Should we build semantically invalid nodes?

Sun Oct 26 19:39:12 PDT 2008

On Oct 26, 2008, at 9:44 PM, Chris Lattner wrote:

> On Oct 26, 2008, at 3:33 PM, Argiris Kirtzidis wrote:
>>> Unless the refactoring engine were maximally mature already, I'd
>>> almost always (and this is just my personal preference) put that
>>> energy into B.
>>
>> Come on, a refactoring engine that can't find all the references
>> unless the code compiles ? The java guys will laugh at us :-)
>
> Java people already laugh at the state of tools for C, we're used to
> it. :)
>
>>>> It's one thing to say that "the parse tree is important for this
>>>> stuff and we'll get to it in the future", and another that "I
>>>> don't think having the parse tree makes a difference to anything,
>>>> but if we find some use for it in the future we'll consider it".
>>>> So which exactly is it ?
>>>
>>> For me, the latter.
>>
>> Given that "find all references of this specific variable named
>> 'foo' here, with the added bonus of working on code that doesn't
>> compile" is the specific, non-theoretical problem, what better way
>> is to solve it than the parse tree ?
>
> I don't buy it.  If you try to handle error cases, how far do you go?
> What if the code doesn't parse?  Are you suggesting we have multiple
> levels of error cases and only some of them are handled and some
> aren't?  How do we determine what is a 'fatal' error vs not?  How do
> we know what sort of errors each client can tolerate?
>
> If you want to define a clean interface and have a specific client in
> mind, that's fine.  It seems to me that you're very focused on one
> perceived need, but I don't see a clean interface here.  When the code
> correctly parses, we can make strong guarantees about what the trees
> mean, and we know that nothing got dropped on the floor.
>
> If the code didn't parse, we can't guarantee anything (because
> skipping can jump over anything).  If it parsed but didn't type check,
> then the clients could not depend on types at all.  If it parsed but
> did not pass various semantic checks (e.g. invalid operation in a case
> value) it can't depend on sane exprs being in various places.  If sema
> starts building ASTs that are invalid, then it would have to handle
> them, which I strongly think is a bad idea for all the reasons already
> discussed.
>
> Finally, if code was erroneous, then you really don't know enough to
> refactor safely.  For example:
>
> int G;
> void foo() {
>    <<mumble>>
>
>    print(G);
> }
>
> if "mumble" was a broken definition of a shadowed G, then you really
> don't want to rename the inner G.
>
> I strongly believe that trying to refactor code that doesn't work is a
> bad idea.  There are specific things that can be done by (e.g.)
> matching braces that can make sense on invalid code (for example,
> Xcode has a textual "rename in scope") but I don't consider those to
> really be refactoring.  If/when we have refactoring and this becomes a
> blocker, we should figure out what the right approach is then IMO.
>

I believe most refactoring experts agree with you (including Robert  
Bowdidge, the engineer who developed Xcode's refactoring support).  
Traditional refactoring support focuses on improving the design of  
source code *after* it has already been written (and is working).

From my perspective, there are a host of language-sensitive features
in an IDE that are nice to provide on source code that is *both*
syntactically and semantically incomplete/invalid. Here are 3 examples:

- Function/method popup.
- Class browser.
- Syntax highlighting.

Implementing these particular features with *either* a parse tree or  
AST is less than great. I think smart "fuzzy parsers" that operate on  
token streams work very nicely for the above applications (though they  
can seem ad hoc). The benefits are:

- The code doesn't have to parse.
- The code doesn't have to pass semantic analysis.
- The code doesn't even have to preprocess correctly!
- Speed (since you aren't dependent on preprocessing, parsing,  
checking, etc.).
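
To make the fuzzy approach concrete, here is a minimal sketch of the kind
of scanner that could feed a function popup. It is only an illustration
under stated assumptions, not how Xcode or clang does it: the name
fuzzy_function_names, the MAX_NAME_LEN limit, and the heuristic itself
(an identifier followed by a parenthesized group and then '{', seen at
brace depth 0) are all hypothetical. It reads raw characters, so nothing
has to preprocess, parse, or type check.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_NAME_LEN 64   /* hypothetical limit, chosen for the sketch */

/* Control keywords that can also be followed by "( ... ) {". */
static int is_control_keyword(const char *s)
{
    return strcmp(s, "if") == 0 || strcmp(s, "for") == 0 ||
           strcmp(s, "while") == 0 || strcmp(s, "switch") == 0;
}

/* Collect names that look like function definitions: an identifier
   directly followed by "( ... )" and then '{', at brace depth 0.
   Returns the number of names stored (at most max_names). */
int fuzzy_function_names(const char *src, char names[][MAX_NAME_LEN],
                         int max_names)
{
    char ident[MAX_NAME_LEN] = "";
    int ident_fresh = 0;   /* 1 if the last non-space token was ident */
    int depth = 0;         /* current brace nesting depth */
    int count = 0;
    const char *p = src;

    assert(src != NULL && names != NULL);

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c) || c == '_') {
            /* Read one identifier, truncating if it is too long. */
            size_t n = 0;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (n + 1 < MAX_NAME_LEN)
                    ident[n++] = *p;
                p++;
            }
            ident[n] = '\0';
            ident_fresh = 1;
        } else if (isspace(c)) {
            p++;
        } else if (c == '(' && depth == 0 && ident_fresh &&
                   !is_control_keyword(ident)) {
            /* Skip the balanced parenthesized group (or to end of text). */
            int parens = 0;
            do {
                if (*p == '(') parens++;
                else if (*p == ')') parens--;
                p++;
            } while (*p != '\0' && parens > 0);
            while (isspace((unsigned char)*p))
                p++;
            /* The '{' is left for the next iteration to count. */
            if (*p == '{' && count < max_names)
                strcpy(names[count++], ident);
            ident_fresh = 0;
        } else {
            if (c == '{') depth++;
            else if (c == '}' && depth > 0) depth--;
            ident_fresh = 0;
            p++;
        }
    }
    return count;
}
```

Run over Chris's example above, with the <<mumble>> line left in place,
this still reports foo, even though no parser would accept the body. The
price is the usual one for fuzzy scanners: comments, string literals and
macros are not understood, and an unbalanced brace or parenthesis can hide
everything after it, so a real tool would need resynchronization
heuristics on top of this.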

I think you may have touched on this in one of your posts. I just  
wanted to expand on it (since I have seen the pendulum swing between  
fuzzy and precise parsers over the course of ProjectBuilder/Xcode  
development).

Most language-sensitive features I can think of fall into the fuzzy or
precise camp. I'd be interested in hearing about features that would  
directly benefit from a parse tree (which is a middle ground I don't  
have as much direct experience with).

snaroff

> -Chris