[cfe-dev] Should we build semantically invalid nodes?

Mon Oct 27 00:55:40 PDT 2008

Chris Lattner wrote:
>
> I don't buy it.  If you try to handle error cases, how far do you go?  
> What if the code doesn't parse?  Are you suggesting we have multiple 
> levels of error cases, and only some of them are handled and some 
> aren't?  How do we determine what is a 'fatal' error vs not?  How do 
> we know what sort of errors each client can tolerate?
>
> If you want to define a clean interface and have a specific client in 
> mind, that's fine.  It seems to me that you're very focused on one 
> perceived need, but I don't see a clean interface here.  When the code 
> correctly parses, we can make strong guarantees about what the trees 
> mean, and we know that nothing got dropped on the floor.
>
> If the code didn't parse, we can't guarantee anything (because 
> skipping can jump over anything).  If it parsed but didn't type check, 
> then the clients could not depend on types at all.  If it parsed but 
> did not pass various semantic checks (e.g. invalid operation in a case 
> value) it can't depend on sane exprs being in various places.  If sema 
> starts building ASTs that are invalid, then it would have to handle 
> them, which I strongly think is a bad idea for all the reasons already 
> discussed.
>
> Finally, if code was erroneous, then you really don't know enough to 
> refactor safely.  For example:
>
> int G;
> void foo() {
>    <<mumble>>
>
>    print(G);
> }
>
> if "mumble" was a broken definition of a shadowed G, then you really 
> don't want to rename the inner G.
>
> I strongly believe that trying to refactor code that doesn't work is a 
> bad idea.  There are specific things that can be done by (e.g.) 
> matching braces that can make sense on invalid code (for example, 
> Xcode has a textual "rename in scope") but I don't consider those to 
> really be refactoring.  If/when we have refactoring and this becomes a 
> blocker, we should figure out what the right approach is then IMO.

I totally see your point about a powerful refactoring engine that has 
strong guarantees about what it does.
I'm talking about a more practical approach (I mean, one with "weak" 
guarantees), in the style of "look, in the current state of the code, I 
see these locations as references". A list is then displayed to the 
programmer, who can check each transformation and approve or reject it.
Ideally the code compiles, so that all references are found, but that 
choice is left to the programmer.
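To make the weak-guarantee workflow concrete, here is a minimal sketch. It assumes a hypothetical `CandidateReference` record and `applyApprovedRenames` helper (neither is a clang API). The tool only proposes locations, and nothing is rewritten unless the programmer approved that specific entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: a location the tool *believes* refers to the symbol
// being renamed. Nothing here guarantees the guess is semantically correct.
struct CandidateReference {
  std::size_t offset;     // byte offset of the identifier in the buffer
  std::size_t length;     // length of the identifier as written
  bool approved = false;  // set by the programmer after reviewing the list
};

// Hypothetical helper: rewrites only the candidates the programmer approved.
// Assumes candidates do not overlap. Edits are applied from the end of the
// buffer toward the start so earlier offsets stay valid after each rewrite.
std::string applyApprovedRenames(std::string buffer,
                                 std::vector<CandidateReference> refs,
                                 const std::string &newName) {
  std::sort(refs.begin(), refs.end(),
            [](const CandidateReference &a, const CandidateReference &b) {
              return a.offset > b.offset;
            });
  for (const CandidateReference &ref : refs) {
    if (ref.approved) {
      buffer.replace(ref.offset, ref.length, newName);
    }
  }
  return buffer;
}
```

Applying the edits back to front is a deliberate choice here: a replacement name that is longer or shorter than the original would otherwise shift every later offset in the list.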

Here's another suggestion.
Going the break-up-Sema way, along the lines of:

> To me, it would be ideal if we could split up Sema into different 
> modules somehow based on purpose (types, decls, expr/stmts, c++, objc, 
> etc) rather than split up each individual action callback. 

is awesomely valuable, since it would make it easy to build 
custom/specialized action modules.
How about an Action library that holds the components from which Sema 
(and possibly other modules) is built?
And another action module that would help identify which components to 
pull out of Sema for reuse?
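A rough sketch of the "Action library" idea, assuming one hypothetical shared component (`ScopeTracker`) that two different action modules are assembled from. All names are illustrative and not part of clang. The strict module diagnoses unresolved names the way Sema would, while the lenient one reuses the same scope logic but only records what it can resolve.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical declaration kinds (illustrative subset).
enum class DeclKind { TypeName, Function, GlobalVar, Param, LocalVar };

// Hypothetical reusable component: tracks declarations per lexical scope.
class ScopeTracker {
public:
  ScopeTracker() { scopes_.emplace_back(); }  // file scope always exists
  void pushScope() { scopes_.emplace_back(); }
  void popScope() {
    if (scopes_.size() > 1) scopes_.pop_back();  // never pops the file scope
  }
  void declare(const std::string &name, DeclKind kind) {
    scopes_.back()[name] = kind;
  }
  // Searches the innermost scope first, so inner declarations shadow outer ones.
  std::optional<DeclKind> lookup(const std::string &name) const {
    for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return found->second;
    }
    return std::nullopt;
  }

private:
  std::vector<std::map<std::string, DeclKind>> scopes_;
};

// Hypothetical strict (Sema-like) module: counts uses of undeclared names.
class CheckingActions {
public:
  explicit CheckingActions(const ScopeTracker &scopes) : scopes_(scopes) {}
  void actOnUse(const std::string &name) {
    if (!scopes_.lookup(name)) ++errors_;
  }
  int errorCount() const { return errors_; }

private:
  const ScopeTracker &scopes_;
  int errors_ = 0;
};

// Hypothetical lenient module: same component, but it only records names it
// can resolve and silently skips the rest.
class IndexingActions {
public:
  explicit IndexingActions(const ScopeTracker &scopes) : scopes_(scopes) {}
  void actOnUse(const std::string &name) {
    if (auto kind = scopes_.lookup(name)) index_.push_back({name, *kind});
  }
  const std::vector<std::pair<std::string, DeclKind>> &index() const {
    return index_;
  }

private:
  const ScopeTracker &scopes_;
  std::vector<std::pair<std::string, DeclKind>> index_;
};
```

The design point is that scope handling is written once; each module decides for itself how to react when a lookup fails.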

What exactly this module does doesn't affect anything else (since it's a 
different module altogether), but let's say its purpose is to identify 
each identifier used in the program and categorize it by kind (which 
will probably differ between contexts).
By kind I mean typename/function/global variable/parameter/local 
variable/instance field/static field, etc.
With this information, an IDE could do syntax coloring that takes the 
kind of each identifier into account.
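A minimal sketch of how an IDE might consume such a module's output, assuming a hypothetical `IdentOccurrence` record per identifier and made-up style-class names. The `Unknown` kind is an added assumption for identifiers the module could not resolve (for example in erroneous code); those are left uncolored rather than guessed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The kinds listed above, plus a hypothetical Unknown for unresolved names.
enum class IdentKind {
  TypeName, Function, GlobalVar, Param, LocalVar,
  InstanceField, StaticField, Unknown
};

// Hypothetical output record of the identifier-classifying action module.
struct IdentOccurrence {
  std::size_t offset;
  std::size_t length;
  IdentKind kind;
};

// Hypothetical editor-side span: a byte range plus a style-class name.
struct HighlightSpan {
  std::size_t offset;
  std::size_t length;
  std::string styleClass;
};

// Maps each kind to an illustrative style-class name; nullptr means
// "no special coloring".
const char *styleFor(IdentKind kind) {
  switch (kind) {
    case IdentKind::TypeName:      return "type";
    case IdentKind::Function:      return "function";
    case IdentKind::GlobalVar:     return "global";
    case IdentKind::Param:         return "param";
    case IdentKind::LocalVar:      return "local";
    case IdentKind::InstanceField: return "field";
    case IdentKind::StaticField:   return "static-field";
    case IdentKind::Unknown:       break;
  }
  return nullptr;
}

// Converts classified identifiers into highlight spans, skipping any
// identifier whose kind has no style.
std::vector<HighlightSpan> toHighlights(const std::vector<IdentOccurrence> &occs) {
  std::vector<HighlightSpan> spans;
  for (const IdentOccurrence &occ : occs) {
    if (const char *style = styleFor(occ.kind)) {
      spans.push_back({occ.offset, occ.length, style});
    }
  }
  return spans;
}
```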

Any thoughts?

(Yeah, you would also be able to get this information from a 
parse tree... oops, did I say the 'p' word again? :-)

-Argiris