[cfe-dev] Should we build semantically invalid nodes?
clattner at apple.com
Sat Oct 25 11:06:25 PDT 2008
On Oct 25, 2008, at 10:35 AM, steve naroff wrote:
> On Oct 25, 2008, at 11:06 AM, Sebastian Redl wrote:
>> Argiris Kirtzidis wrote:
>>> Hey Steve, breaking up Sema is a ridiculously awesome idea!
>>> Here's a thought I'd like to throw around.
>>> Could there be something like "composable Actions" ? The ASTBuilder
>>> would build the AST while Sema would do semantic checks and reject
>>> invalid nodes.
>>> This will cleanly separate the semantic checks from the AST
>>> building and, as you said, will make the code more maintainable.
>> Unless the complexity of creating and maintaining that separation
>> exceeds that of having the merged code.
> I totally agree. I think this would be a fairly disruptive change. We
> would need a fairly compelling reason to tackle it.
> At this point, it's just "food for thought". It is true that Sema has
> grown considerably and it would be nice to benefit from some of its
> functionality without having to take it all.
I think this is something of a dangerous path. Semantic analysis is
complex and intertwined enough (even in C, but particularly in C++)
that adding abstractions should only be done really carefully.
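To make the trade-off concrete, here is a minimal C++ sketch of the "composable Actions" idea quoted above. It assumes a hypothetical Actions interface with a single ActOnAdd callback; Expr, ASTBuilder and SemaChecker are illustrative names, not clang's real Action or Sema API. ASTBuilder only constructs nodes, while SemaChecker rejects invalid operands and delegates construction to whatever Actions object it wraps.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, heavily simplified stand-ins; NOT clang's Action/Sema API.
struct Expr {
  std::string type;  // e.g. "int", "double", "struct S"
};
using ExprPtr = std::shared_ptr<Expr>;

// The parser would call into an Actions object for each construct it parses.
class Actions {
 public:
  virtual ~Actions() = default;
  virtual ExprPtr ActOnAdd(ExprPtr lhs, ExprPtr rhs) = 0;
};

// Builds nodes and performs no semantic checks.
class ASTBuilder : public Actions {
 public:
  ExprPtr ActOnAdd(ExprPtr lhs, ExprPtr rhs) override {
    if (!lhs || !rhs) return nullptr;  // nothing to build from
    // Even a "dumb" builder needs some type rule to give the node a type;
    // this is a (very) simplified usual arithmetic conversion.
    bool isDouble = lhs->type == "double" || rhs->type == "double";
    std::string type = isDouble ? std::string("double") : lhs->type;
    return std::make_shared<Expr>(Expr{type});
  }
};

// Runs semantic checks, then delegates construction to the wrapped Actions.
class SemaChecker : public Actions {
 public:
  explicit SemaChecker(Actions &next) : next_(next) {}

  ExprPtr ActOnAdd(ExprPtr lhs, ExprPtr rhs) override {
    if (!lhs || !rhs) return nullptr;  // an operand was already rejected
    if (!isArithmetic(lhs->type) || !isArithmetic(rhs->type))
      return nullptr;  // invalid operands: refuse to build a node
    ExprPtr result = next_.ActOnAdd(lhs, rhs);
    assert(result && "builder should accept operands that passed the checks");
    return result;
  }

 private:
  static bool isArithmetic(const std::string &type) {
    return type == "int" || type == "double";
  }
  Actions &next_;
};
```

Even in this toy version the "check-free" builder has to know a conversion rule to assign a result type, so semantic logic leaks into both layers. That duplication is the kind of maintenance cost the replies above are worried about.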
It seems to me that it comes down to the clients that are the ultimate
consumers of this information. Since Sema is perfectly fine for
correct code, let's ignore all clients that require well-formed code
(e.g. codegen, refactoring, etc) and those that aren't harmed by
requiring it (static analysis). These clients are incidentally the
ones that are doing "deep analysis" of the AST and really benefit from
having a lot of invariants in the AST that absolutely must be true for
sanity. Lack of these invariants would require sprinkling their
(incredibly non-trivial) code with lots of special cases and hacks
that I'd really like to avoid.
Another set of clients are things like "indexers" that want to find
all the function definitions and global variables so you can "click on
a function and jump to its definition". For this sort of use, a
simple actions module plugging into the parser is just fine.
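For the indexer case, such an actions module might only record where functions and globals are defined, without building or type-checking an AST. The sketch below assumes hypothetical callbacks (ActOnFunctionDefinition, ActOnGlobalVariable) and a hypothetical SourceLoc struct; clang's parser does not expose exactly this interface.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical location type and callbacks; not clang's real interfaces.
struct SourceLoc {
  std::string file;
  unsigned line;
};

// An actions module that only remembers where things are defined.
// It builds no AST and assigns no types.
class IndexerActions {
 public:
  void ActOnFunctionDefinition(const std::string &name, SourceLoc loc) {
    record(name, loc);
  }
  void ActOnGlobalVariable(const std::string &name, SourceLoc loc) {
    record(name, loc);
  }

  // "Click on a function and jump to its definition."
  std::optional<SourceLoc> definitionOf(const std::string &name) const {
    auto it = defs_.find(name);
    if (it == defs_.end()) return std::nullopt;
    return it->second;
  }

 private:
  void record(const std::string &name, const SourceLoc &loc) {
    assert(!name.empty() && "parser should only report named declarations");
    defs_[name] = loc;  // flat namespace: no scopes, the last one wins
  }
  std::map<std::string, SourceLoc> defs_;
};
```

Because nothing here depends on types, a module like this keeps working on code that Sema would reject. It also ignores scopes, which a real indexer would have to handle.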
What sort of clients would benefit substantially from a broken and
partially formed AST? If we really wanted this sort of thing, it
seems like it would be cleanest to do what Steve said: define a new
actions module that just builds an AST (which can even use the same or
an extended set of nodes as Sema) but doesn't do any real checks,
doesn't assign types, etc. At this point, you have more of a parse tree
than an AST. I could imagine that something like this would be
useful, but can't think of any specific clients.
To be clear, I want to separate out two notions from this. First, we
don't need anything like this to get loc info for types. That is a
straightforward extension over what we have, and making Sema
(optionally) do it would be easy and non-invasive. Second, this whole
discussion started with a discussion of error recovery. While related
to the above, I still really really think that *Sema* shouldn't return
invalid nodes and should only "correct" them in really obvious cases.
Sema is extremely complex, and having it not be able to depend on the
invariants we have on AST nodes would be very bad, as Doug pointed out.
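A minimal sketch of that policy, assuming hypothetical FunctionDecl, CallExpr and SemaResult types (illustrative only, not clang's classes): Sema diagnoses the error and returns an explicit "invalid" result instead of a half-formed node, so every CallExpr that exists satisfies its invariant.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types; not clang's real AST or Sema classes.
struct FunctionDecl {
  std::string name;
  std::size_t numParams;
};

struct CallExpr {
  // Invariant every client may rely on: one argument per parameter.
  CallExpr(const FunctionDecl &fn, std::vector<int> argList)
      : callee(&fn), args(std::move(argList)) {
    assert(args.size() == fn.numParams && "malformed CallExpr");
  }
  const FunctionDecl *callee;
  std::vector<int> args;  // stand-ins for argument expressions
};

// Holds either a well-formed node or "invalid", never a half-built node.
class SemaResult {
 public:
  static SemaResult invalid() {
    return SemaResult(std::unique_ptr<CallExpr>());
  }
  explicit SemaResult(std::unique_ptr<CallExpr> e) : expr_(std::move(e)) {}
  bool isInvalid() const { return !expr_; }
  const CallExpr &get() const {
    assert(!isInvalid());
    return *expr_;
  }

 private:
  std::unique_ptr<CallExpr> expr_;
};

// Diagnoses the error and returns "invalid" instead of building a CallExpr
// that would break the invariant above.
SemaResult actOnCall(const FunctionDecl &fn, std::vector<int> args,
                     std::vector<std::string> &diags) {
  if (args.size() != fn.numParams) {
    diags.push_back("wrong number of arguments to '" + fn.name + "'");
    return SemaResult::invalid();
  }
  return SemaResult(std::make_unique<CallExpr>(fn, std::move(args)));
}
```

Clients then check isInvalid() once and can otherwise rely on the invariant, instead of guarding against malformed nodes throughout their own code.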