[cfe-dev] [PATCH] C++ decl/expr ambiguity resolution approach
Chris Lattner
clattner at apple.com
Sat Aug 23 18:22:40 PDT 2008
On Aug 22, 2008, at 8:16 PM, Argiris Kirtzidis wrote:
> The attached patch disambiguates between declaration or expression
> statements. It uses tentative parsing and backtracking.
> For the common cases no tentative parsing is needed; for the
> ambiguous ones, this will result in multiple type checks (for
> tentative parsing and for actual parsing).
> As I've mentioned in a previous post, this can be made efficient in
> a later patch by caching the type checking results of the Action
> module.
Hi Argiris,
You have obviously put a lot of time and thought into this code, and
it shows in its cleanliness and attention to detail. I have a few
specific comments about the patch, but I'd like to first talk about
the approach you've chosen to make sure we're on the same page.
The problem is that we don't know whether something is a declaration
or a statement without doing a significant amount of parsing to
decide. If it is a declaration, we have to accept it as such,
otherwise we should try parsing it as a statement. If neither work it
is an error.
At root, your approach basically boils down to a very simple approach:
any time when we can't tell whether something is an obvious statement
(an important optimization!), you fire off a special parser that looks
ahead and sees if the tokens match the declaration grammar. This
parser comes back with an answer, and based on that answer, the normal
parser goes off and parses with one approach or another.
This is conceptually an extremely clean approach, because it decouples
disambiguation from actual parsing. However, this has two big
downsides: 1) maintenance: it duplicates a lot of parsing rules,
because now we have to be able to "tentatively parse" and "really
parse" all the declaration stuff. 2) performance: in addition to the
basic cost of backtracking this has to duplicate parsing, and some
amount of sema (type lookups etc).
To me, these are pretty big drawbacks, and I want to make sure we're
all on the same page and agree that this is the right approach before
we go much farther.
Ignoring optimizations for common cases, the only other realistic
solution that I know of is to make use of tentative parsing to
actually *start parsing* the ambiguous cases as declarations, and then
roll back and re-parse if an error occurs. The code basically looks
like this:
// Parse declaration tentatively.
PP.StartTentativeParsing();
Res = ParseDeclarationStatement();
if (!Res.isError) {
PP.Commit();
return Res;
}
PP.Rollback();
// Parse statement non-tentatively
return ParseStatement();
}
This approach produces some very interesting tradeoffs. The drawback
of this approach is that any action made while parsing and sema'ing
the declaration statement *must be able to be rolled back*. This
means that warnings need to be buffered up (errors stop the tentative
parse) and that Sema needs to be able to undo any changes it makes to
its internal state (e.g. remove declared variables, delete initializer
expressions that are parsed and built, etc). Also, if the ambiguous
statement ends up being an expression, your approach would be superior
in space and time.
However, the advantage of this approach is that it is much faster than
doing a pre-parse when things turn out to actually be a declaration.
Since many statement cases get filtered out before reaching the
tentative case, I think this is a win. Additionally, this does not
require duplicating parsing code for a bunch of grammar productions,
which is my primary objection to your approach.
Did you consider the tentative parsing approach? What do you think of
it? Will you be able to reasonably expand your preparser to handle
cases like this?:
int X = X;
I think that this sort of example requires doing some amount of sema
in the disambiguation parser, though maybe the grammar is regular
enough to let you just skip over initializers completely (right now,
this doesn't matter for you because you stop lookahead at the =).
Once we agree on the overall approach, I'll hassle you about things
like whether we should factor declspec processing out of the tentative
part and try to find out if there are other important cases that can
get filtered out as expressions early on (e.g. what about a statement
that starts with an identifier that is known to be a variable/
function, can that ever start a declaration?). :)
Thoughts?
-Chris
More information about the cfe-dev
mailing list