[cfe-dev] [PATCH] C++ decl/expr ambiguity resolution approach

Sat Aug 23 20:37:22 PDT 2008

On Sat, Aug 23, 2008 at 6:22 PM, Chris Lattner <clattner at apple.com> wrote:
> This is conceptually an extremely clean approach, because it decouples
> disambiguation from actual parsing.  However, this has two big
> downsides: 1) maintenance: it  duplicates a lot of parsing rules,
> because now we have to be able to "tentatively parse" and "really
> parse" all the declaration stuff.  2) performance: in addition to the
> basic cost of backtracking this has to duplicate parsing, and some
> amount of sema (type lookups etc).
>
> To me, these are pretty big drawbacks, and I want to make sure we're
> all on the same page and agree that this is the right approach before
> we go much farther.

(1) is definitely an issue... it's a balancing act.  I won't try to
comment on it because I don't know the difficulty of the alternative.

For (2), the current code doesn't try to avoid activating backtracking
at all; depending on how expensive activating it turns out to be, we
can easily skip it for the common cases.  The only statements that
need declaration vs. expression disambiguation are those starting with
T( or N::T(, where T is a type.  All the cases that don't start with a
complicated type name can be caught with only a single token of
lookahead.
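To make the fast path concrete, here is a minimal sketch over a plain
vector of token strings.  The names needsDisambiguation and isTypeName
are hypothetical (not Clang's API), and the type check only looks at
the last name component rather than doing real qualified lookup:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fast path: returns true only when the statement begins with
// a (possibly qualified) type name followed by '(', i.e. the T( and N::T(
// forms.  Everything else can be classified without backtracking.
bool needsDisambiguation(
    const std::vector<std::string> &toks,
    const std::function<bool(const std::string &)> &isTypeName) {
  std::size_t i = 0;
  // Skip a leading global '::' and any "N ::" qualifiers.
  if (i < toks.size() && toks[i] == "::") ++i;
  while (i + 1 < toks.size() && toks[i + 1] == "::") i += 2;
  if (i + 1 >= toks.size()) return false;
  // Simplification: only the final name component is checked for being a type.
  return isTypeName(toks[i]) && toks[i + 1] == "(";
}
```

A statement like `T x;` or `f(a);` is rejected immediately, so under
this sketch backtracking would only be activated for the T( and N::T(
forms.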

The current version is already quite optimized in terms of how far it
looks ahead; it tries to bail out as soon as possible, so most of the
time it shouldn't end up looking very far ahead.  This approach also
has the advantage that it doesn't need to completely parse the code;
it can take shortcuts wherever the details can't affect the
declaration vs. expression distinction.
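A toy version of that early bail-out might look like the following.
It is a hypothetical sketch, not the patch's code: it recognizes only a
handful of token patterns after `T(` (including forms from the
standard's own [stmt.ambig] examples, such as `T(a)->m = 7;` and
`T(e)[5];`) and returns NeedsFullParse for everything else, which a
real parser would still have to handle:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical three-way answer from the lookahead.
enum class Ambiguity { Declaration, Expression, NeedsFullParse };

namespace {
bool isIdentifier(const std::string &tok) {
  return !tok.empty() &&
         (std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_');
}
bool isNumber(const std::string &tok) {
  return !tok.empty() && std::isdigit(static_cast<unsigned char>(tok[0]));
}
} // namespace

// `toks` holds the tokens after the type name T, starting at '('.
// Deliberately incomplete: returns as soon as one token settles the question.
Ambiguity classifyAfterTypeName(const std::vector<std::string> &toks) {
  auto at = [&](std::size_t k) -> std::string {
    return k < toks.size() ? toks[k] : std::string();
  };
  std::size_t i = 0;
  if (at(i) != "(") return Ambiguity::NeedsFullParse;
  ++i;
  // Pointer/reference operators may start a declarator: T(*p); T(&r);
  while (at(i) == "*" || at(i) == "&") ++i;
  if (isNumber(at(i))) return Ambiguity::Expression;  // T(1); is no declarator
  if (!isIdentifier(at(i))) return Ambiguity::NeedsFullParse;
  ++i;
  const std::string inside = at(i);
  // A parenthesized declarator can't contain a binary operator: T(a + 1);
  if (inside == "+" || inside == "-" || inside == "*" || inside == "/" ||
      inside == "%")
    return Ambiguity::Expression;
  if (inside != ")") return Ambiguity::NeedsFullParse;
  ++i;
  const std::string after = at(i);
  if (after == ";" || after == "=" || after == "[")
    return Ambiguity::Declaration;  // T(a);  T(a) = 5;  T(e)[5];
  if (after == "->" || after == "++" || after == "--" || after == ".")
    return Ambiguity::Expression;   // T(a)->m = 7;  T(a)++;
  return Ambiguity::NeedsFullParse; // e.g. T(*d)(int);
}
```

The point of the three-valued result is that the cheap check never
guesses: anything it can't settle in a few tokens is handed on to the
full tentative parse.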

> Also, if the ambiguous
> statement ends up being an expression, your approach would be superior
> in space and time.

If we want to know where the balance lies, we'll really have to
benchmark; hopefully the cases where we have to look ahead more than a
few tokens are rare, in which case performance isn't much of an
issue.

> Did you consider the tentative parsing approach?  What do you think of
> it?  Will you be able to reasonably expand your preparser to handle
> cases like this?:
>
> int X = X;
>
> I think that this sort of example requires doing some amount of sema
> in the disambiguation parser, though maybe the grammar is regular
> enough to let you just skip over initializers completely (right now,
> this doesn't matter for you because you stop lookahead at the =).

If an identifier changes from naming a type to naming a variable in the
middle of a statement, and that affects disambiguation, the program is
ill-formed per the standard; we can just assume for disambiguation
purposes that anything that isn't a type is a valid variable.  And for
initializers, I'm pretty sure the rules allow skipping forward to the
next top-level comma/semicolon once we see an = sign; what follows the
= sign is just a regular assignment-expression, so it doesn't affect
disambiguation.
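The initializer shortcut can be sketched the same way.  This
hypothetical helper assumes the caller has just consumed the `=`; it
skips to the next comma or semicolon that isn't nested inside (), []
or {}.  It does not handle commas inside template argument lists such
as `f<a, b>(c)`, which would need extra knowledge about which names are
templates:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: starting just after an '=' in a declarator, returns
// the index of the next top-level ',' or ';' (or toks.size() if none).
// Commas nested inside (), [] or {} belong to the initializer and are skipped.
// Caveat: commas in template argument lists are not recognized.
std::size_t skipInitializer(const std::vector<std::string> &toks,
                            std::size_t i) {
  int depth = 0;
  for (; i < toks.size(); ++i) {
    const std::string &t = toks[i];
    if (t == "(" || t == "[" || t == "{") {
      ++depth;
    } else if (t == ")" || t == "]" || t == "}") {
      if (depth == 0) return i;  // unbalanced closer: let the caller decide
      --depth;
    } else if (depth == 0 && (t == "," || t == ";")) {
      return i;
    }
  }
  return toks.size();
}
```

For `int X = X;` it steps straight over the second X without looking
it up, so the type-vs-variable question never arises inside the
initializer during disambiguation.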

-Eli