[cfe-dev] [PATCH] C++ decl/expr ambiguity resolution approach

Sun Aug 24 14:30:27 PDT 2008

On Sat, Aug 23, 2008 at 9:22 PM, Chris Lattner <clattner at apple.com> wrote:
> At root, your approach basically boils down to a very simple approach:
> any time when we can't tell whether something is an obvious statement
> (an important optimization!), you fire off a special parser that looks
> ahead and sees if the tokens match the declaration grammar.  This
> parser comes back with an answer, and based on that answer, the normal
> parser goes off and parses with one approach or another.
>
> This is conceptually an extremely clean approach, because it decouples
> disambiguation from actual parsing.  However, this has two big
> downsides: 1) maintenance: it  duplicates a lot of parsing rules,
> because now we have to be able to "tentatively parse" and "really
> parse" all the declaration stuff.  2) performance: in addition to the
> basic cost of backtracking this has to duplicate parsing, and some
> amount of sema (type lookups etc).
>
> To me, these are pretty big drawbacks, and I want to make sure we're
> all on the same page and agree that this is the right approach before
> we go much farther.
>
>
> Ignoring optimizations for common cases, the only other realistic
> solution that I know of is to make use of tentative parsing to
> actually *start parsing* the ambiguous cases as declarations, and then
> roll back and re-parse if an error occurs.  The code basically looks
> like this:
>
>
>   // Parse declaration tentatively.
>   PP.StartTentativeParsing();
>   Res = ParseDeclarationStatement();
>   if (!Res.isError) {
>      PP.Commit();
>      return Res;
>   }
>   PP.Rollback();
>
>   // Parse statement non-tentatively
>   return ParseStatement();
> }

This is the approach that GCC's parser uses (as of GCC 3.4), and it
works reasonably well for them. A review of GCC's parser could give us
some sense of how often tentative parsing is needed, although my
impression is that they tentatively parse more often than is
absolutely necessary. The problem is that a lot of C++'s tricky
parsing rules (including lots of template parsing rules) get into the
declaration/expression ambiguity, so there is likely to be a
significant amount of duplication in the special parser. And the
declaration/expression ambiguity isn't the only place where
non-trivial disambiguation is required in C++.

Performance-wise, tentative parsing can be improved by allowing it to
annotate the token stream with new "already parsed as a..." tokens as
it goes through. Good candidates for this kind of annotation are
qualified names (which Chris mentioned) and template-ids (GCC does the
latter).

Having worked with it a bit (and run into a lot of places where
tentative parsing was a natural approach), I'm a big fan of tentative
parsing for ambiguity resolution.

  - Doug