[cfe-dev] Annotation tokens

Tue Aug 26 10:31:30 PDT 2008

Doug Gregor wrote:
> Performance-wise, tentative parsing can be improved by allowing it to
> annotate the token stream with new "already parsed as a..." tokens as
> it goes through. Good candidates for this kind of annotation are
> qualified names (which Chris mentioned) and template-ids (GCC does the
> latter).

I think this is a good idea that can be incorporated into Clang. It 
would be useful not only for improving the performance of tentative 
parsing but also as a way to cleanly parse qualified names.

Parsing qualified names ("A::B::x" - let's call the "A::B::" part the 
'scope qualifiers') without backtracking is tricky.
Chris suggested doing something like ParseWithLeadingScopeQualifier... 
functions. The problem with this is that it creates a kind of "fork" 
where parsing a bit of the grammar is duplicated.
This wasn't a big deal with ParseWithLeadingIdentifier for 
disambiguating labels/identifiers, but it's a much bigger problem with 
scope qualifiers because these can appear in a *lot* of places:

-in declaration specifiers
-in declarators
-in id-expressions
-in type-specifiers
-in declarations, expressions, function declarations, initializers, 
typeof, sizeof, etc.
...

Anyway, you get the picture. This means there would be a lot of 
"forking", which leads to a lot of duplicated grammar parsing.

Here are a couple of examples of how annotation tokens can be used:
-ParseDeclarationSpecifiers may start parsing "A::B::" and, if "A::B::x" 
turns out not to be a type, it may push a "scope qualifier annotation 
token" so that ParseDirectDeclarator can act upon it.
-The "disambiguation" pre-parser may use annotation tokens so that this 
token stream

A::T<int> (A::x);

would turn into this when the normal parser starts parsing the declaration:

'type token' '(' 'scope token' 'x' ')' ';'

The annotation tokens would hold a pointer value, TypeTy* for the "type 
token" and CXXScopeTy* for the "scope token".
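
To make the idea a bit more concrete, here is a rough sketch of how a 
run of "identifier ::" tokens could be collapsed into a single scope 
annotation token that carries an opaque pointer. This is not Clang 
code: TokKind, Token, Scope and annotateScopeQualifier are names made 
up for illustration, Scope merely stands in for whatever a CXXScopeTy* 
would point to, and no name lookup is performed at all.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical token kinds; AnnotScope plays the role of the "scope token".
enum class TokKind { Identifier, ColonColon, AnnotScope };

// Hypothetical stand-in for the object a CXXScopeTy* would point to.
struct Scope { std::string qualifiedName; };

struct Token {
  TokKind kind;
  std::string spelling;  // source text, or the text the annotation covers
  void *annotation;      // opaque payload; nullptr for ordinary tokens
};

// Collapses the longest run of "identifier ::" pairs starting at pos into
// one AnnotScope token. Scope objects live in `scopes`; a std::deque keeps
// earlier elements at stable addresses across push_back. Returns false and
// leaves the stream untouched if no qualifier starts at pos.
bool annotateScopeQualifier(std::vector<Token> &toks, std::size_t pos,
                            std::deque<Scope> &scopes) {
  assert(pos <= toks.size() && "position must lie within the token stream");
  std::size_t end = pos;
  std::string name;
  while (end + 1 < toks.size() && toks[end].kind == TokKind::Identifier &&
         toks[end + 1].kind == TokKind::ColonColon) {
    name += toks[end].spelling + "::";
    end += 2;
  }
  if (end == pos)
    return false;

  scopes.push_back(Scope{name});
  Token annot{TokKind::AnnotScope, name, &scopes.back()};

  // Replace the consumed tokens with the single annotation token.
  auto first = toks.begin() + static_cast<std::ptrdiff_t>(pos);
  auto last = toks.begin() + static_cast<std::ptrdiff_t>(end);
  toks.insert(toks.erase(first, last), annot);
  return true;
}
```

With something like this, whichever parsing function sees the qualifier 
first does the work once, and any later function that meets the 
AnnotScope token only has to read the pointer instead of re-parsing 
"A::B::".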

Any thoughts?

-Argiris