[cfe-dev] Annotation tokens

Tue Aug 26 15:30:53 PDT 2008

Hi Argiris,

On Tue, Aug 26, 2008 at 1:31 PM, Argiris Kirtzidis <akyrtzi at gmail.com> wrote:
> Doug Gregor wrote:
>>
>> Performance-wise, tentative parsing can be improved by allowing it to
>> annotate the token stream with new "already parsed as a..." tokens as
>> it goes through. Good candidates for this kind of annotation are
>> qualified names (which Chris mentioned) and template-ids (GCC does the
>> latter).
>>
>
> I think this is a good idea that can be incorporated in Clang. This won't
> only be useful for improving performance of tentative parsing but also as a
> way to cleanly parse qualified names.

It should allow re-use of the qualified-name parser inside the
pre-parser and the main parser, without requiring two passes over the
tokens that make up the qualified names.
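A minimal sketch of that single-pass reuse, assuming a toy token model rather than Clang's real Token/Parser classes (Kind, Tok, TokenStream and ParseScopeQualifier are all hypothetical names). Whichever parser reaches "A::B::" first parses it once and splices one annotation token into the buffer in its place, so after the pre-parser backtracks, the main parser simply consumes that token instead of re-parsing the qualifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified token kinds.
enum class Kind { identifier, coloncolon, annot_scope, semi, eof };

struct Tok {
  Kind kind;
  std::string text;  // identifier spelling, or the whole "A::B::" for annot_scope
};

// Hypothetical token buffer shared by the tentative pre-parser and the main
// parser. It must end with an eof token.
struct TokenStream {
  std::vector<Tok> toks;
  std::size_t pos = 0;
  int rawQualifierParses = 0;  // times a qualifier was parsed from raw tokens

  // Parses and consumes a scope qualifier ("A::B::") at pos. The first call
  // walks the identifier/'::' pairs and splices one annot_scope token into
  // the buffer in their place; any later call at the same position (e.g.
  // after the pre-parser backtracks) just consumes that token.
  // Returns "" and consumes nothing if there is no qualifier at pos.
  std::string ParseScopeQualifier() {
    assert(!toks.empty() && toks.back().kind == Kind::eof);
    if (toks[pos].kind == Kind::annot_scope)
      return toks[pos++].text;
    std::size_t end = pos;
    std::string spelling;
    // eof ends the buffer, so toks[end + 1] exists whenever toks[end] is an
    // identifier.
    while (toks[end].kind == Kind::identifier &&
           toks[end + 1].kind == Kind::coloncolon) {
      spelling += toks[end].text + "::";
      end += 2;
    }
    if (end == pos)
      return "";
    ++rawQualifierParses;
    auto first = toks.begin() + static_cast<std::ptrdiff_t>(pos);
    auto last = toks.begin() + static_cast<std::ptrdiff_t>(end);
    toks.insert(toks.erase(first, last), Tok{Kind::annot_scope, spelling});
    return toks[pos++].text;
  }
};
```

For the tokens of "A::B::x ;", a tentative call, a reset of pos, and a second call both return "A::B::", while rawQualifierParses stays at 1 and the buffer shrinks from 7 tokens to 4 (annot_scope, x, ;, eof).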

> Parsing qualified names ("A::B::x" - let's say 'scope qualifiers' for the
> "A::B::" part) without backtracking is tricky.
> Chris suggested doing something like ParseWithLeadingScopeQualifier...
> functions. The problem with this is that it creates a kind of "fork" where
> parsing a bit of the grammar is duplicated.
> This wasn't a big deal with the ParseWithLeadingIdentifier for
> disambiguating labels/identifiers, but it's a much bigger problem with scope
> qualifiers because these appear in a *lot* of places:
>
> -in declaration specifiers
> -in declarators
> -in id-expressions
> -in type-specifiers
> -in declarations, expressions, function declarations, initializers, typeof,
> sizeof, etc.
> ...
>
> Anyway, you get the picture. This means there will be a lot of "forking",
> which will lead to a lot of duplicated grammar-parsing code.

Yes.

> Here are a couple of examples of how annotation tokens can be used:
> -ParseDeclarationSpecifiers may start parsing "A::B::" and, if "A::B::x"
> turns out not to be a type, it may push a "scope qualifier annotation token"
> so that ParseDirectDeclarator can act upon it.
> -The "disambiguation" pre-parser may use annotation tokens so that this
> token stream
>
> A::T<int> (A::x);
>
> would turn into this by the time the normal parser starts parsing the declaration:
>
> 'type token' '(' 'scope token' 'x' ')' ';'
>
> The annotation tokens would hold a pointer value, TypeTy* for the "type
> token" and CXXScopeTy* for the "scope token".
>
> Any thoughts?
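A minimal sketch of the disambiguation example, assuming a toy token model: TypeTy and CXXScopeTy are the names used in the proposal, but here they are hypothetical placeholder structs, and Kind, Token, Lookup, AnnotateQualifiedNames, getAnnotType and getAnnotScope are invented for illustration. Name lookup is faked with two maps, and template-argument parsing is deliberately naive (no nested "<...>"). Requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical placeholders for the opaque types named in the proposal.
struct TypeTy { std::string name; };
struct CXXScopeTy { std::string name; };

enum class Kind { identifier, coloncolon, less, greater, lparen, rparen,
                  semi, annot_type, annot_scope, eof };

struct Token {
  Kind kind;
  std::string text;
  void *annotValue = nullptr;  // TypeTy* for annot_type, CXXScopeTy* for annot_scope
};

// Typed accessors so callers never cast the raw pointer themselves.
TypeTy *getAnnotType(const Token &t) {
  assert(t.kind == Kind::annot_type);
  return static_cast<TypeTy *>(t.annotValue);
}
CXXScopeTy *getAnnotScope(const Token &t) {
  assert(t.kind == Kind::annot_scope);
  return static_cast<CXXScopeTy *>(t.annotValue);
}

// Fake name lookup standing in for semantic analysis (hypothetical).
struct Lookup {
  std::map<std::string, TypeTy> types;       // full names, e.g. "A::T<int>"
  std::map<std::string, CXXScopeTy> scopes;  // qualifiers, e.g. "A::"
};

// One sweep over an eof-terminated buffer: a qualified name that names a
// type collapses into one annot_type token; otherwise a known "X::Y::"
// qualifier collapses into one annot_scope token and the final identifier
// is left for the declarator/expression parser.
void AnnotateQualifiedNames(std::vector<Token> &toks, Lookup &sema) {
  auto at = [&](std::size_t n) {
    return toks.begin() + static_cast<std::ptrdiff_t>(n);
  };
  for (std::size_t i = 0; toks[i].kind != Kind::eof; ++i) {
    if (toks[i].kind != Kind::identifier) continue;
    std::size_t j = i;
    std::string qual;
    while (toks[j].kind == Kind::identifier &&
           toks[j + 1].kind == Kind::coloncolon) {
      qual += toks[j].text + "::";
      j += 2;
    }
    if (toks[j].kind != Kind::identifier) continue;
    std::string name = qual + toks[j].text;
    std::size_t last = j;  // index of the name's final token
    if (toks[j + 1].kind == Kind::less) {
      // Naive template-id: assumes the argument list has no nested '<'.
      std::size_t k = j + 2;
      std::string args;
      while (toks[k].kind != Kind::greater && toks[k].kind != Kind::eof)
        args += toks[k++].text;
      if (toks[k].kind == Kind::greater) {
        name += "<" + args + ">";
        last = k;
      }
    }
    if (auto t = sema.types.find(name); t != sema.types.end()) {
      Token annot{Kind::annot_type, name, &t->second};
      toks.insert(toks.erase(at(i), at(last + 1)), annot);
    } else if (auto s = sema.scopes.find(qual);
               !qual.empty() && s != sema.scopes.end()) {
      Token annot{Kind::annot_scope, qual, &s->second};
      toks.insert(toks.erase(at(i), at(j)), annot);
    }
  }
}
```

Run on the tokens of "A::T<int> (A::x);" with "A::T<int>" registered as a type and "A::" as a scope, the buffer comes out as annot_type '(' annot_scope x ')' ';' eof, and getAnnotType on the first token returns the TypeTy registered for "A::T<int>". A real pre-parser would more likely annotate lazily as it walks the declaration; the separate sweep only keeps the sketch short.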

I think this is an important step for unifying the common bits of the
pre-parser and main parser, and should improve parsing performance.
The same technique would eventually be used for template-ids, and
could be used in other cases where avoiding additional parsing could
pay off (long decl-specifier-seqs? that might be taking the idea too
far...).

  - Doug