[cfe-dev] [PATCH] C++ nested-names (Parser) and annotation tokens

Thu Oct 9 23:35:49 PDT 2008

Hi,

The attached patches implement support for nested-name-specifiers 
(foo::bar::x) in the Parser using 'annotation tokens' (many thanks 
to Doug for the idea here: 
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html).

About annotation tokens:

These are a special kind of token that the parser (not the lexer) may 
use to replace a stream of lexed tokens with a single token that 
encapsulates the relevant semantic information.
There are two kinds:
-typename annotation (represents a typedef name in C, and a possibly 
qualified typename in C++, like "foo::bar::myclass")
-C++ scope annotation (represents a nested-name-specifier, like "foo::bar::")

Annotation tokens contain a void* value that represents semantic 
information specific to the annotation kind (a TypeTy* for a typename 
and a CXXScopeTy* for a scope), plus the SourceRange of the tokens 
that they replaced.
As you can see in the attached "annot-token.patch", there were some 
changes to the Token class to support annotations, but its size did not 
change.
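To make the layout concrete, here is a minimal sketch of the idea. The Token struct, TokKind enum, offset-based SourceRange and replaceWithAnnotation() below are hypothetical simplifications, not the actual clang::Token class from annot-token.patch; the sketch only shows a run of lexed tokens being replaced by one token that keeps an opaque void* value and the source range of what it replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model; this is NOT the actual clang::Token class.
// Source locations are modelled as plain offsets into the source buffer.
struct SourceRange {
  unsigned Begin, End;
};

enum class TokKind { identifier, coloncolon, annot_typename, annot_cxxscope };

struct Token {
  TokKind Kind;
  SourceRange Range;      // for annotations: the range of the replaced tokens
  std::string Spelling;   // only meaningful for ordinary lexed tokens
  void *AnnotationValue;  // e.g. a TypeTy* or CXXScopeTy*; null otherwise
};

// Replace Toks[First..Last] (inclusive) with one annotation token that keeps
// the semantic value and the source range covered by the original tokens.
void replaceWithAnnotation(std::vector<Token> &Toks, std::size_t First,
                           std::size_t Last, TokKind Kind, void *Value) {
  assert(First <= Last && Last < Toks.size());
  Token Annot{Kind, {Toks[First].Range.Begin, Toks[Last].Range.End}, "", Value};
  Toks.erase(Toks.begin() + First, Toks.begin() + Last + 1);
  Toks.insert(Toks.begin() + First, Annot);
}
```

For "foo::bar::x" lexed as five tokens, replacing the first four with a scope annotation leaves two tokens: the annotation (covering offsets 0 to 10) followed by the 'x' identifier.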

The benefits of the annotation tokens are:

----- 1) Vastly simplified handling of nested-names.
In my previous attempts at nested-names, the main issue was how to keep 
track of the "C++ scope specifier state" so that introducing nested-names 
into "parsing contexts" that don't particularly care about them wouldn't 
overcomplicate things or cause a lot of code duplication in the parsing 
code. Here's an example of how annotation tokens handle that:

Assume that we have:
  sizeof( foo::bar::x )

sizeof doesn't particularly care about nested names; it only wants to 
find out whether it has a type or an expression and defer parsing to the 
appropriate parsing function.
Here's how it works if "foo::bar::x" is a type:

-sizeof calls Parser::isDeclarationSpecifier
    -Parser::isDeclarationSpecifier first calls Parser::AnnotateToken
        -Parser::AnnotateToken parses and resolves both the scope-spec 
         and the typename, and sets the current token to an annotation 
         type token representing the type
    -Parser::isDeclarationSpecifier sees that the current token is an 
     annotation type token and returns true, since this is a 
     declaration specifier
-sizeof calls Parser::ParseTypeName
    -When execution reaches Parser::ParseDeclarationSpecifiers, it sees 
     the annotation type token, takes the information from it, and 
     "consumes" it from the token stream.
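The AnnotateToken / isDeclarationSpecifier steps above can be sketched roughly like this. Everything here is hypothetical: the Sema struct is a toy lookup table standing in for semantic analysis, and the annotation token carries the resolved name as a string instead of a TypeTy* or CXXScopeTy*. The point is only that isDeclarationSpecifier annotates first and then just looks at the current token.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical token model: annotations carry the resolved qualified name as
// a string, standing in for an opaque TypeTy* / CXXScopeTy*.
enum class Tok { identifier, coloncolon, l_paren, r_paren, annot_typename, annot_cxxscope };
struct Token { Tok Kind; std::string Text; };

// Hypothetical stand-in for semantic name lookup.
struct Sema {
  std::set<std::string> Scopes;  // e.g. "foo::", "foo::bar::"
  std::set<std::string> Types;   // e.g. "foo::bar::x"
};

// Sketch of AnnotateToken: resolve a (possibly qualified) name starting at I
// once, and replace the lexed tokens with a single annotation token.
void annotateToken(std::vector<Token> &Toks, std::size_t I, const Sema &S) {
  assert(I < Toks.size());
  std::string Scope;  // accumulated nested-name-specifier, e.g. "foo::bar::"
  std::size_t J = I;
  // Consume "identifier ::" pairs while they name a known scope.
  while (J + 1 < Toks.size() && Toks[J].Kind == Tok::identifier &&
         Toks[J + 1].Kind == Tok::coloncolon &&
         S.Scopes.count(Scope + Toks[J].Text + "::")) {
    Scope += Toks[J].Text + "::";
    J += 2;
  }
  if (J < Toks.size() && Toks[J].Kind == Tok::identifier &&
      S.Types.count(Scope + Toks[J].Text)) {
    // Scope and typename both resolved: one annot_typename for everything.
    std::string Name = Scope + Toks[J].Text;
    Toks.erase(Toks.begin() + I, Toks.begin() + J + 1);
    Toks.insert(Toks.begin() + I, Token{Tok::annot_typename, Name});
  } else if (J > I) {
    // Only the scope resolved: annotate "foo::bar::", keep the identifier.
    Toks.erase(Toks.begin() + I, Toks.begin() + J);
    Toks.insert(Toks.begin() + I, Token{Tok::annot_cxxscope, Scope});
  }
}

// Sketch of isDeclarationSpecifier: annotate, then inspect the current token.
// A real implementation would also accept keywords such as 'int'.
bool isDeclarationSpecifier(std::vector<Token> &Toks, std::size_t I, const Sema &S) {
  annotateToken(Toks, I, S);
  return Toks[I].Kind == Tok::annot_typename;
}
```

With "foo::bar::x" registered as a type, the five tokens inside the parentheses collapse into one annot_typename token and the predicate returns true. If 'x' is not a type, the same call leaves an annot_cxxscope token for "foo::bar::" followed by the 'x' identifier, and returns false.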

OK, what if "foo::bar::x" is an expression?

-sizeof calls Parser::isDeclarationSpecifier
    -Parser::isDeclarationSpecifier first calls Parser::AnnotateToken
        -Parser::AnnotateToken parses the scope-spec, sees that 'x' is 
         not a typename, and sets the current token to an annotation 
         scope token for "foo::bar::" (which is followed by the 'x' 
         identifier token)
    -Parser::isDeclarationSpecifier sees that the current token is not a 
     declaration specifier and returns false
-sizeof calls Parser::ParseExpression
    -When execution reaches Parser::ParseCastExpression, the annotation 
     scope token indicates a qualified-id expression, which is handled 
     by Parser::ParseCXXIdExpression
        -Parser::ParseCXXIdExpression takes the information from the 
         annotation scope token and calls Actions.ActOnIdentifierExpr, 
         passing the 'x' identifier and the specific C++ scope that it 
         should be a member of
The important thing to notice about the above is that nested-names 
didn't affect the parsing logic of contexts that don't directly deal 
with nested-names.
sizeof didn't have to do any special check for nested names. If the 
expression were this:

sizeof( foo::bar:: )

the error would be reported by Parser::ParseCXXIdExpression; sizeof 
doesn't need to check for it either.
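The ParseCXXIdExpression step of the expression walkthrough, including the sizeof( foo::bar:: ) error case, could look something like the sketch below. ExprResult, actOnIdentifierExpr and the diagnostic text are hypothetical stand-ins for the real Actions interface and diagnostics; the sketch assumes the token stream has already been annotated, so the current token is a scope annotation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model (annotations carry the scope as a string).
enum class Tok { identifier, r_paren, annot_cxxscope };
struct Token { Tok Kind; std::string Text; };

// Hypothetical result of parsing an id-expression.
struct ExprResult {
  bool Invalid;
  std::string QualifiedName;  // e.g. "foo::bar::x"
  std::string Diag;           // error message when Invalid
};

// Hypothetical stand-in for Actions.ActOnIdentifierExpr: builds an
// expression for identifier Name looked up in Scope.
ExprResult actOnIdentifierExpr(const std::string &Scope, const std::string &Name) {
  return {false, Scope + Name, ""};
}

// Sketch of ParseCXXIdExpression: the current token is a scope annotation,
// which must be followed by an identifier (other unqualified-ids omitted).
ExprResult parseCXXIdExpression(const std::vector<Token> &Toks, std::size_t &I) {
  assert(I < Toks.size() && Toks[I].Kind == Tok::annot_cxxscope);
  const std::string Scope = Toks[I++].Text;  // consume the scope annotation
  if (I >= Toks.size() || Toks[I].Kind != Tok::identifier)
    return {true, "", "expected unqualified-id after '" + Scope + "'"};
  std::string Name = Toks[I++].Text;         // consume the identifier
  return actOnIdentifierExpr(Scope, Name);
}
```

The caller (sizeof) never inspects the tokens after the scope annotation, so a missing identifier is diagnosed in one place.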

At this point you may think that the side effects of 
Parser::isDeclarationSpecifier (changing the token stream) could lead to 
problems, but in practice, due to how tokens are used, this is highly 
unlikely.
The parser mostly deals with just the current token and how it affects 
the current parsing logic. It doesn't have any "long-term token memory" 
that could get "out of sync" when the token stream changes.

----- 2) Efficient backtracking.
The ambiguity-resolution parser can use annotation tokens to spare the 
Parser from having to re-parse nested-names.
The nested-names (and typenames) will be resolved once by the tentative 
parser, and the normal parser will then use the annotation tokens.
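A rough sketch of why this helps, using a C-style "T * p;" ambiguity for brevity. The Parser struct, its Lookups counter and the method names are hypothetical. The tentative parse saves and restores the cursor, but because annotation rewrites the token buffer in place, the resolved type survives the rewind and the normal parse does not look the name up again.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Tok { identifier, star, semi, annot_typename };
struct Token { Tok Kind; std::string Text; };

// Hypothetical parser: a token buffer, a cursor, and a counter recording how
// many (expensive) name lookups were performed.
struct Parser {
  std::vector<Token> Toks;
  std::size_t Pos;
  std::set<std::string> Types;
  int Lookups;

  // Resolve the identifier at Pos; a type name becomes an annotation token.
  // Annotation tokens are skipped, so a resolved type is never looked up twice.
  void annotateToken() {
    assert(Pos < Toks.size());
    if (Toks[Pos].Kind != Tok::identifier) return;
    ++Lookups;
    if (Types.count(Toks[Pos].Text))
      Toks[Pos].Kind = Tok::annot_typename;
  }

  // Tentative parse: does the statement start with a type? The cursor is
  // restored afterwards, but the annotation stays in the token buffer.
  bool isDeclarationStatement() {
    std::size_t Saved = Pos;
    annotateToken();
    bool IsDecl = Toks[Pos].Kind == Tok::annot_typename;
    Pos = Saved;  // backtrack
    return IsDecl;
  }

  // Normal parse of the declaration specifier: reuses the annotation.
  std::string parseDeclSpec() {
    annotateToken();  // no lookup here: the token is already annotated
    assert(Toks[Pos].Kind == Tok::annot_typename);
    return Toks[Pos++].Text;
  }
};
```

For "T * p;" with T a type, the tentative parse performs one lookup and rewinds; the following parseDeclSpec() call consumes the annotation without a second lookup.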

----- 3) While annotation tokens bring the most benefits for C++, they 
are also useful for C.
Currently, a typename gets looked up twice: once in 
Parser::isDeclarationSpecifier and then again in 
Parser::ParseDeclarationSpecifiers. By replacing the typename with an 
annotation token, a typename gets looked up and resolved only once.
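The C case can be illustrated by counting lookups. SymbolTable and the four functions below are hypothetical. They compare a predicate and a parser that each look the typedef name up, against a predicate that resolves the name once and rewrites the token as an annotation for the parser to consume.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical symbol table that counts typedef-name lookups.
struct SymbolTable {
  std::set<std::string> Typedefs;
  mutable int Lookups;  // mutable so the const lookup can still count
  bool isTypedefName(const std::string &Name) const {
    ++Lookups;
    return Typedefs.count(Name) != 0;
  }
};

enum class Tok { identifier, annot_typename };
struct Token { Tok Kind; std::string Text; };

// Without annotation: the predicate and the parser each do their own lookup.
bool isDeclSpecNoAnnot(const Token &T, const SymbolTable &S) {
  return T.Kind == Tok::identifier && S.isTypedefName(T.Text);
}
std::string parseDeclSpecNoAnnot(const Token &T, const SymbolTable &S) {
  // Second lookup of the same name; empty result if it is not a typedef.
  return S.isTypedefName(T.Text) ? T.Text : std::string();
}

// With annotation: the predicate resolves the name once and rewrites the token.
bool isDeclSpecAnnot(Token &T, const SymbolTable &S) {
  if (T.Kind == Tok::identifier && S.isTypedefName(T.Text))
    T.Kind = Tok::annot_typename;
  return T.Kind == Tok::annot_typename;
}
std::string parseDeclSpecAnnot(const Token &T) {
  assert(T.Kind == Tok::annot_typename);  // no lookup needed
  return T.Text;
}
```

For a typedef such as size_t, the first pair performs two lookups and the annotated pair performs one.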

Any comments are welcome!

-Argiris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nns-parser.patch
Type: text/x-diff
Size: 34052 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081009/92cecb2c/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annot-token.patch
Type: text/x-diff
Size: 7994 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081009/92cecb2c/attachment-0001.patch>