[cfe-dev] [PATCH] C++ nested-names (Parser) and annotation tokens
Argiris Kirtzidis
akyrtzi at gmail.com
Thu Oct 9 23:35:49 PDT 2008
Hi,
The attached patches implement support for nested-name-specifiers
(foo::bar::x) in the Parser, using 'annotation tokens' (many thanks
to Doug for the idea here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html).
About annotation tokens:
These are a special kind of token that the parser (not the lexer) may
use to replace a stream of lexed tokens with a single one that
encapsulates the relevant semantic information.
There are two kinds:
-typename annotation (represents a typedef name in C, and a possibly
qualified typename in C++, like "foo::bar::myclass")
-C++ scope annotation (represents a nested-name-specifier, like "foo::bar::")
Annotation tokens contain a void* value that represents semantic
information specific to the annotation kind (a TypeTy* for typename and
CXXScopeTy* for scope) and the SourceRange of the tokens that they replaced.
As you can see in the attached "annot-token.patch" there were some
changes to the Token class to support annotations but its size did not
change.
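To illustrate how a token can carry an annotation without growing, here
is a minimal sketch. It assumes the annotation reuses storage that a
lexed token already has (the pointer slot holds the annotation value,
the length slot holds the end of the replaced range). All names below
(Token, TokKind, SourceLoc, makeLexed, makeAnnotation) are hypothetical
stand-ins, not the declarations from annot-token.patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; not the actual Clang declarations.
using SourceLoc = std::uint32_t;
struct SourceRange { SourceLoc Begin, End; };

enum class TokKind : std::uint8_t {
  Identifier,
  ColonColon,
  AnnotTypename,  // payload stands in for a TypeTy*
  AnnotCXXScope   // payload stands in for a CXXScopeTy*
};

class Token {
  SourceLoc Loc;          // start of the token, or of the replaced range
  std::uint32_t LenOrEnd; // lexed token: length; annotation: end location
  void *Ptr;              // lexed token: identifier info; annotation: value
  TokKind Kind;

  Token(TokKind K, SourceLoc L, std::uint32_t U, void *P)
      : Loc(L), LenOrEnd(U), Ptr(P), Kind(K) {}

public:
  // Build an ordinary token as the lexer would produce it.
  static Token makeLexed(TokKind K, SourceLoc L, std::uint32_t Len,
                         void *Info) {
    assert(K != TokKind::AnnotTypename && K != TokKind::AnnotCXXScope);
    return Token(K, L, Len, Info);
  }

  // Build an annotation that stands for all tokens inside range R.
  static Token makeAnnotation(TokKind K, SourceRange R, void *Value) {
    assert(K == TokKind::AnnotTypename || K == TokKind::AnnotCXXScope);
    return Token(K, R.Begin, R.End, Value);
  }

  TokKind getKind() const { return Kind; }

  bool isAnnotation() const {
    return Kind == TokKind::AnnotTypename || Kind == TokKind::AnnotCXXScope;
  }

  // Only meaningful for annotation tokens.
  void *getAnnotationValue() const {
    assert(isAnnotation());
    return Ptr;
  }

  SourceRange getAnnotationRange() const {
    assert(isAnnotation());
    return SourceRange{Loc, LenOrEnd};
  }
};
```

Because both kinds of token share the same four fields, sizeof(Token) in
this sketch is the same whether or not annotations are supported.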
The benefits of the annotation tokens are:
----- 1) Vastly simplified handling of nested-names.
In my previous attempts at nested-names, the main issue was how to keep
track of the "C++ scope specifier state" so that supporting nested-names
in "parsing contexts" that don't particularly care about them wouldn't
over-complicate things and cause a lot of duplication in the parsing
code. Here's an example of how annotation tokens handle that:
Assume that we have:
sizeof( foo::bar::x )
sizeof doesn't particularly care about nested names; it only wants to
find out whether it has a type or an expression and to defer parsing to
the appropriate parsing functions.
Here's how it works if "foo::bar::x" is a type:
-sizeof calls Parser::isDeclarationSpecifier
-Parser::isDeclarationSpecifier at the beginning calls
Parser::AnnotateToken,
-Parser::AnnotateToken parses and resolves both the scope-spec
and the typename and sets as current token an annotation type token that
indicates the type
-Parser::isDeclarationSpecifier sees that the current token is an
annotation type token and returns true since this is a declaration specifier
-sizeof calls Parser::ParseTypeName
-When execution reaches Parser::ParseDeclarationSpecifiers, it sees
the annotation type token, takes the information from it and "consumes"
it from the token stream.
OK, how about if "foo::bar::x" is an expression?
-sizeof calls Parser::isDeclarationSpecifier
-Parser::isDeclarationSpecifier, at the beginning calls
Parser::AnnotateToken,
-Parser::AnnotateToken parses the scope-spec, sees that 'x' is
not a typename and sets as current token an annotation scope token for
"foo::bar::" (which is followed by the 'x' identifier token)
-Parser::isDeclarationSpecifier sees that the current token is not a
declaration specifier and returns false
-sizeof calls Parser::ParseExpression
-When execution reaches Parser::ParseCastExpression, the annotation
scope token indicates a qualified-id expression which is handled by
Parser::ParseCXXIdExpression
-Parser::ParseCXXIdExpression takes the information from the
annotation scope token and calls Actions.ActOnIdentifierExpr by passing
the 'x' identifier and the specific C++ scope that it should be a member of
The important thing to notice about the above is that nested-names
didn't affect the parsing logic of contexts that don't directly deal
with nested-names.
sizeof didn't need any special check for nested names. If the
expression were this:
sizeof( foo::bar:: )
the error would be reported by Parser::ParseCXXIdExpression; sizeof
doesn't have to check for this either.
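The two walk-throughs above can be sketched with a toy token stream.
This is only an illustration under stated assumptions: the function
names echo the ones in this mail, but the bodies, the Tok struct, and
the TypeTable (a stand-in for asking Sema whether a qualified name is a
type) are hypothetical and much simpler than the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class Kind { Identifier, ColonColon, RParen, AnnotTypename, AnnotScope };

struct Tok {
  Kind K;
  std::string Text; // spelling, or the name an annotation stands for
};

// Hypothetical stand-in for semantic lookup: qualified name -> is a type?
using TypeTable = std::map<std::string, bool>;

// Toy AnnotateToken: tries to replace the tokens starting at Pos with a
// typename annotation or a scope annotation. Returns true if it did.
bool annotateToken(std::vector<Tok> &Toks, std::size_t Pos,
                   const TypeTable &Types) {
  if (Pos >= Toks.size() || Toks[Pos].K != Kind::Identifier)
    return false;

  // Consume every leading "identifier ::" pair into the scope spelling.
  std::string Scope;
  std::size_t I = Pos;
  while (I + 1 < Toks.size() && Toks[I].K == Kind::Identifier &&
         Toks[I + 1].K == Kind::ColonColon) {
    Scope += Toks[I].Text + "::";
    I += 2;
  }

  auto First = Toks.begin() + static_cast<std::ptrdiff_t>(Pos);

  // If the (possibly qualified) name is a type, annotate the whole thing.
  if (I < Toks.size() && Toks[I].K == Kind::Identifier) {
    std::string Full = Scope + Toks[I].Text;
    auto It = Types.find(Full);
    if (It != Types.end() && It->second) {
      auto Last = Toks.begin() + static_cast<std::ptrdiff_t>(I + 1);
      First = Toks.erase(First, Last);
      Toks.insert(First, Tok{Kind::AnnotTypename, Full});
      return true;
    }
  }

  // Otherwise annotate only the scope part, if there is one.
  if (Scope.empty())
    return false;
  auto Last = Toks.begin() + static_cast<std::ptrdiff_t>(I);
  First = Toks.erase(First, Last);
  Toks.insert(First, Tok{Kind::AnnotScope, Scope});
  return true;
}

// Toy isDeclarationSpecifier: annotates first, then inspects one token.
bool isDeclarationSpecifier(std::vector<Tok> &Toks, std::size_t Pos,
                            const TypeTable &Types) {
  annotateToken(Toks, Pos, Types);
  return Pos < Toks.size() && Toks[Pos].K == Kind::AnnotTypename;
}
```

With "foo::bar::x" registered as a type, the five tokens collapse into
one typename annotation and isDeclarationSpecifier returns true. With no
such type, the stream becomes a scope annotation for "foo::bar::"
followed by the identifier 'x', and the function returns false. For
"foo::bar::" followed by ')', a scope annotation is still produced, so
the missing identifier is left for the qualified-id parsing code to
diagnose.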
At this point you may think that the side-effects of
Parser::isDeclarationSpecifier (changing the token stream) may lead to
problems, but in practice, due to how tokens are used, this is highly
unlikely.
The parser mostly deals with just the current token and how it affects
the current parsing logic. It has no "long term token memory" that
could get out of sync when the token stream changes.
----- 2) Efficient backtracking.
The ambiguity resolution parser can use annotation tokens to spare the
Parser from having to re-parse nested-names.
The nested-names (and typenames) will be resolved by the tentative
parser once and the normal parser will use the annotation tokens.
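A small sketch of why backtracking gets cheaper, under simplifying
assumptions: a single identifier token stands in for a whole
nested-name, and TokenStream, Resolver and startsWithType are
hypothetical names, not part of the patch. The point is only that the
annotation written during the tentative parse is still in the token
buffer after the position is rewound.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Tok {
  bool IsTypeAnnot; // true once replaced by a typename annotation
  std::string Text;
};

// Hypothetical token buffer that supports rewinding to a saved position.
class TokenStream {
  std::vector<Tok> Toks;
  std::size_t Pos = 0;

public:
  explicit TokenStream(std::vector<Tok> T) : Toks(std::move(T)) {}
  std::size_t save() const { return Pos; }
  void rewind(std::size_t Saved) { Pos = Saved; }
  Tok &current() {
    assert(Pos < Toks.size());
    return Toks[Pos];
  }
  void consume() { ++Pos; }
};

// Hypothetical stand-in for name lookup; counts how often it is asked.
struct Resolver {
  std::set<std::string> TypeNames;
  int Lookups = 0;
  bool isTypeName(const std::string &Name) {
    ++Lookups;
    return TypeNames.count(Name) != 0;
  }
};

// Annotates the current token in place the first time it is resolved as
// a type; later calls see the annotation and skip the lookup.
bool startsWithType(TokenStream &S, Resolver &R) {
  Tok &T = S.current();
  if (!T.IsTypeAnnot && R.isTypeName(T.Text))
    T.IsTypeAnnot = true;
  return T.IsTypeAnnot;
}
```

A tentative parse would call save(), then startsWithType(), consume some
tokens, and rewind(). When the normal parse calls startsWithType() again
at the same position, it finds the annotation and Resolver::Lookups
stays at 1.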
----- 3) While annotation tokens bring the most benefit for C++, they
are also useful for C.
Currently, a typename gets looked up twice, once in
Parser::isDeclarationSpecifier and then in
Parser::ParseDeclarationSpecifiers. By replacing the typename with an
annotation token, a typename gets looked up and resolved only once.
Any comments are welcome!
-Argiris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nns-parser.patch
Type: text/x-diff
Size: 34052 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081009/92cecb2c/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annot-token.patch
Type: text/x-diff
Size: 7994 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20081009/92cecb2c/attachment-0001.patch>