[cfe-dev] [PATCH] C++ nested-names (Parser) and annotation tokens

Wed Oct 29 10:40:06 PDT 2008

Hi Argiris,

2008/10/10 Argiris Kirtzidis <akyrtzi at gmail.com>:
> The attached patches implement support for nested-name-specifiers
> (foo::bar::x) on the Parser utilizing 'annotation tokens' (many thanks to
> Doug for the idea here:
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html)

Nice job taking a one-sentence description and turning it into a very
clean implementation!

> About annotation tokens:
>
> These are a special kind of tokens that the parser may use (not the lexer)
> to replace a stream of lexed tokens with a single one that encapsulates the
> relevant semantic information.
> There are two kinds:
> -typename annotation (represents a typedef name in C, and a possibly
> qualified typename in C++, like "foo::bar::myclass")
> -C++ scope annotation (represents a nested-name-specifier, ("foo::bar::")
>
> Annotation tokens contain a void* value that represents semantic information
> specific to the annotation kind (a TypeTy* for typename and CXXScopeTy* for
> scope) and the SourceRange of the tokens that they replaced.
> As you can see in the attached "annot-token.patch" there were some changes
> to the Token class to support annotations but its size did not change.

I'm glad to see that Token remained the same size.

> The benefits of the annotation tokens are:
>
> ----- 1) Vastly simplified handling of nested-names.
> In my previous attempts at nested-names, the main issue was how to keep
> track of the "C++ scope specifier state" in a way so that introducing
> nested-names, at "parsing contexts" that don't particularly care about
> nested-names, won't over-complicate things and cause a lot of code
> duplication for the parsing code.

I can definitely see how this helps... it means we can effectively
treat these nested names as a single token when parsing other parts of
the grammar, which will drastically simplify those other pieces.

>Here's an example on how annotation tokens
[snip]

> ----- 2) Efficient backtracking.
> The ambiguity resolution parser can use annotation tokens to spare the
> Parser from having to re-parse nested-names.
> The nested-names (and typenames) will be resolved by the tentative parser
> once and the normal parser will use the annotation tokens.

> ----- 3) While annotation tokens bring the most benefits for C++, they are
> also useful for C too.
> Currently, a typename gets looked up twice, once in
> Parser::isDeclarationSpecifier and then in
> Parser::ParseDeclarationSpecifiers. By replacing the typename with an
> annotation token, a typename gets looked up and resolved only once.
>
>
> Any comments are welcome!

First of all, I'm really impressed. This came out much cleaner than I
had anticipated. I have some nitpicks and a question below, but as far
as I'm concerned, if you have tests and some minimal Sema support (so
that all the tests run properly), please feel free to check it in.
Great job!

Some nitpicks follow.

+  /// ActOnCXXEnterDeclaratorScope - Called when a C++ scope specifier (global
+  /// scope or nested-name-specifier) is parsed, part of a declarator-id.
+  /// After this method is called, according to [C++ 3.4.3p3], names should be
+  /// looked up in the declarator-id's scope, until the declarator is
parsed and
+  /// ActOnCXXExitDeclaratorScope is called.
+  /// The 'SS' should be a non-empty valid CXXScopeSpec.
+  virtual void ActOnCXXEnterDeclaratorScope(Scope *S, const CXXScopeSpec &SS) {
+  }
+
+  /// ActOnCXXExitDeclaratorScope - Called when a declarator that previously
+  /// invoked ActOnCXXEnterDeclaratorScope(), is finished. 'SS' is the same
+  /// CXXScopeSpec that was passed to ActOnCXXEnterDeclaratorScope as well.
+  /// Used to indicate that names should revert to being looked up in the
+  /// defining scope.
+  virtual void ActOnCXXExitDeclaratorScope(const CXXScopeSpec &SS) {
+  }

It would be really, really great if we had an RAII class of some sort
to make sure that
ActOnCXXEnterDeclaratorScope/ActOnCXXExitDeclaratorScope calls were
always paired. For example, Parser::ParseDirectDeclarator's control
flow looks right, but the Enter call is pretty far from the Exit call
(and at a different level of "if" nesting).

I'm pretty certain that I don't understand the role of these
functions. It looks like they want Sema to keep some additional state
regarding the scope in which name lookup should occur that's somehow
different from the current scope of parsing, but I don't know when
that happens... when we're parsing a direct-declarator, say, when
declaring a member function out-of-line:

  void Sema::MemberFunction(TypeFromInSema Arg) { }

Once we've parse Sema::MemberFunction, we seem to be calling
ActOnCXXEnterDeclaratorScope. I assume that's so that the lookup of
TypeFromInSema will look into 'Sema' first?

Could we instead have a different kind of Scope that handles this case
of being inside "Sema" without

+  /// [C++]   template-id
+  ///
+  bool isTokenUnqualifiedId() const {

Please put a [TODO] or [FIXME] on the template-id production.

+  /// AnnotateToken - If the current token position is on a typename (possibly
+  /// qualified in C++) or a C++ scope specifier not followed by a typename,
+  /// AnnotateToken will replace one or more tokens with a single annotation
+  /// token representing the typename or C++ scope respectively.
+  /// This simplifies handling of C++ scope specifiers and allows efficient
+  /// backtracking without the need to re-parse and resolve nested-names and
+  /// typenames.
+  void AnnotateToken();

I think the name of this routine could be better, for two reasons.
First, it doesn't say what the annotations are doing---resolving
qualified-ids and such. Second, I'd like there to be a "Maybe" or a
"Try" at the beginning, to say that it might change the token stream
(or it might not).

  - Doug