[cfe-dev] [PATCH] C++ nested-name-specifier (Parser)

Mon Aug 11 10:33:35 PDT 2008

For parsing scope specifiers ("A::B::") I tried out having the parser 
keep a "parsed scope spec" state, so that parsing functions can check it 
and act accordingly.
This works and Sema only resolves a scope specifier once, but it's not a 
very attractive option from the maintainability viewpoint.

You basically insert various "hasParsedCXXScopeSpec()" checks at several 
points. Any parsing function may need to use such a check in order to 
proceed correctly and it's not clear which functions need to do that and 
at what exact point; you have to consider the execution paths that lead 
to this function and whether a scope specifier may be parsed before this 
function is called.

To be more specific, here are some examples:

Parser::DeclTy *Parser::ParseDeclaration(unsigned Context) {
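  // A scope specifier was already consumed, so skip the keyword dispatch
  // below and go straight to a simple declaration.
  if (hasParsedCXXScopeSpec())
    return ParseSimpleDeclaration(Context);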

  switch (Tok.getKind()) {
  case tok::kw_namespace:
    return ParseNamespace(Context);
  default:
    return ParseSimpleDeclaration(Context);
  }
}

Parser::ExprResult Parser::ParseExpression() {
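  // Treat 'throw' as the start of a throw-expression only if no scope
  // specifier was parsed before it.
  if (Tok.is(tok::kw_throw) && !hasParsedCXXScopeSpec())
    return ParseThrowExpression();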

  ExprResult LHS = ParseCastExpression(false);
  if (LHS.isInvalid) return LHS;

  return ParseRHSOfBinaryExpression(LHS, prec::Comma);
}

A more general approach, one that deals with the C++ ambiguities, may 
work for parsing scope specifiers too.

The C++ ambiguities that function-style casting introduces are like 
the ones mentioned here:
http://publib.boulder.ibm.com/infocenter/lnxpcomp/v7v91/index.jsp?topic=/com.ibm.vacpp7l.doc/language/ref/clrc08cplr403.htm
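
For readers who don't want to follow the link, here is a small 
self-contained illustration of the declaration vs. function-style cast 
ambiguity. The struct T and the two function names are made up for this 
example; only the disambiguation rule itself (a statement that can be a 
declaration is a declaration) comes from the language.

```cpp
#include <cassert>

struct T {
  int value;
  T() : value(-1) {}
  explicit T(int v) : value(v) {}
};

int a = 7;  // global 'a'

int declared_value() {
  // Looks like a function-style cast of the global 'a', but as a whole
  // statement it is parsed as a declaration, equivalent to 'T a;'.
  T(a);
  return a.value;  // the local, default-constructed 'a'
}

int cast_value() {
  // Here a declaration is not possible, so T(a) is a function-style cast
  // of the global 'a'.
  return T(a).value;
}
```

With these definitions, declared_value() yields -1 and cast_value() 
yields 7, so the same tokens "T(a)" mean two different things depending 
on what the parser decides the statement is.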

I think that tentative parsing/backtracking is the cleanest way to 
deal with the C++ ambiguities. In general, backtracking does not work 
well for most Action methods; for example, ActOnIdentifierExpr creates 
new Expr objects, and if you backtrack after calling ActOnIdentifierExpr 
you have to deal with both the Expr objects and the diagnostics that may 
occur (cache them? clean up Expr objects if you have to discard them?).
But there are a few Action methods that serve mostly as an aid for the 
parser and do not deal with AST building. "isTypeName" is one of them, 
and "ActOnNestedNameSpecifier" can be considered one of them too.
ActOnNestedNameSpecifier basically takes "A::" and returns a CXXScopeTy 
that the parser can use to pass along to other actions, so backtracking 
can be done after calling it without a lot of fuss.
If you get "C<int>::", you will instantiate a template class, but this 
instantiation will be added to the list of unique types and won't be 
discarded, so backtracking is safe in this case too.
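
To make the backtracking idea above concrete, here is a toy sketch of a 
token stream that can be marked and rewound. This is not Clang's 
Preprocessor API; TokenStream, mark/backtrack/commit and 
countScopeComponents are hypothetical names chosen for illustration, and 
tokens are plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy token stream (hypothetical, not Clang's Preprocessor).
class TokenStream {
public:
  explicit TokenStream(std::vector<std::string> Tokens)
      : Toks(std::move(Tokens)) {}

  const std::string &peek() const {
    return Pos < Toks.size() ? Toks[Pos] : Eof;
  }
  void consume() { if (Pos < Toks.size()) ++Pos; }
  std::size_t position() const { return Pos; }

  // Remember the current position so it can be restored later.
  void mark() { Marks.push_back(Pos); }
  // Discard the tentative parse: return to the most recent mark.
  void backtrack() { Pos = Marks.back(); Marks.pop_back(); }
  // Keep the consumed tokens and drop the most recent mark.
  void commit() { Marks.pop_back(); }

private:
  std::vector<std::string> Toks;
  std::vector<std::size_t> Marks;
  std::size_t Pos = 0;
  std::string Eof = "<eof>";
};

// Tentatively walk over "name ::" pairs, count them, then rewind so the
// caller sees the stream exactly as it was.
std::size_t countScopeComponents(TokenStream &TS) {
  TS.mark();
  std::size_t N = 0;
  while (TS.peek() != "<eof>" && TS.peek() != "::") {
    TS.mark();
    TS.consume();              // the candidate name
    if (TS.peek() == "::") {
      TS.consume();
      TS.commit();             // "name ::" confirmed
      ++N;
    } else {
      TS.backtrack();          // not a scope component, undo
      break;
    }
  }
  TS.backtrack();              // the whole lookahead was tentative
  return N;
}
```

For the tokens A :: B :: x this reports two scope components and leaves 
the stream positioned at "A" again, which is the property the parser 
would rely on.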

The Preprocessor can now backtrack efficiently; here's my suggestion 
for having efficient backtracking in the Parser too:

Have a special parsing function whose purpose is to determine whether 
a statement is a declaration.
It will work by using tentative parsing/backtracking.
It will only call Action methods to do typechecking (isTypeName) and 
resolve scope specifiers ("A::") (the Action methods may instantiate 
template classes).
During tentative parsing, the result of these Action methods will be 
cached based on token location (we can assume that on a particular 
SourceLocation, if you call Action.isTypeName, the result will be the same).
We don't have to cache diagnostics: if an error occurs because of a 
mistyped scope specifier (or during template class instantiation), 
emit the diagnostic and skip to the next statement.
As soon as we know whether the current statement is a declaration or 
not, backtrack and call the normal parsing functions. For the more 
common cases, little or no tentative parsing will be needed.
When, during normal parsing, the parser needs to call an Action method 
that was already called during tentative parsing, the parser will use 
the cached result.
Before moving to the next statement, clear the cached results of Action 
methods, and repeat.

With the above approach, Sema will do typechecking only once for each 
case the parser needs typechecking.
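
Here is a rough, self-contained sketch of the caching scheme, only to 
show the call-counting property. ToySema and ToyParser are hypothetical 
stand-ins, a token index plays the role of SourceLocation, and the 
"starts with a type name" rule is a deliberately crude substitute for 
the real declaration/expression disambiguation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Sema that counts how often it is queried (hypothetical).
struct ToySema {
  std::set<std::string> TypeNames;
  int Queries = 0;
  bool isTypeName(const std::string &Name) {
    ++Queries;
    return TypeNames.count(Name) != 0;
  }
};

class ToyParser {
public:
  ToyParser(std::vector<std::string> Tokens, ToySema &S)
      : Toks(std::move(Tokens)), Actions(S) {}

  // Ask Sema only the first time a given location is queried.
  bool isTypeNameAt(std::size_t Loc) {
    auto It = Cache.find(Loc);
    if (It != Cache.end()) return It->second;
    bool Result = Actions.isTypeName(Toks[Loc]);
    Cache[Loc] = Result;
    return Result;
  }

  // Tentative step: look ahead, decide, then rewind.
  bool isDeclarationStatement() {
    std::size_t Saved = Pos;
    bool IsDecl = Pos < Toks.size() && isTypeNameAt(Pos);
    if (Pos < Toks.size()) ++Pos;  // pretend to look further ahead
    Pos = Saved;                   // backtrack
    return IsDecl;
  }

  // Parse one statement ending in ";" and report "decl" or "expr".
  std::string parseStatement() {
    bool IsDecl = isDeclarationStatement();
    // Normal parsing asks the same question; the cache answers it.
    bool StartsWithType = Pos < Toks.size() && isTypeNameAt(Pos);
    assert(StartsWithType == IsDecl);
    while (Pos < Toks.size() && Toks[Pos] != ";") ++Pos;
    if (Pos < Toks.size()) ++Pos;  // consume ";"
    Cache.clear();                 // results are per statement
    return IsDecl ? "decl" : "expr";
  }

private:
  std::vector<std::string> Toks;
  ToySema &Actions;
  std::map<std::size_t, bool> Cache;
  std::size_t Pos = 0;
};
```

Parsing "T ( a ) ; f ( a ) ;" with T registered as a type name gives 
"decl" then "expr", and the Sema stand-in is queried once per statement 
even though the parser asks twice.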

Any thoughts about the above?

-Argiris