[cfe-dev] [PATCH] C++ nested-name-specifier (Parser)

Sat Aug 9 17:18:39 PDT 2008

On Aug 9, 2008, at 4:49 PM, Argiris Kirtzidis wrote:

> And about the preprocessor changes: Is it ok to commit them, and is  
> it more intuitive to change 'Rewind' to 'Backtrack' ?

Yes, this does look like a nice cleanup for the previous mechanism.   
Please improve this comment to explain what the members are and what  
the invariants are:

+  // Cached tokens state.
+  typedef std::vector<Token> CachedTokensTy;
+  CachedTokensTy CachedTokens;
+  CachedTokensTy::size_type CachedLexPos;
+  CachedTokensTy::size_type CachedRewindPos;

and rename Rewind -> Backtrack and commit it, thanks!

In the future, this will have to be generalized a bit.  IIUC, C++ can  
require nested backtracking in some cases.  This would require a stack  
of speculation.  Also, we may eventually need warnings that are  
emitted while speculating to be cached and discarded if the lookahead  
is discarded.

> Chris Lattner wrote:
>>
>>> The attached 'pp-caching' patch, modifies Preprocessor::LookAhead  
>>> (making it efficient enough to replace LookNext) and adds  
>>> 'rewinding' functionality.
>>> That is, one can call Preprocessor::EnableRewindAtThisPos, lex and  
>>> consume any number of tokens, and later call Preprocessor::Rewind  
>>> to make the preprocessor re-lex the tokens that were previsouly  
>>> lexed since EnableRewindAtThisPos was called.
>>
>> I think we will need this functionality (aka 'tentative parsing' or  
>> 'backtracking') at some point, but I'm not sure why we need it for  
>> this.  As a prolog, I haven't dug into the part of the spec, so if  
>> I'm missing something obvious, please be gentle on me :)
>
> This is not a requirement of the spec, it just simplifies the parser  
> in a similar way to how introducing NextToken() simplified it.

Ok.

>> My basic understanding (again, based on intuition, not reading of  
>> the spec) of scope resolution was that it could actually be  
>> implemented without lookahead.  My thought was that if you  
>> see ::A::B or A::B that you start by looking up the ::A /A part and  
>> decide what it references.  when looking up A, consider if it  
>> resolves to a namespace.  My though would be that the parser  
>> invokes an action to look up the decl corresponding to A.
>>
>> Options are that it could either be:
>> 1) typedef name
>> 2) variable/ivar/etc
>> 3) namespace
>> 4) struct
>> ...
>>
>> One ::A is looked up, based on what the action returns, my  
>> assumption is that the parser would see the next :: and do a  
>> subscope lookup.  Basically it would parse "::B" and then invoke an  
>> action to look up ::B in the decl returned by the ::A lookup.  At  
>> some point presumably, you end up with a type or a variable etc,  
>> which we'd need to treat as the start of a declspec etc.  One nice  
>> thing about this is that it would not require the parser to build  
>> up an explicit 'CXXScopeSpec' object, Sema could build it if it  
>> wanted, but it wouldn't be required.
>
> As far as the spec is concerned, you basically want to do something  
> like this:
>
> -You have "A::B"
> -Parse "A::", not just 'A'. This is because "A::" indicates nested- 
> name-specifier, and you want to do a specialized name lookup for  
> nested-names. This specialized lookup will only consider classes/ 
> namespaces and ignore the rest. For example:
>
> namespace A { int x; }
> void f() {
>  int A;
>  A::x = 0;  // We don't want 'A' lookup to return 'int A' here, we  
> want the namespace A.
> }
>
> -Now parse "B" and lookup 'B' in the scope of 'A'.

Ok.  Conceptually, I don't see a problem with this.  After consuming  
the identifier, if it is followed by a colon, invoke a different  
action to do the lookup.

However, I think I am starting to understand what you mean.   
isDeclarationSpecifier() (for example) contains this code right now:

   switch (Tok.getKind()) {
   ...
     // typedef-name
   case tok::identifier:
     return Actions.isTypeName(*Tok.getIdentifierInfo(), CurScope) != 0;

This means that following my approach would take us back down the  
route of having a "parse expression with leading identifier expression  
already eaten" method, and things like that.  This is because you'd  
have to do something like:

case tok::identifier:
   // eats the current identifier and related type stuff as a whole,
   D = ConsumeAndResolveIdentifier();
   if (Actions.isTypeName(D))
     ... it's a decl spec ...
   else
     .. it's an 'identifier expression' ..

Is that the problem?  It also means that isDeclarationSpecifier would  
need backtracking and significant other stuff to work as a predicate.   
In practice, this means we'd want to refactor all callers to not use  
it like it does.

>> Is this approach achievable?  I'd prefer to avoid parsing and  
>> rewinding unless absolutely necessary.
>
> The advantage of the rewinding part is that there's no need for the  
> Parser or Sema to keep and look after a 'C++ scoping' state (a state  
> that is not just local to functions, but part of the Parser/Sema  
> class).

I'm not sure I believe that.  How do you propose to handle more  
complex things (like the T<int>::x case) in the future?  It seems that  
we're going to need to invoke sema to do template instantiation and  
other stuff at some point: passing a pre-parsed thing like this to  
sema as one big action call seems difficult.
>
> This also means that we can drop the rewinding part if the Parser or  
> Sema keep and look after a 'C++ scoping' state.
> If the Parser keeps the state, it will build and pass 'CXXScopeSpec'  
> objects to action methods, (like in my patch).
> If the Sema keeps the state, it will make error recovery awkward.

I actually think that things are more awkward with having Sema work  
this out.  Specifically, if you have: "A :: B :: C" you might have one  
reference or it might be "A::B ::C" which can occur in a few places in  
the grammar:
http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/doc/coloncolon.txt

If the parser just passed all of A/B/C to Sema, I'm not sure how it  
would handle this.  It seems really easy if you invoke actions for  
"A::" then "B" (which returns a type).  When the parser got a type, it  
would naturally ratchet on and parse ::C as a qualified name.

-Chris