[cfe-dev] decl/expr ambiguity

Sun Aug 24 18:00:04 PDT 2008

Chris Lattner wrote:
>
> On Aug 24, 2008, at 3:59 PM, Argiris Kirtzidis wrote:
>> Chris Lattner wrote:
>>> Okay, here's another crazy idea.  If you boil it down, my
>>> objections to preparsing are basically:
>>>
>>> 1. the perf cost of having to do the preparse in *every* decl case.
>>> 2. [minor] the perf cost for qualified expr cases (std::cout << ...)
>>> 3. [minor] the maintenance cost of the second parser.
>>>
>>
>> 1) This is not true, as I explain in this post:
>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002625.html
>
> Either I'm very confused (likely!) or your current patch doesn't do
> this.  As I mentioned in the other post, it looks like it runs the
> preparser for any identifier at top level.  It even runs it for the
> "x = 4" case.

In the second patch, posted here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002617.html
backtracking is not enabled for the common cases:

Parser::TentativeParsingResult Parser::isCXXDeclarationStatement() {
  ...
  // Fast path: decide from the current token alone whenever possible.
  TentativeParsingResult TPR = isCXXDeclarationSpecifier();
  if (TPR != TPR_ambiguous)
    return TPR;

  // Slow path: only the ambiguous case pays for backtracking.
  TentativeParsingAction PA(*this);

  TPR = TryParseSimpleDeclaration();

  PA.Revert();
  ...
}

isCXXDeclarationSpecifier() does at most one token of lookahead.
(Actually, for 'typeof' it tentatively skips through it to see whether
"typeof(..)" is followed by '(', but this is too uncommon to be worth
discussing.)
If the current token does not indicate a type,
isCXXDeclarationSpecifier() consumes no tokens at all.
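
To make the control flow above concrete, here is a minimal standalone
sketch of the same shape: a cheap check first, and a snapshot/revert
only in the ambiguous case. It is not the code from the patch. All
names (TokenStream, TentativeAction, classifyWithoutBacktracking, and
so on) are hypothetical, the "tokens" are plain strings, and the rule
used to settle the ambiguous case is a deliberately simplistic toy.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Clang's classes.
enum class TentativeResult { Declaration, Expression, Ambiguous };

struct TokenStream {
  std::vector<std::string> Tokens;
  std::size_t Pos = 0;

  explicit TokenStream(std::vector<std::string> T) : Tokens(std::move(T)) {}

  // Returns the token Ahead positions past the current one, or "" at the end.
  std::string peek(std::size_t Ahead = 0) const {
    return Pos + Ahead < Tokens.size() ? Tokens[Pos + Ahead] : std::string();
  }
  void consume() {
    assert(Pos < Tokens.size() && "consumed past end of input");
    ++Pos;
  }
};

// Records the stream position on construction so revert() can rewind to it.
class TentativeAction {
  TokenStream &TS;
  std::size_t Saved;

public:
  explicit TentativeAction(TokenStream &S) : TS(S), Saved(S.Pos) {}
  void revert() { TS.Pos = Saved; }
};

// Fast path: peeks at the current and the next token, consumes nothing.
TentativeResult classifyWithoutBacktracking(const TokenStream &TS,
                                            const std::set<std::string> &Types) {
  if (!Types.count(TS.peek()))
    return TentativeResult::Expression;   // does not start with a type
  if (TS.peek(1) != "(")
    return TentativeResult::Declaration;  // e.g. "T a;"
  return TentativeResult::Ambiguous;      // "T(": declarator or cast?
}

// Slow path (toy rule): "T ( ident )" followed by ';' or '=' is a declaration.
TentativeResult tryParseSimpleDeclaration(TokenStream &TS) {
  TS.consume();  // type name
  TS.consume();  // '('
  const std::string Name = TS.peek();
  if (Name.empty() ||
      !(std::isalpha(static_cast<unsigned char>(Name[0])) || Name[0] == '_'))
    return TentativeResult::Expression;
  TS.consume();  // identifier
  if (TS.peek() != ")")
    return TentativeResult::Expression;
  TS.consume();  // ')'
  const std::string Next = TS.peek();
  return (Next == ";" || Next == "=") ? TentativeResult::Declaration
                                      : TentativeResult::Expression;
}

TentativeResult isDeclarationStatement(TokenStream &TS,
                                       const std::set<std::string> &Types) {
  TentativeResult R = classifyWithoutBacktracking(TS, Types);
  if (R != TentativeResult::Ambiguous)
    return R;               // common case: no backtracking at all

  TentativeAction PA(TS);   // snapshot the position
  R = tryParseSimpleDeclaration(TS);
  PA.revert();              // always rewind; the real parse happens later
  return R;
}
```

In this sketch a statement such as "x = 4;" or "T a;" returns from the
fast path with no token consumed, and only input starting with "T("
reaches the snapshot/revert path.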

>
>> 2) This is not inherent to the preparser. Even with a
>> "tentatively parse decl then parse as expr" approach, we would still
>> prefer to do such resolutions only once. This perf cost needs to be
>> solved in either case.
>
> By using a "parse expr with leading qualified name" approach, or with 
> something else?

Here's what I have in mind:
-Something like "A::" indicates a scope qualifier.
-There's a parser method whose purpose is to resolve scope qualifiers, say
"ParseCXXScopeQualifier". It parses them, calls sema actions to resolve
them, and returns a CXXScopeTy* from Sema (to be passed on to the sema
actions that need it).
-ParseCXXScopeQualifier can cache that CXXScopeTy* result, so that when
it is called again at the same token source location, it returns the
cached result without doing any sema resolution at all. It also skips
the corresponding number of tokens (if it was previously called with
"A::B::", it skips 4 tokens).

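A rough sketch of the caching idea, to show what "return the cached
result and skip the tokens" could look like. Nothing here is actual
Clang code: ScopeParser, ScopeHandle and the SemaLookups counter are
hypothetical, a token index stands in for the source location, and
"sema resolution" is simulated by bumping a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the opaque scope handle Sema would return.
struct ScopeHandle {
  std::string QualifiedName;
};

struct ScopeParser {
  std::vector<std::string> Tokens;
  std::size_t Pos = 0;
  unsigned SemaLookups = 0;  // counts simulated Sema resolutions

  struct CacheEntry {
    ScopeHandle Scope;
    std::size_t TokensConsumed;
  };
  // Keyed by the token index where the qualifier starts (assumption: this
  // plays the role of the token's source location).
  std::map<std::size_t, CacheEntry> Cache;

  // Parses a leading "A::B::" style qualifier starting at Pos.
  ScopeHandle parseScopeQualifier() {
    auto It = Cache.find(Pos);
    if (It != Cache.end()) {
      // Cache hit: no Sema work, just skip what was consumed last time.
      assert(Pos + It->second.TokensConsumed <= Tokens.size());
      Pos += It->second.TokensConsumed;
      return It->second.Scope;
    }

    const std::size_t Start = Pos;
    ScopeHandle Scope;
    while (Pos + 1 < Tokens.size() && Tokens[Pos + 1] == "::") {
      Scope.QualifiedName += Tokens[Pos] + "::";
      ++SemaLookups;  // one simulated resolution per component
      Pos += 2;       // the name and the "::" after it
    }
    Cache.emplace(Start, CacheEntry{Scope, Pos - Start});
    return Scope;
  }
};
```

With the tokens of "A::B::T a;", the first call at index 0 performs two
simulated lookups and advances by 4 tokens. After rewinding to index 0,
a second call returns the same handle and advances by 4 tokens again,
with the lookup counter unchanged.
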
Now say that a statement starts with this:
A::B::T a

-The preparser sees that 'A::' is a scope qualifier and calls
ParseCXXScopeQualifier to do its thing, then calls Parser::IsTypeName.
Both methods cache their results.
-The preparser sees that "A::B::T" is not followed by a '(', backtracks,
and returns "it's a declaration".
-The normal parser sees that 'A::' is a scope qualifier and calls
ParseCXXScopeQualifier, which just returns the previously cached result
and skips 4 tokens.
-The normal parser calls Parser::IsTypeName, which also just returns the
previously cached result.
-Sema resolutions are only done once.
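
The sequence above can be played through in a toy model. This is only
a sketch under stated assumptions: ToyParser and its members are
hypothetical names, tokens are strings, Sema is replaced by a counter,
and the only type name "known" to it is T. The ambiguous "T(" case is
simply reported as "not a plain declaration" here, whereas the real
preparser would keep parsing tentatively.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model; every name here is hypothetical, not Clang's API.
struct ToyParser {
  std::vector<std::string> Toks;
  std::size_t Pos = 0;
  unsigned SemaCalls = 0;                         // simulated Sema resolutions
  std::map<std::size_t, std::size_t> ScopeCache;  // start index -> tokens to skip
  std::map<std::size_t, bool> TypeCache;          // token index -> is type name

  bool at(const char *S) const { return Pos < Toks.size() && Toks[Pos] == S; }

  // Consumes a leading "A::B::" qualifier, resolving each component once.
  void parseScopeQualifier() {
    auto It = ScopeCache.find(Pos);
    if (It != ScopeCache.end()) {
      assert(Pos + It->second <= Toks.size());
      Pos += It->second;  // cached: skip tokens, no Sema call
      return;
    }
    const std::size_t Start = Pos;
    while (Pos + 1 < Toks.size() && Toks[Pos + 1] == "::") {
      ++SemaCalls;
      Pos += 2;
    }
    ScopeCache[Start] = Pos - Start;
  }

  // Pretends Sema knows that only "T" names a type; caches per position.
  bool isTypeName() {
    auto It = TypeCache.find(Pos);
    if (It != TypeCache.end())
      return It->second;
    ++SemaCalls;
    const bool Result = at("T");
    TypeCache[Pos] = Result;
    return Result;
  }

  // Preparser: looks ahead, then backtracks to where it started.
  bool looksLikeDeclaration() {
    const std::size_t Saved = Pos;
    parseScopeQualifier();
    bool IsDecl = false;
    if (isTypeName()) {
      ++Pos;              // step over the type name
      IsDecl = !at("(");  // "T(" would be the ambiguous case
    }
    Pos = Saved;          // backtrack
    return IsDecl;
  }

  // Normal parser: repeats the same calls, which now hit the caches.
  bool parseDeclaration() {
    parseScopeQualifier();
    if (!isTypeName())
      return false;
    ++Pos;                // type name
    if (Pos >= Toks.size())
      return false;
    ++Pos;                // declarator identifier
    return true;
  }
};
```

For the tokens of "A::B::T a;", looksLikeDeclaration() makes three
simulated Sema calls (two scope components plus one type lookup) and
leaves the position at 0. The following parseDeclaration() makes no
further Sema calls, which is the "only done once" property.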

What do you think?

PS: Here's a weird test I did. I used the preparser to count how many
"ambiguous" declarations (of the T(...) style) there are in actual C code,
using a few GCC source files:
gcc.c:      declarations in functions: 432   ambiguous: 0
expr.c:     declarations in functions: 730   ambiguous: 0
combine.c:  declarations in functions: 564   ambiguous: 0

I think this suggests that it's a really uncommon case.
If anyone wants to give me a preprocessed file for a test, please do!

-Argiris