[cfe-dev] decl/expr ambiguity
Argiris Kirtzidis
akyrtzi at gmail.com
Sun Aug 24 18:00:04 PDT 2008
Chris Lattner wrote:
>
> On Aug 24, 2008, at 3:59 PM, Argiris Kirtzidis wrote:
>> Chris Lattner wrote:
>>> Okay, here's another crazy idea. If you boil it down, my
>>> objections to preparsing are basically:
>>>
>>> 1. the perf cost of having to do the preparse in *every* decl case.
>>> 2. [minor] the perf cost for qualified expr cases (std::cout << ...)
>>> 3. [minor] the maintenance cost of the second parser.
>>>
>>
>> 1) This is not true, as I explain in this post:
>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002625.html
>
> Either I'm very confused (likely!) or your current patch doesn't do
> this. As I mentioned in the other post, it looks like it runs the
> preparser for any identifier at top level. It even runs it for the "x
> = 4" case.
In the second patch, posted here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002617.html
no backtracking is done for the common cases:
Parser::TentativeParsingResult Parser::isCXXDeclarationStatement() {
  .......
  TentativeParsingResult TPR = isCXXDeclarationSpecifier();
  if (TPR != TPR_ambiguous)
    return TPR;
  TentativeParsingAction PA(*this);
  TPR = TryParseSimpleDeclaration();
  PA.Revert();
  ....
}
isCXXDeclarationSpecifier() does at most one token of lookahead.
(Actually, for 'typeof' it tentatively skips ahead to see whether
"typeof(..)" is followed by '(', but this is too uncommon to be worth
discussing.)
If the current token does not indicate a type,
isCXXDeclarationSpecifier() consumes no tokens.
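To make the "cheap check first, backtrack only when ambiguous" shape concrete, here is a toy sketch. It is not the patch: ToyParser, its members, and the rule used inside tryParseSimpleDeclaration() are all made up for illustration, and the token stream is just a vector of strings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical three-way answer, modeled on TentativeParsingResult in the post.
enum class TPR { True, False, Ambiguous };

struct ToyParser {
  std::vector<std::string> toks;    // pre-lexed tokens (assumption: no real lexer)
  std::set<std::string> typeNames;  // names assumed to denote types
  std::size_t pos = 0;              // index of the current token
  unsigned tentativeParses = 0;     // how many times backtracking was needed

  const std::string &peek(std::size_t n = 0) const {
    static const std::string eof = "<eof>";
    return pos + n < toks.size() ? toks[pos + n] : eof;
  }

  // Cheap check: inspects the current token and at most one token ahead,
  // and consumes nothing.
  TPR isDeclSpecifier() const {
    if (!typeNames.count(peek())) return TPR::False;
    return peek(1) == "(" ? TPR::Ambiguous : TPR::True;
  }

  // Toy rule for "T ( x ) ...": treat it as a declaration only when the
  // parentheses hold a lone identifier followed by ';' or '='.
  TPR tryParseSimpleDeclaration() {
    pos += 2;  // consume the type name and '('
    unsigned char c = static_cast<unsigned char>(peek()[0]);
    if (!(std::isalpha(c) || c == '_')) return TPR::False;
    ++pos;     // consume the identifier
    if (peek() != ")") return TPR::False;
    ++pos;     // consume ')'
    return (peek() == ";" || peek() == "=") ? TPR::True : TPR::False;
  }

  TPR isDeclarationStatement() {
    TPR r = isDeclSpecifier();
    if (r != TPR::Ambiguous) return r;  // common case: no backtracking at all
    std::size_t saved = pos;            // stand-in for TentativeParsingAction
    ++tentativeParses;
    r = tryParseSimpleDeclaration();
    pos = saved;                        // stand-in for PA.Revert()
    return r;
  }
};

// Convenience helper (hypothetical): "T" and "int" are the only known types.
inline ToyParser makeParser(std::vector<std::string> toks) {
  ToyParser p;
  p.toks = std::move(toks);
  p.typeNames = {"T", "int"};
  return p;
}
```

With this, something like makeParser({"x", "=", "4", ";"}) answers "not a declaration" without ever entering the tentative parse, and only the "T ( x ) ;" style input increments tentativeParses.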
>
>> 2) This is not inherent to the preparser, even if there's a
>> "tentatively parse decl then parse as expr" approach, we still prefer
>> to do such resolutions once; This perf cost needs to be solved in
>> either case.
>
> By using a "parse expr with leading qualified name" approach, or with
> something else?
Here's what I have in mind.
-something like "A::" indicates a scope qualifier.
-there's a parser method whose purpose is to resolve scope qualifiers, say
"ParseCXXScopeQualifier"; it parses them, calls sema actions to resolve
them, and returns a CXXScopeTy* from Sema (which is then passed to the sema
actions that need it).
-ParseCXXScopeQualifier can cache that CXXScopeTy* result, so that when
it is called again for the same token source location, it will return
the cached result without doing any sema resolution at all. It will also
skip the necessary number of tokens (if it was previously called with
"A::B::", it will skip 4 tokens).
Now say that a statement starts with this:
A::B::T a
-The preparser sees that 'A::' is a scope qualifier and calls
ParseCXXScopeQualifier to do its thing. It then calls Parser::IsTypeName.
Both methods cache their results.
-The preparser sees that "A::B::T" is not followed by a '(', backtracks,
and returns "it's a declaration".
-The normal parser sees that 'A::' is a scope qualifier and calls
ParseCXXScopeQualifier which just returns the previously cached result
and skips 4 tokens.
-The normal parser calls Parser::IsTypeName which also just returns the
previously cached result.
-sema resolutions are only done once
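A rough sketch of the caching idea, under heavy assumptions: ScopeParser, CachedScope and the semaLookups counter are hypothetical, a token index stands in for the source location, and a plain string stands in for the CXXScopeTy* that Sema would return.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// What gets remembered per starting location (hypothetical layout).
struct CachedScope {
  std::string scope;      // stand-in for the CXXScopeTy* from Sema
  std::size_t numTokens;  // how many tokens to skip on a cache hit
};

struct ScopeParser {
  std::vector<std::string> toks;  // pre-lexed tokens
  std::size_t pos = 0;            // index of the current token
  unsigned semaLookups = 0;       // counts the pretend Sema resolutions
  std::map<std::size_t, CachedScope> cache;  // keyed by starting token index

  // Parses a leading "A::B::" run at pos, advances past it, and returns the
  // resolved scope. A second call at the same position is served from the
  // cache and does no resolution.
  std::string parseScopeQualifier() {
    auto it = cache.find(pos);
    if (it != cache.end()) {
      pos += it->second.numTokens;  // skip the already-parsed tokens
      return it->second.scope;
    }
    std::size_t start = pos;
    std::string scope;
    while (pos + 1 < toks.size() && toks[pos + 1] == "::") {
      ++semaLookups;                // pretend Sema resolves this name
      if (!scope.empty()) scope += "::";
      scope += toks[pos];
      pos += 2;                     // consume the name and "::"
    }
    cache[start] = CachedScope{scope, pos - start};
    return scope;
  }
};
```

For the "A::B::T a" example, the first call (from the preparser) resolves two names and consumes 4 tokens; after backtracking to the same position, the second call (from the normal parser) just skips those 4 tokens and leaves semaLookups unchanged.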
What do you think?
PS: Here's a weird test I did. I used the preparser to count how many
"ambiguous" declarations (of the T(...) style) there are in actual C code. I
used a few GCC files:
gcc.c:      declarations in functions: 432   ambiguous: 0
expr.c:     declarations in functions: 730   ambiguous: 0
combine.c:  declarations in functions: 564   ambiguous: 0
I think this suggests that it's a really uncommon case.
If anyone wants to give me a preprocessed file for a test, please do!
-Argiris