[cfe-dev] [PATCH] C++ decl/expr ambiguity resolution approach

Sun Aug 24 16:28:44 PDT 2008

On Aug 24, 2008, at 3:29 PM, Argiris Kirtzidis wrote:

> Chris Lattner wrote:
>>
>> Activating backtracking isn't hugely expensive, but it certainly  
>> isn't free.  Also, scanning ahead and buffering tokens certainly  
>> isn't free, so we should avoid it when possible.  One of the costs  
>> is the Sema/Type resolution that has to happen in the preparser.   
>> For example, silly things like:
>>
>>  std::vector<bool>::iterator ...
>>
>> require the preparser to do template specialization etc to look up  
>> what "iterator" in vector<bool> is.  Immediately after the  
>> preparser decides that it is a type, the decl parser kicks in and  
>> has to do all the same resolution stuff.
>
> The most efficient approach is to do template specialization and  
> other resolution stuff once, right?

Right, of course.

> Even with the "tentatively parse as a decl" approach you'd prefer to  
> not repeat resolutions when going for "expression parsing".

If we only do tentative parsing when the result is likely to end up  
being a decl, and if it does end up being a decl, there is no  
reanalysis.  The trick is to not tentatively parse when it is likely  
to be an expr.
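
As a rough illustration of "speculatively parse as a decl and back off later", here is a toy sketch. It is not code from the patch or from Clang: TokenStream, isName, and tryParseParenDecl are hypothetical names, tokens are plain strings, and the only "declaration" shape recognized is `int ( name ) =`. The point is just that a successful tentative parse keeps its work, and only the failure path pays for a rewind.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy token stream; not Clang's Preprocessor or Parser API.
class TokenStream {
public:
  explicit TokenStream(std::vector<std::string> toks) : toks_(std::move(toks)) {}

  bool atEnd() const { return pos_ >= toks_.size(); }
  // Consume and return the next token, or an empty string at end of input.
  std::string next() { return atEnd() ? std::string() : toks_[pos_++]; }

  std::size_t mark() const { return pos_; }   // remember a position
  void rewind(std::size_t m) { pos_ = m; }    // backtrack to a remembered position

private:
  std::vector<std::string> toks_;
  std::size_t pos_ = 0;
};

// Toy identifier test: non-empty and starts with a letter.
bool isName(const std::string &t) {
  return !t.empty() && std::isalpha(static_cast<unsigned char>(t[0])) != 0;
}

// Tentatively parse the shape `int ( name ) =`.
// On success the tokens stay consumed (no reanalysis); on failure the
// stream is rewound so an expression parser could start from scratch.
bool tryParseParenDecl(TokenStream &ts) {
  const std::size_t start = ts.mark();
  if (ts.next() == "int" && ts.next() == "(" && isName(ts.next()) &&
      ts.next() == ")" && ts.next() == "=")
    return true;
  ts.rewind(start);
  return false;
}
```

With the tokens of `int (x) = 1;` the function returns true and leaves the stream just past the `=`; with `int (x) + 1;` it returns false and the stream is back at the first token.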

>> This means that every variable definition will require starting  
>> backtrack bookkeeping, doing a preparse (even if not very far  
>> forward), then deciding "yep, it's a decl", backtracking, and  
>> then reparsing as a decl.  This seems like a pretty significant  
>> cost to pay for these common cases, and I think the "speculatively  
>> parse as a decl if ambiguous and back off later" approach is better.
>
> I think that you have a misunderstanding about what the ambiguous  
> cases are.

I'm sure I do :).  The part of your patch that freaks me out is this:

+  default: {
+    bool isDeclaration = false;
      // If we have an identifier at the top level statement...
+    if (!OnlyStatement) {
+      TentativeParsingResult TPR = isDeclarationStatement();

+  TentativeParsingResult isDeclarationStatement() {
+    if (getLang().CPlusPlus)
+      return isCXXDeclarationStatement();
+    return isDeclarationSpecifier() ? TPR_true : TPR_false;
+  }

This means that (if I understand correctly) your current patch uses  
the preparser for tons of cases, including the "x = 4" and "func(4)"  
cases.  It certainly isn't reading types and qualified identifiers  
before deciding.
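
To make the concern concrete, the sketch below models the kind of cheap gate that could run before any preparsing. It is an assumption-laden toy, not the patch's logic: Verdict, isTypeName, and classifyStatementStart are hypothetical names, tokens are strings, the "type lookup" is a hard-coded set, and qualified names and templates are ignored entirely.

```cpp
#include <set>
#include <string>

enum class Verdict { Declaration, Expression, NeedsTentativeParse };

// Hypothetical stand-in for asking Sema whether a name is a type.
bool isTypeName(const std::string &tok) {
  static const std::set<std::string> types = {"int", "Value", "Widget"};
  return types.count(tok) != 0;
}

// Classify a statement from its first token and one token of lookahead.
Verdict classifyStatementStart(const std::string &first,
                               const std::string &next) {
  if (first == "const")
    return Verdict::Declaration;          // "const int (X)": settled by 'const'
  if (!isTypeName(first))
    return Verdict::Expression;           // "x = 4", "func(4)"
  if (next == "(")
    return Verdict::NeedsTentativeParse;  // "int (X) ...": the ambiguous case
  return Verdict::Declaration;            // "int X = ...", "Value *V = ..."
}
```

In this model only a type name followed by '(' reaches the expensive path; "x = 4" and "func(4)" are rejected on the first token.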

> The ambiguous cases are those where a type is followed by a '('.  
> These cases:
> int X = ...
> Value *V = ...
>
> and the vast majority of declarations are not ambiguous at all, so  
> no preparsing, backtracking etc. is needed for them, just a one- 
> token lookahead.
> int X =  // not ambiguous
> int (X) = // ambiguous
> const int (X) // not ambiguous because it starts with 'const'

Ok.
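
For reference, the cases listed above can be written out as compilable C++. This is only an illustration; Widget, bump, and demo are hypothetical names that do not appear in the patch. A statement that can be parsed as a declaration is treated as one, so the parser has to look past the closing ')' before it can tell the `Widget (w);` line from the `Widget(w).get();` line.

```cpp
// Hypothetical types and functions, for illustration only.
struct Widget {
  int value = 0;
  int get() const { return value; }
};

int bump(int v) { return v + 1; }

int demo() {
  int a = 1;           // not ambiguous: type followed by an identifier
  int (b) = 2;         // starts with "type (": turns out to be a declaration of b
  const int (c) = 3;   // not ambiguous: it starts with 'const'
  Widget (w);          // "type (": a declaration of w with redundant parentheses
  Widget(w).get();     // "type (": an expression (temporary copy, then a call)
  w.value = bump(a);   // identifier that is not a type: an ordinary call
  return w.value + b + c;
}
```

Compilers may warn about the redundant parentheses, but all six statements are valid.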

> So your assumption that "almost all of the ambiguous cases will end  
> up being declarations" is not true, in fact, if I had to guess, I'd  
> say that the balance would probably lean towards the expression side.
> Most of the declarations that are of the T(..) variety, in practice,  
> are function pointer declarations and these are not that common  
> inside functions.

Ok, so my objection is really to the current implementation, not the  
approach :)

>> e qualified type *before* making a decision about whether it is a  
>> statement or a decl.  This makes the logic a little more complex  
>> (we need to bring back the "parse expr with leading identifier"  
>> logic) but it would very neatly solve this problem in just about  
>> every real case, and it would mean that we aren't re-resolving and  
>> sema-ing types in any of the common cases here.
>
> The "having the parser cache sema resolutions" solves this.

If the plan of record is to eat the type in the unconditional part of  
the parser, then this is a non-issue.
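
A minimal sketch of what "cache sema resolutions" could look like, assuming resolution can be modelled as a function from a qualified name to some result. ResolutionCache and its Resolver callback are hypothetical; nothing here is Clang's actual Sema interface, and the "resolved type" is just a string.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy cache; none of these names come from Clang or the patch.
class ResolutionCache {
public:
  using Resolver = std::function<std::string(const std::string &)>;

  explicit ResolutionCache(Resolver r) : resolve_(std::move(r)) {}

  // Return the resolution for a qualified name, running the (possibly
  // expensive) resolver at most once per distinct name.
  const std::string &lookup(const std::string &qualifiedName) {
    auto it = cache_.find(qualifiedName);
    if (it == cache_.end())
      it = cache_.emplace(qualifiedName, resolve_(qualifiedName)).first;
    return it->second;
  }

private:
  Resolver resolve_;
  std::unordered_map<std::string, std::string> cache_;
};
```

If the preparser and the decl parser both go through such a cache, a name like std::vector<bool>::iterator is resolved once and the second lookup is a hash-table hit.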

>> Argiris, what do you think of the "tentatively parse as a decl"  
>> approach?
>
> As previously said, the preparser approach is non-intrusive and  
> easily replaceable. We can go with a "tentatively parse as a decl"  
> approach and muddle up the Parser+Sema for this C++ corner case (at  
> the expense of the other languages) but we wouldn't know if it was  
> really worth it. If we had the preparser approach in place we would  
> be able to compare against it and see whether it is a clear  
> performance benefit. Or we could run against actual codebases and  
> see what the ambiguous statements resolve to.

I agree that going with the preparser approach is preferable...  
particularly if it doesn't kick in very often.  Are you really  
agreeing to the "parse and sema qualified names and types before  
deciding whether it is a decl or expr" approach?

If so, I think we're on the same page and the preparser idea works for  
me, if not, I'm really confused (again) :)

-Chris
