[cfe-dev] [PATCH] C++ decl/expr ambiguity resolution approach

Sun Aug 24 15:11:26 PDT 2008

On Aug 24, 2008, at 8:19 AM, Argiris Kirtzidis wrote:
>> To me, these are pretty big drawbacks, and I want to make sure  
>> we're all on the same page and agree that this is the right  
>> approach before we go much farther.
>
> About maintenance:
> I'd say that having separate "disambiguation parsing" offers  
> flexibility. For example, if we decide that disambiguating comma- 
> separated declarators as comma-separated expressions is almost  
> always not what the programmer intended, and it leads to less  
> confusing errors to "shortcut" to a declaration by the first  
> declarator, we can do that easily by changing a couple of lines. We
> can have fine-grained control over disambiguation.

I think there are two separate future costs that are important to
consider.  Once all of C++ is implemented (!) we get to maintenance
cost, which is the biggest piece of the "software cost" puzzle.  For
example, one cost is paid when you want to extend the grammar (e.g.
for C++0x features).  I think that having two parsers running around is
actually worse for this sort of thing, because it means understanding
and maintaining two separate parsers, and making sure they agree on
everything.

The decision of how to handle comma isn't really in the same category,  
because it is a bug, not an extension point for future change.

OTOH, I agree with you that the lookahead parser is very simple and  
the cost of understanding it and keeping it in sync with the rest of  
the parser is probably not too bad.  We still have the efficiency  
issues though, see the other email :)

>> However, the advantage of this approach is that it is much faster  
>> than doing a pre-parse when things turn out to actually be a  
>> declaration.  Since many statement cases get filtered out before  
>> reaching the tentative case, I think this is a win.  Additionally,  
>> this does not require duplicating parsing code for a bunch of  
>> grammar productions, which is my primary objection to your approach.
>
> With that approach, we will make big architectural changes just to  
> deal with a very uncommon situation. I really don't think the  
> tradeoffs are worth it.

Which architectural changes?

> I don't have benchmarks, but I don't think there's such a big  
> performance cost with the 'special parser' approach:
>
> --The situation where disambiguation is required is uncommon.

If that's true, then it doesn't matter which solution we pick :)

> --We can do shortcuts, like disambiguating to a declaration by the  
> first declarator (GCC style), or when the first declarator has a '='  
> initializer.

You can do the same thing with tentative parsing if desired.
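
To make that concrete, here is a minimal sketch of a tentative parse
with a "first declarator" shortcut. It is not code from the patch or
from clang: TokenStream, TentativeParse and isDeclarationStatement are
hypothetical names, the set of type names stands in for a real Sema
lookup, and the toy rule only handles statements shaped like
`T ( id ) ...`.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token buffer: lexing happens once, the tokens are kept,
// and "rolling back" is just restoring an index.
struct TokenStream {
    std::vector<std::string> tokens;
    std::size_t pos;

    explicit TokenStream(std::vector<std::string> t)
        : tokens(std::move(t)), pos(0) {}

    std::string peek() const {
        return pos < tokens.size() ? tokens[pos] : std::string("<eof>");
    }
    void consume() { if (pos < tokens.size()) ++pos; }
};

// Restores the saved position on scope exit unless commit() was called.
class TentativeParse {
    TokenStream& ts;
    std::size_t saved;
    bool committed;
public:
    explicit TentativeParse(TokenStream& s)
        : ts(s), saved(s.pos), committed(false) {}
    void commit() { committed = true; }
    ~TentativeParse() { if (!committed) ts.pos = saved; }
};

static bool isIdentifier(const std::string& t) {
    return !t.empty() &&
           (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

// Toy rule for statements shaped like `T ( id ) ...`: treat the statement
// as a declaration when the first declarator is followed by '=' or ';'.
// Pure lookahead: the guard is never committed, so ts.pos is unchanged.
bool isDeclarationStatement(TokenStream& ts,
                            const std::set<std::string>& typeNames) {
    TentativeParse guard(ts);
    if (typeNames.count(ts.peek()) == 0) return false;  // stand-in for Sema
    ts.consume();
    if (ts.peek() != "(") return false;
    ts.consume();
    if (!isIdentifier(ts.peek())) return false;
    ts.consume();
    if (ts.peek() != ")") return false;
    ts.consume();
    return ts.peek() == "=" || ts.peek() == ";";
}
```

With `T` registered as a type name, the token sequence `T ( x ) = 0 ;`
is classified as a declaration and `T ( x ) ++ ;` as an expression, and
in both cases the stream position is back where it started. The comma
case discussed above is deliberately left out of the toy rule.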

> --Lexer does lexing *only once*. The tokens are cached.

Right.

> --The "disambiguation parser" is a tight "state-machine", no fancy  
> stuff except type checking.

Except that type checking pulls in a ton of sema, e.g. to handle  
std::vector<bool>::iterator.
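
As an illustration of why the type check is not cheap, the following
self-contained example shows two statements with the same token shape,
`qualified-name ( identifier ) ;`, whose meaning depends entirely on
whether the qualified name denotes a type. Names, shadowDemo and
firstBit are made up for this example. For `std::vector<bool>::iterator`
the parser has to look inside a template specialization to find that
out.

```cpp
#include <cassert>
#include <vector>

struct Names {
    typedef int type;                          // Names::type is a type
    static int func(int v) { return v + 1; }   // Names::func is a function
};

// Returns the outer `a`, to show that `Names::type(a);` declared a new
// variable instead of evaluating an expression involving the outer one.
int shadowDemo() {
    int a = 41;
    Names::func(a);        // expression statement: a call, result discarded
    {
        Names::type(a);    // declaration: same as `Names::type a;`
        a = 7;             // assigns the inner variable only
    }
    return a;              // the outer `a` is untouched
}

bool firstBit(std::vector<bool>& v) {
    assert(!v.empty());
    // Declaration, equivalent to `std::vector<bool>::iterator it;`.
    // Knowing that `iterator` is a type requires looking into the
    // std::vector<bool> specialization.
    std::vector<bool>::iterator(it);
    it = v.begin();
    return *it;
}
```

Some compilers warn about the redundant parentheses in these
declarations, but the code is valid C++.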

> --The only architectural change that I propose is caching type- 
> checks which:
>      1) would also be needed for a "normal parser with roll-back"
>      2) is useful in general; even in C there are multiple
> type-checks (i.e. isDeclarationSpecifier calls)

I don't have a mental model for what this would entail, so I'm not  
sure if it would be clean or not.
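
For the sake of discussion, one possible shape for such a cache is
sketched below. This is only a guess at what the proposal might look
like, not something taken from the patch: TypeCheckCache, its members
and the token-index key are all assumptions. The idea is that the
answer to "is this token a type name?" is remembered per token
position, so that a disambiguation pass and the real parse that
follows it do the expensive lookup only once.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of type-check results, keyed by token position.
class TypeCheckCache {
    std::function<bool(const std::string&)> lookup;  // the expensive check
    std::map<std::size_t, bool> byTokenIndex;        // remembered answers
    unsigned lookups;                                // real lookups performed
public:
    explicit TypeCheckCache(std::function<bool(const std::string&)> fn)
        : lookup(std::move(fn)), lookups(0) {}

    bool isTypeName(std::size_t tokenIndex, const std::string& spelling) {
        std::map<std::size_t, bool>::iterator it =
            byTokenIndex.find(tokenIndex);
        if (it != byTokenIndex.end())
            return it->second;            // second query for the same token
        ++lookups;
        bool result = lookup(spelling);
        byTokenIndex.insert(std::make_pair(tokenIndex, result));
        return result;
    }

    unsigned lookupCount() const { return lookups; }
};
```

A cached answer is only safe to reuse if nothing declared between the
two queries could change it, which is part of what would need to be
worked out before judging whether this is clean.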

> In general, the "special disambiguation parser" is a very  
> unintrusive approach. If later on, using benchmarks, we determine  
> that it has unacceptable performance cost, we can easily replace it  
> with a more sophisticated "normal parser with rollback" approach.

I agree, and the best part of the disambiguation parser is that it is  
completely separated from the rest of the parser.  This means it  
doesn't make the rest of the parser incrementally more complicated.   
OTOH, I strongly believe that the time to make this decision is now,  
before we have template instantiation and other complex stuff.   
Changing approaches will be much harder later.

-Chris