<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Aug 24, 2008, at 3:29 PM, Argiris Kirtzidis wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Chris Lattner wrote:<br><blockquote type="cite"><br></blockquote><blockquote type="cite">Activating backtracking isn't hugely expensive, but it certainly isn't free.  Also, scanning ahead and buffering tokens certainly isn't free, so we should avoid it when possible.  One of the costs is the Sema/Type resolution that has to happen in the preparser.  For example, silly things like:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">  std::vector<bool>::iterator ...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">require the preparser to do template specialization etc to look up what "iterator" in vector<bool> is.  Immediately after the preparser decides that it is a type, the decl parser kicks in and has to do all the same resolution stuff.<br></blockquote><br>The most efficient approach is to do template specialization and other resolution stuff once, right ?</blockquote><div><br class="webkit-block-placeholder"></div><div>Right, of course.</div><br><blockquote type="cite">Even with the "tentatively parse as a decl" approach you'd prefer to not repeat resolutions when going for "expression parsing".</blockquote><div><br class="webkit-block-placeholder"></div><div>If we only do tentative parsing when the result is likely to end up being a decl, and if it does end up being a decl, there is no reanalysis.  The trick is to not tentatively parse when it is likely to be an expr.</div><div><br class="webkit-block-placeholder"></div><blockquote type="cite"><blockquote type="cite">This means that every variable definition will require starting backtrack bookkeeping, doing a preparse (even if not very far forward) then then deciding "yep, it's a decl", backtracking, and then reparsing as a decl.  This seems like a pretty significant cost to pay for these common cases, and I think the "speculatively parse as a decl if ambiguous and back off later" approach is better.<br></blockquote><br>I think that you have a misunderstanding about what the ambiguous cases are.</blockquote><div><br class="webkit-block-placeholder"></div><div>I'm sure I do :).  The part of your patch that freaks me out is this:</div><div><br class="webkit-block-placeholder"></div><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+  default: {</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+    bool isDeclaration = false;</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">     // If we have an identifier at the top level statement...</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+    if (!OnlyStatement) {</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+      TentativeParsingResult TPR = isDeclarationStatement();</div><div><font class="Apple-style-span" face="Monaco" size="2"><span class="Apple-style-span" style="font-size: 10px;"><br class="webkit-block-placeholder"></span></font></div>

</div><div><br class="webkit-block-placeholder"></div><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+  TentativeParsingResult isDeclarationStatement() {</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+    if (getLang().CPlusPlus)</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+      return isCXXDeclarationStatement();</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+    return isDeclarationSpecifier() ? TPR_true : TPR_false;</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">+  }</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; "><br class="webkit-block-placeholder"></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 10px/normal Monaco; ">This means that (if I understand correctly) your current patch uses the preparser for tons of cases, including the "x = 4" and "func(4)" cases.  It certainly isn't reading types and qualified identifiers before deciding.</div><div><br></div></div><blockquote type="cite">The ambiguous cases are those where a type is followed by a '('. These cases:<br> int X = ...<br> Value *V = ..<br><br>and the vast majority of declarations are not ambiguous at all, so no preparsing, backtracking etc. is needed for them, just a one-token lookahead.<br>int X =  // not ambiguous<br>int (X) = // ambiguous<br>const int (X) // not ambiguous because it starts with 'const'</blockquote><div><br class="webkit-block-placeholder"></div>Ok.</div><div><br class="webkit-block-placeholder"></div><div><blockquote type="cite">So your assumption that "almost all of the ambiguous cases will end up being declarations" is not true, in fact, if I had to guess, I'd say that the balance would probably lean towards the expression side.<br>Most of the declarations that are of the T(..) variety, in practice, are function pointer declarations and these are not that common inside functions.</blockquote><div><br class="webkit-block-placeholder"></div><div>Ok, so my objection is really to the current implementation, not the approach :)</div><div><br class="webkit-block-placeholder"></div><blockquote type="cite"><blockquote type="cite">e qualified type *before* making a decision about whether it is a statement or a decl.  This makes the logic a little more complex (we need to bring back the "parse expr with leading identifier" logic) but it would very neatly solve this problem in just about every real case, and it would mean that we aren't re-resolving and sema-ing types in any of the common cases here.<br></blockquote><br>The "having the parser cache sema resolutions" solves this.</blockquote><div><br class="webkit-block-placeholder"></div><div>If the plan of record is to eat the type in the unconditional part of the parser, then this is a non-issue.</div><br><blockquote type="cite"><blockquote type="cite">Argiris, what do you think of the "tentatively parse as a decl" approach?<br></blockquote><br>As previously said, the preparser approach is non-intrusive and easily replaceable. We can go with a "tentatively parse as a decl" approach and muddle up the Parser+Sema for this C++ corner case (at the expense of the other languages) but we wouldn't know if it was really worth it. If we had the preparser approach in place we would be able to compare against it and see that it's a clear performance benefit or not. Or we could run against actual codebases and see what the ambiguous statements resolve to.<br></blockquote><br></div><div>I agree that going with the preparser approach is preferable... particularly if it doesn't kick in very often.  Are you really agreeing to the "parse and sema qualified names and types before deciding whether it is a decl or expr" approach?</div><div><br class="webkit-block-placeholder"></div><div>If so, I think we're on the same page and the preparser idea works for me, if not, I'm really confused (again) :)</div><div><br class="webkit-block-placeholder"></div><div>-Chris</div><div><br class="webkit-block-placeholder"></div></body></html>