[PATCH] D130460: [pseudo] Add recovery for declarations

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 22 02:03:33 PDT 2022


hokein added a comment.

I think the current version should be good to review, some bits need to take care of:

- left-recursive vs right-recursive of declaration-seq rule, see my other comments about it. Currently I keep it as-is to avoid the performance regression on large files (I still don't see your point how right-recursive can simplify the error recovery, it seems to me either can work at least for this case)
- recoveryNextDeclaration will consume at least 1 token (the return token index > Cursor) to avoid the risk of running in an infinite loop
- I keep the original implementation as-is (except fixing some bugs), but IMO `T->Indent == OrigIndent && LikelyStartsDeclaration(T->Kind)` is too strict (see my FIXME testcases), and we should probably include the IDENTIFIER (I believe it is a common case), `LikelyStartsDeclaration(T->Kind) || (T->Kind = IDENTIFIER && T->Indent == OrigIndent)` might be better.



================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:291
+  // they appear at the beginning of a line.
+  auto LikelyStartsDeclaration = [&](tok::TokenKind K) -> bool {
+    switch (K) {
----------------
hokein wrote:
> the list looks reasonable -- I think we can lookup the `Follow(declaration)` set to find more.
added more: kw_friend, kw_inline, kw_explicit, kw_virtual.


================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:313
-declaration-seq := declaration
-declaration-seq := declaration-seq declaration
 declaration := block-declaration
----------------
hokein wrote:
> Just want to spell out, this causes a performance hit :(
> (a pure change of this has a ~6% slowdown, 7.17 MB/s -> 6.7MB/s on SemaCodeComplete.cpp).
> 
> left-cursive grammar is more friendly to LR-like parsers, because the parser can do the reduction earlier (after a parsed a `declaration`, we can perform a `declaration-seq` reduce); while right-cursive grammar, the reduction starts from the last declaration (this means our GSS is more deeper).
> 
> > change decl-sequence to right-recursive to simplify this
> I remembered we discussed this bit before, but I don't really remember the details now :(
> 
Got more data, it does hurt the performance on large files:

SemaExpr.cpp: 9.54MB/s -> 7.98MB/s
SemaDeclCXX.cpp: 10.16MB/s -> 9.25MB/s

small/medium files doesn't seem to be affected (hypothesis: these files has less declarations than large files, thus being less affected by the right-recursive declaration-seq grammar rule)

Hover.cpp: 8MB/s -> 7.92MB/s
ASTSignals.cpp: 12.34M/s -> 12.29MB/s



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130460/new/

https://reviews.llvm.org/D130460



More information about the cfe-commits mailing list