[PATCH] D130460: [pseudo] Add recovery for declarations

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 28 06:37:19 PDT 2022


hokein added inline comments.


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:267
 
+Token::Index recoverNextDeclaration(const RecoveryParams &P) {
+  if (P.Begin == P.Tokens.tokens().size())
----------------
I think this excludes the parameter-declaration (from the grammar spec, it is a different nonterminal).


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:267
 
+Token::Index recoverNextDeclaration(const RecoveryParams &P) {
+  if (P.Begin == P.Tokens.tokens().size())
----------------
hokein wrote:
> I think this excludes the parameter-declaration (from the grammar spec, it is a different nonterminal).
The logic of implementations looks very solid. I think it would be good to have some unittests for it. 


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:282
+  //  - if the braces are missing, we don't know where to recover
+  // So We only try to detect the others.
+  bool AcceptBraces = false;
----------------
if I read the code correctly, the following case should work like

```
void foo() {}

LLVM_DISCARD int bar() {} // this is an opaque `declaration` node?

void foo2() {} // it won't work if we change void -> Foo.
```


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:291
+  // they appear at the beginning of a line.
+  auto LikelyStartsDeclaration = [&](tok::TokenKind K) -> bool {
+    switch (K) {
----------------
the list looks reasonable -- I think we can lookup the `Follow(declaration)` set to find more.


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:301
+    case tok::kw_union:
+    // Types & other decl-specifiers
+    case tok::kw_void:
----------------
maybe consider `tok::identifier` (if the identifier is a type specifier, it is a good indicator of a declaration starting) as well? And yeah, we're less certain about it, but since we use the indentation information, I guess it might probably work well?


================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:352
+    }
+
+    // Semicolons terminate statements.
----------------
I think we miss to check the return index >= Cursor here.


================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:313
-declaration-seq := declaration
-declaration-seq := declaration-seq declaration
 declaration := block-declaration
----------------
Just want to spell out, this causes a performance hit :(
(a pure change of this has a ~6% slowdown, 7.17 MB/s -> 6.7MB/s on SemaCodeComplete.cpp).

left-cursive grammar is more friendly to LR-like parsers, because the parser can do the reduction earlier (after a parsed a `declaration`, we can perform a `declaration-seq` reduce); while right-cursive grammar, the reduction starts from the last declaration (this means our GSS is more deeper).

> change decl-sequence to right-recursive to simplify this
I remembered we discussed this bit before, but I don't really remember the details now :(



================
Comment at: clang-tools-extra/pseudo/test/cxx/recovery-declarations.cpp:6
+
+int x[0];
+// CHECK:      simple-declaration := decl-specifier-seq init-declarator-list ;
----------------
nit: add some tests to cover the initializer, e.g. 

```
auto lambda = [] xxxx () xxx {};
```

 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130460/new/

https://reviews.llvm.org/D130460



More information about the cfe-commits mailing list