[clang-tools-extra] 6a9f79e - [pseudo] Eliminate the type-name identifier ambiguities in the grammar.

Haojian Wu via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 17 05:31:51 PDT 2022


Author: Haojian Wu
Date: 2022-08-17T14:30:53+02:00
New Revision: 6a9f79e1020db9f581d00791f1f644b64facfebe

URL: https://github.com/llvm/llvm-project/commit/6a9f79e1020db9f581d00791f1f644b64facfebe
DIFF: https://github.com/llvm/llvm-project/commit/6a9f79e1020db9f581d00791f1f644b64facfebe.diff

LOG: [pseudo] Eliminate the type-name identifier ambiguities in the grammar.

See https://reviews.llvm.org/D130626 for motivation.

An identifier in the grammar can belong to different categories (type-name,
template-name, namespace-name), and semantic information is required to tell
them apart. This patch eliminates the "local" ambiguities in type-name and
namespace-name, which gives the parser a performance boost:

  - eliminate the separate type rules (class-name, enum-name, typedef-name) and
    fold them into a unified type-name; this removes the #1 type-name ambiguity
    and gives us a big performance boost;
  - remove the namespace-alias rules, as they're hard and uninteresting.
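
A rough illustration of why folding the rules removes a local ambiguity: the
sketch below counts how many ways type-name can derive a single token under
the old and the new rules. It is a toy model, not the pseudo-parser's code.
The rule tables are transcribed from the cxx.bnf changes in this commit, while
`count_derivations` is a hypothetical helper and simple-template-id is treated
as a terminal purely to keep the counting simple.

```python
# Toy model: every rule has exactly one symbol on its right-hand side, and
# simple-template-id is treated as a terminal (an assumption for counting).
OLD_RULES = {
    "type-name": ["class-name", "enum-name", "typedef-name"],
    "class-name": ["IDENTIFIER", "simple-template-id"],
    "enum-name": ["IDENTIFIER"],
    "typedef-name": ["IDENTIFIER", "simple-template-id"],
}
NEW_RULES = {
    "type-name": ["IDENTIFIER", "simple-template-id"],
}


def count_derivations(rules, symbol, terminal):
    """Count the distinct ways `symbol` derives the single token `terminal`."""
    if symbol == terminal:
        return 1
    # A symbol with no rules is a terminal that did not match: zero ways.
    return sum(
        count_derivations(rules, rhs, terminal) for rhs in rules.get(symbol, [])
    )


if __name__ == "__main__":
    for label, rules in (("old", OLD_RULES), ("new", NEW_RULES)):
        n = count_derivations(rules, "type-name", "IDENTIFIER")
        print(f"{label} grammar: {n} derivation(s) of type-name from IDENTIFIER")
```

Under the old rules, the three derivations correspond to the three
type-name~IDENTIFIER alternatives that this commit removes from the expected
output in test/glr.cpp.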

Note that we could eliminate more and gain more performance (e.g. by folding
template-name, type-name, and namespace-name together), but at the current stage
we'd like to keep all existing categories of the identifier (as they might assist
in correlated disambiguation and keep the representation of important concepts
uniform).

| file                 | ambiguous nodes | forest size      | glrParse performance |
| SemaCodeComplete.cpp | 11k -> 5.7k     | 10.4MB -> 7.9MB  | 7.1MB/s -> 9.98MB/s  |
| AST.cpp              | 1.3k -> 0.73k   | 0.99MB -> 0.77MB | 6.7MB/s -> 8.4MB/s   |
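
To put the table in relative terms, the before/after pairs can be turned into
percentage changes. The sketch below only redoes the arithmetic on the numbers
reported above; `MEASUREMENTS` and `relative_change` are illustrative names and
not part of the patch. With these figures, glrParse throughput comes out
roughly 40% higher for SemaCodeComplete.cpp and roughly 25% higher for AST.cpp.

```python
# (before, after) pairs copied from the table; units cancel in the ratio.
MEASUREMENTS = {
    "SemaCodeComplete.cpp": {
        "ambiguous nodes (k)": (11.0, 5.7),
        "forest size (MB)": (10.4, 7.9),
        "glrParse (MB/s)": (7.1, 9.98),
    },
    "AST.cpp": {
        "ambiguous nodes (k)": (1.3, 0.73),
        "forest size (MB)": (0.99, 0.77),
        "glrParse (MB/s)": (6.7, 8.4),
    },
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a signed percentage."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for file_name, metrics in MEASUREMENTS.items():
        for metric, (before, after) in metrics.items():
            change = relative_change(before, after)
            print(f"{file_name:22} {metric:20} {change:+.1f}%")
```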

Differential Revision: https://reviews.llvm.org/D130747

Added: 
    

Modified: 
    clang-tools-extra/pseudo/lib/cxx/cxx.bnf
    clang-tools-extra/pseudo/test/glr.cpp

Removed: 
    


################################################################################
diff  --git a/clang-tools-extra/pseudo/lib/cxx/cxx.bnf b/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
index bc6599c4e3c44..7221a5086acf5 100644
--- a/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
+++ b/clang-tools-extra/pseudo/lib/cxx/cxx.bnf
@@ -34,14 +34,9 @@ _ := statement-seq
 _ := declaration-seq
 
 # gram.key
-typedef-name := IDENTIFIER
-typedef-name := simple-template-id
+#! we don't distinguish between namespaces and namespace aliases, as it's hard
+#! and uninteresting.
 namespace-name := IDENTIFIER
-namespace-name := namespace-alias
-namespace-alias := IDENTIFIER
-class-name := IDENTIFIER
-class-name := simple-template-id
-enum-name := IDENTIFIER
 template-name := IDENTIFIER
 
 # gram.basic
@@ -391,9 +386,12 @@ builtin-type := INT
 builtin-type := FLOAT
 builtin-type := DOUBLE
 builtin-type := VOID
-type-name := class-name
-type-name := enum-name
-type-name := typedef-name
+#! Unlike C++ standard grammar, we don't distinguish the underlying type (class,
+#! enum, typedef) of the IDENTIFIER, as these ambiguities are "local" and don't
+#! affect the final parse tree. Eliminating them gives a significant performance
+#! boost to the parser.
+type-name := IDENTIFIER
+type-name := simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier_opt IDENTIFIER
 elaborated-type-specifier := class-key simple-template-id
 elaborated-type-specifier := class-key nested-name-specifier TEMPLATE_opt simple-template-id
@@ -551,7 +549,7 @@ private-module-fragment := module-keyword : PRIVATE ; declaration-seq_opt
 class-specifier := class-head { member-specification_opt [recover=Brackets] }
 class-head := class-key class-head-name class-virt-specifier_opt base-clause_opt
 class-head := class-key base-clause_opt
-class-head-name := nested-name-specifier_opt class-name
+class-head-name := nested-name-specifier_opt type-name
 class-virt-specifier := contextual-final
 class-key := CLASS
 class-key := STRUCT

diff  --git a/clang-tools-extra/pseudo/test/glr.cpp b/clang-tools-extra/pseudo/test/glr.cpp
index 221725c6f089f..f805e42ffa6dd 100644
--- a/clang-tools-extra/pseudo/test/glr.cpp
+++ b/clang-tools-extra/pseudo/test/glr.cpp
@@ -12,10 +12,7 @@ void foo() {
 // CHECK-NEXT: │ └─; := tok[8]
 // CHECK-NEXT: └─statement~simple-declaration := decl-specifier-seq init-declarator-list ;
 // CHECK-NEXT:   ├─decl-specifier-seq~simple-type-specifier := <ambiguous>
-// CHECK-NEXT:   │ ├─simple-type-specifier~type-name := <ambiguous>
-// CHECK-NEXT:   │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT:   │ │ ├─type-name~IDENTIFIER := tok[5]
-// CHECK-NEXT:   │ │ └─type-name~IDENTIFIER := tok[5]
+// CHECK-NEXT:   │ ├─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT:   │ └─simple-type-specifier~IDENTIFIER := tok[5]
 // CHECK-NEXT:   ├─init-declarator-list~ptr-declarator := ptr-operator ptr-declarator
 // CHECK-NEXT:   │ ├─ptr-operator~* := tok[6]
@@ -23,12 +20,11 @@ void foo() {
 // CHECK-NEXT:   └─; := tok[8]
 }
 
-// CHECK:      3 Ambiguous nodes:
+// CHECK:      2 Ambiguous nodes:
 // CHECK-NEXT: 1 simple-type-specifier
 // CHECK-NEXT: 1 statement
-// CHECK-NEXT: 1 type-name
 // CHECK-EMPTY:
 // CHECK-NEXT: 0 Opaque nodes:
 // CHECK-EMPTY:
-// CHECK-NEXT: Ambiguity: 0.40 misparses/token
+// CHECK-NEXT: Ambiguity: 0.20 misparses/token
 // CHECK-NEXT: Unparsed: 0.00%


        

