[PATCH] D130337: [pseudo] Eliminate multiple-specified-types ambiguities using guards
Haojian Wu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Jul 22 05:45:09 PDT 2022
hokein added a comment.
Thanks, the change looks good in general.
> we handle *all* rules for the interesting node types explicitly, rather than default: return false. This allows us to assert that all cases are handled, so things don't "fall through the cracks" after grammar changes. Alternative suggestions welcome. (I have a feeling this "exhaustive pattern-matching" idea will come up again...)
The dangerous bit is that now we will crash at the runtime if the parsing code triggers a missing case.
Yeah, I hit this problem in the previous function-declarator patch. One approach will be to define an enum for each nonterminal `enum Nonterminal { rhs0_rhs1 }`, rather than put all rules into a single enum. It is easier to enumerate all values for a single nonterminal (I think this is the common case)
namespace cxx {
namespace rule {
enum simple_type_specifier {
builtin_type0,
nested_name_specifier0_type_name1,
...
}
}
}
In D130337#3671159 <https://reviews.llvm.org/D130337#3671159>, @sammccall wrote:
> FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).
Actually, the number looks quite good to me (I expected that there will be a significant slowdown without caching). I think we can check in the current version, and add the caching stuff afterwards.
> Whereas D130150 <https://reviews.llvm.org/D130150> using the grammar is a a 7% speedup (5.82 -> 6.22), so roughly an 9% performance difference between the approaches.
Yeah, because the grammar-based approach reduces the number of ambiguous node in the forest.
================
Comment at: clang-tools-extra/pseudo/lib/cxx/CXX.cpp:284
+ defining_type_specifier_seq_0defining_type_specifier_1defining_type_specifier_seq,
+ GUARD(!hasExclusiveType(P.RHS.front()) ||
+ !hasExclusiveType(P.RHS.back()))},
----------------
nit: I would suggest using the index explicitly `P.RHS[0]`, `P.RHS[1]`, it increases the readability (the rul name encoding the index, easier to spot the corresponding element).
================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:353
decl-specifier-seq := decl-specifier
-decl-specifier-seq := decl-specifier decl-specifier-seq
+decl-specifier-seq := decl-specifier decl-specifier-seq [guard]
storage-class-specifier := STATIC
----------------
offtopic comment: The sad bit of the `RuleID` approach (vs `guard=SingleType`) is that we don't really know what kind of the guard is by reading the grammar file only.
I think this is critical information, and worth to keep in the grammar file. (ideas: add comments, or bring the `guard=SingleType` in the grammar again, but we ignore the `guard` value in the grammar parsing).
================
Comment at: clang-tools-extra/pseudo/lib/cxx/cxx.bnf:385
+builtin-type := BOOL
simple-type-specifier := SHORT
+builtin-type := INT
----------------
I think the reason to leave SHORT/LONG/SIGNED UNSIGNED as-is is that they can combined with other type (e.g. short int).
Can we group them together, and add a comment?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130337/new/
https://reviews.llvm.org/D130337
More information about the cfe-commits
mailing list