[PATCH] D130337: [pseudo] Eliminate multiple-specified-types ambiguities using guards

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 22 06:11:54 PDT 2022


sammccall marked an inline comment as done.
sammccall added a comment.

In D130337#3671559 <https://reviews.llvm.org/D130337#3671559>, @hokein wrote:

> Thanks, the change looks good in general.
>
>> we handle *all* rules for the interesting node types explicitly, rather than default: return false. This allows us to assert that all cases are handled, so things don't "fall through the cracks" after grammar changes. Alternative suggestions welcome. (I have a feeling this "exhaustive pattern-matching" idea will come up again...)
>
> The dangerous bit is that now we will crash at runtime if the parsing code triggers a missing case.

Yeah, grammar changes make this brittle.
Still, hopefully we canary our releases...
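To make the "exhaustive pattern-matching" shape concrete, here's a rough sketch (hypothetical rule IDs and node type, not the actual guard API in this patch):

  // A minimal sketch, assuming made-up rule IDs and a simplified node type,
  // to show the "no default: return false" shape.
  #include <cassert>
  #include <cstdlib>
  #include <vector>

  // Hypothetical rule identifiers for the alternatives of simple-type-specifier.
  enum class Rule {
    SimpleTypeSpecifier_Builtin,
    SimpleTypeSpecifier_TypeName,
    SimpleTypeSpecifier_Decltype,
  };

  // Hypothetical view of a reduced forest node: just the rule that produced it.
  struct Node { Rule R; };

  // True if this node names a type by itself. Every alternative is listed
  // explicitly, so a grammar change that adds one reaches the assert instead
  // of silently falling through to "return false".
  bool specifiesType(const Node &N) {
    switch (N.R) {
    case Rule::SimpleTypeSpecifier_Builtin:
    case Rule::SimpleTypeSpecifier_TypeName:
    case Rule::SimpleTypeSpecifier_Decltype:
      return true;
    }
    assert(false && "unhandled simple-type-specifier alternative");
    std::abort();
  }

  // Hypothetical guard: reject a decl-specifier-seq reduction whose children
  // specify more than one type (the "multiple specified types" ambiguity).
  bool guardDeclSpecifierSeq(const std::vector<Node> &Children) {
    int Types = 0;
    for (const Node &C : Children)
      Types += specifiesType(C);
    return Types <= 1;
  }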

> Yeah, I hit this problem in the previous function-declarator patch. One approach would be to define an enum for each nonterminal, `enum Nonterminal { rhs0_rhs1 }`, rather than putting all rules into a single enum. It is easier to enumerate all values for a single nonterminal (I think this is the common case).

Well, I don't think it's the most common case (vs. just targeting a rule or two), but certainly we never enumerate *all* the rules!
Interesting idea.

I'm not sure it'd be worth adding control flow here to split the switches up by rule type, though; switches are pretty heavyweight syntactically.
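For reference, a sketch of what the per-nonterminal enum could look like (hypothetical names, not what the grammar tooling emits today). The attraction is that -Wswitch then checks exhaustiveness at compile time rather than via a runtime assert:

  // Hypothetical per-nonterminal enum: one enum of alternatives per
  // nonterminal instead of one flat RuleID space.
  namespace rule {
  enum class FunctionDeclarator {
    NoexceptDeclarator_Parameters,
    Declarator_Parameters_TrailingReturn,
  };
  } // namespace rule

  // A guard over this nonterminal only enumerates its own alternatives.
  // Adding a third alternative becomes a -Wswitch warning here, caught at
  // compile time rather than in a canary.
  bool isTrailingReturnForm(rule::FunctionDeclarator Alt) {
    switch (Alt) {
    case rule::FunctionDeclarator::NoexceptDeclarator_Parameters:
      return false;
    case rule::FunctionDeclarator::Declarator_Parameters_TrailingReturn:
      return true;
    }
    return false; // Unreachable; keeps -Wreturn-type quiet.
  }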

> In D130337#3671159 <https://reviews.llvm.org/D130337#3671159>, @sammccall wrote:
>
>> FWIW, as-is with no caching, this is a ~2% slowdown on my machine (5.82 -> 5.72 MB/s on SemaCodeComplete.cpp).
>
> Actually, the number looks quite good to me (I expected that there will be a significant slowdown without caching). I think we can check in the current version, and add the caching stuff afterwards.

I'm not so happy...

The relevant baseline here, IMO, is the +7% version: we're clearly paying an ~8% cost right now to build all the silly incorrect parses, and that cost is artificial.
So the ~9% overall slowdown now (and still ~5% with the cache) feels significant. But we can change this later.

>> Whereas D130150 <https://reviews.llvm.org/D130150> using the grammar is a 7% speedup (5.82 -> 6.22), so roughly a 9% performance difference between the approaches.
>
> Yeah, because the grammar-based approach reduces the number of ambiguous nodes in the forest.

*Both* approaches do that, which shows how slow the guard version is :-(
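To spell out the arithmetic behind those percentages (the 8%-vs-9% wobble is just rounding and whether you combine the deltas by ratio or by addition):

  5.72 / 5.82 ~= 0.983   guards, no cache: ~2% slower than baseline
  6.22 / 5.82 ~= 1.069   grammar-based (D130150): ~7% faster than baseline
  5.72 / 6.22 ~= 0.920   guards vs. grammar-based: ~8% slower, ~9% if you just add the two deltas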


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130337/new/

https://reviews.llvm.org/D130337


