[PATCH] D61722: [AST] Add RecoveryExpr; produce it on overload resolution failure and missing member.

Mon May 20 02:57:53 PDT 2019

sammccall added subscribers: gribozavr, klimek.
sammccall added a comment.

In D61722#1508050 <https://reviews.llvm.org/D61722#1508050>, @rsmith wrote:

> The simplest thing to do might be to ensure that `vector<ErrorTy>` is itself `ErrorTy`.

You're probably right, I'll try this.

> For example, the constant evaluator will want to quickly bail out of evaluating expressions and statements containing errors

Thinking more - how important is this? These expressions are expected to be rare.
Discussing offline with @gribozavr, it seemed likely we could get away with the "errorness" of an expr being a local property, which would be simplifying. (Whereas the errorness of a type needs to be a transitive one).

> What kinds of cases are you thinking of that would be in your second category?

Incomplete code, either new code or during refactoring

- not enough arguments
- symbol not found due to missing #include
- member not added yet
- function not defined yet

In an IDE context, it's *much* more important that diagnostics are useful and AST nodes aren't discarded, than that recovery correctly "fixes" the AST to have the intended semantics.

>From that point of view, typo recovery/TypoExpr spends too much complexity budget on accurate recovery, rather than generality. RecoveryExpr is a different part of the design space for sure.

> I would assume that we would represent a function call for which the callee or an argument contains an error as a `CallExpr` with the "contains error" flag set and with `ErrorTy` as its type.

Almost:

- if the argument has an error, but its type is still known, then overload resolution runs as normal and this CallExpr doesn't itself have an error. (If there's a **transitive** HasError bit on Expr, it would be set).
- if the argument has an error and has ErrorTy, overload resolution may still succeed (depending on the rules we choose for ErrorTy). In which case again, the CallExpr itself has no error.
- if the argument has an error and has ErrorTy, and this causes overload resolution to fail, but all (best) overloads have the same type, we get a RecoveryExpr/CallExpr-with-error with a real type
- if the argument has an error and has ErrorTy, and this causes overload resolution to fail, and the type is unknown, then we get a RecoveryExpr/CallExpr-with-error with ErrorTy.

> Could we use the same representation here? (That is, represent this case as a `CallExpr` whose type is `ErrorTy` with the "contains error" flag set, with no `RecoveryExpr`.)

We can, but it breaks a lot of client code. These two cases are pretty different for clients:

- for an otherwise-valid CallExpr where one argument has errors, all the *local* invariants are preserved.
- for a CallExpr where callee is null, or the #args don't match the function, the client code often crashed or did the wrong thing.

My view is that we're fundamentally better having two types to keep the guarantees around `CallExpr` stronger. That we want matchers to "match through" RecoveryExpr, but callExpr shouldn't match here.
@ilya-biryukov disagreed and thought the main virtue of `RecoveryExpr` is backwards-compatibility.
Either way, I'm not optimistic about being able to get recovery changes to stick under this model. I'm willing to try if you feel strongly.

>> access of a nonexistent member Base expression is captured, type is never available.
> 
> Using `UnresolvedMemberExpr` for this case might make sense. I think we should think about how we'd represent a member access that can't be fully analyzed because the object expression contains errors, and think about whether it makes sense to use that same representation for the case where the error is immediately in forming the member access itself. (And similarly across all other kinds of recovery expression we might create.)

I agree it's important to be consistent here. It's hard work to be consistent and also precise!
Using RecoveryExpr for both cases seems sufficient to me, and one of its advantages is that it avoids having to make lots of hard decisions one-by-one, many of which will be wrong :-)

But that aside, I agree the best way to model that in the existing AST is make `UnresolvedMemberExpr` available in C, allow it to have no candidate decls, etc.
For the object expression, the error can be nested arbitrarily deep and I think we just have to apply whatever usual "subexpression-has-errors" strategy to `UnresolvedMemberExpr` and `MemberExpr`.

> If we want to enable (eg) AST matchers to operate on erroneous AST, using a homogeneous `RecoveryExpr` for all different syntaxes that originate errors seems like it would introduce complexity, especially if all consumers of the AST already need to deal with the case that some existing AST node is marked as being erroneous so don't get any benefit from having a concrete `RecoveryExpr` as a known stopping point. (If you get there, you've probably already gone too far.)

Based on experiments, I think the desired behavior for ASTMatchers is that `callExpr()` wouldn't match a broken call, but descendant/ancestor matching would still traverse "through" broken calls.
This gets most of the possible benefit for little cost: matchers can "for free" match the rest of the expression tree that gets dropped today, but e.g. existing clang-tidy checks won't bind directly to the broken parts of the AST.
Matchers/options to opt-in can be added. e.g. `brokenCallExpr(...)` or `recoveryExpr(attemptedExpr(CallExpr), ...)` as the case may be.

I thought @klimek was on board with this approach, but he can correct me if not...

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61722/new/

https://reviews.llvm.org/D61722