[PATCH] D65591: [AST] Add a flag indicating if any subexpression had errors

Wed Nov 27 11:51:38 PST 2019

rsmith added a comment.

In D65591#1625744 <https://reviews.llvm.org/D65591#1625744>, @aaron.ballman wrote:

> In D65591#1625228 <https://reviews.llvm.org/D65591#1625228>, @ilya-biryukov wrote:
>
> > We can also assume they're cheap, use the visitor-based implementation and later switch if this turn out to be a problem.
> >  I think we should prefer the approach that guarantees the compiler is not getting slow ever rather than chase the slowness later.
>
>
> The problem is: those bits are not infinite and we only have a handful left until bumping the allocation size; is this use case critical enough to use one of those bits?

In the abstract, improving error recovery and toolability are among the more important things we could do, and it certainly seems worth spending a bit on every `Expr` on this to me, if there's no better alternative. (That is: I think this approach is OK if we don't find something better.)

It's not obvious to me whether recomputing this information by AST traversal would be fast enough: *if* we want to perform the "does this contain errors?" check from contexts we encounter frequently (eg, when deciding whether to produce warnings, whether we should skip certain semantic actions, and similar), we can't do that if it requires a traversal.

In D65591#1626969 <https://reviews.llvm.org/D65591#1626969>, @aaron.ballman wrote:

> In D65591#1626638 <https://reviews.llvm.org/D65591#1626638>, @ilya-biryukov wrote:
>
> > Alternatively, we could keep an equivalent of `set<Expr*> InvalidExprs` in `ASTContext`. Gives the same computational complexity, but has a higher constant overhead factor.
> >  Does that look reasonable?
>
>
> Yup, that is also very reasonable.

I think this will be very expensive from a maintenance (and potentially performance) perspective: every constructor for every expression node will need to check whether its subexpressions are in the set, and if so, add themselves to the set.

As a variant on this, we could build a `map<Expr*, bool>` to track whether each expression is known to contain errors, or known to not contain errors, populate it lazily when queried, and shortcut the whole mechanism if we've never created an error node. That still seems a little expensive, but at least we'd only pay the cost for invalid programs. If we could produce a dense numbering of `Expr`s, this'd suddenly seem pretty good, though, and we know that `Expr`s are allocated on the `ASTContext`'s bump allocator, so maybe there's something we can do there...

Another option is to represent "contains an error" with a specific `Type`. This'd mean we'd not preserve type information in less-common cases such as:  `f( T(some-erroneous-expression) )`, and wouldn't be able to diagnose if there's no function `f` that takes a `T`, but I think that's an acceptable loss. The downside is that we would need logic to create representations with the appropriate "contains an error" type throughout `Sema` -- everywhere where we create nodes with dependent types, plus some additional cases like cast expressions that are never type-dependent (but can contain errors).

Here's my current thinking:

- Using dependence to model "contains errors" is a mistake. We should stop doing that.
- We therefore need a different type to indicate "couldn't determine the type of this because it contains errors", and should add a new builtin type for that.
- In cases like `int( thing-containing-errors )`, it's probably completely fine that we can't determine whether the expression contains errors, and we shouldn't provide an interface to ask that question. (The current flag bit doesn't help with that anyway, since the errors could be nested indirectly, say inside a lambda, and we don't propagate the flag through that.)

So my proposal would be: stop trying to track whether expressions contain errors. Instead, add a builtin type to indicate whether the type of an expression couldn't be determined due to an error, and propagate that in the same way we propagate type-dependence. Would that address the use cases here?

> I'm not certain there are real problems there, but I am wondering whether something like a BMI should include serialized error AST nodes or not. Would a consumer of the module expect that? I suppose it could if we wanted it to.

We already do preserve invalid AST through AST files if we're asked to do so; this isn't new. (For example, the `Invalid` flag on `Decl`s is preserved.) This is important for some tooling scenarios: you want to be able to produce and use a precompiled preamble even if it contains errors.

> Ah, drat, I was hoping we had at least one test that already did this, but I don't see one.

Maybe we should introduce a `__builtin_dump(expr)` function that dumps its operand at the point where semantic analysis is first done? That'd be useful for other things too (`#pragma clang __debug dump X` is a little unwieldy).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65591/new/

https://reviews.llvm.org/D65591