[clangd-dev] RFC: add RecoveryExpr to represent invalid constructs in AST

Thu May 9 04:31:02 PDT 2019

In the presence of broken code, the AST clang generates is often very
limited.
Often Sema diagnoses errors and doesn't create AST nodes. For example, if
the arguments to a function call are wrong, we get no CallExpr, and all
subexpressions are discarded.
IDE tools like clangd see a lot of broken code, and rely on the resulting
AST, so I'd like to improve this situation.

If the user writes the following erroneous code:
  a(b, c).d()  // error: a only takes one parameter
then no AST is emitted for this expression, and:
1) clangd's "go to definition" doesn't work on a, even though there's no
overloading
2) "go to definition" also doesn't work on b and c or their subexpressions,
even though these expressions are valid
3) code completion doesn't work properly for d(), even though we know what
type a() returns. (go to definition doesn't either).
There are lots of similar cases: (foo->bar when bar doesn't exist, etc).

My proposal is to add a fairly generic RecoveryExpr that the AST can emit
in these situations. It would capture:
 - the approximate StmtClass we were trying to create (e.g. CallExpr)
 - the subexpressions - (e.g. an overloadexpr for the callee, and the
arguments)
 - the type, if we can make a reasonable guess (e.g. all overloads have the
same type)

I also considered other approaches, and think they don't work so well:
 - just generate the e.g. CallExpr etc anyway, and maybe add an "invalid"
flag. This means weakening many invariants in many different node types.
It's subtle and error-prone, and adds conceptual and implementation
complexity to many nodes that now have to be able to represent various
error cases. Even within clang it's hard to find all the analyses that need
to be updated.  (
http://lists.llvm.org/pipermail/cfe-dev/2019-April/062151.html)
 - extract the extra information we need from Sema, rather than the AST.
e.g. Sema should invoke a callback whenever an identifier is resolved to a
Decl. This solves go-to-definition pretty well, but isn't a very flexible
approach. The AST is rich, and provides context, can be dumped etc, a Sema
callback would be an ad-hoc solution that is less convenient to use and
understand.
 - add a hierarchy of new nodes to represent errors (RecoveryCallExpr etc).
It seems like a lot of work to model all the different ways code can be
invalid, and I don't think the extra structure would be useful enough to
justify it.

I have a very rough prototype of this idea:
https://reviews.llvm.org/D61722 (emits
RecoveryExpr for failed overload resolution).
It does create a form of recovery that affects diagnostics. But the changes
exposed by tests are few and seem mostly good (especially compared to
emitting CallExpr).
Some open questions:
 - what's the type of the expression when we can't guess? (IntTy in the
patch)
 - will there be surprises if RecoveryExpr is emitted in more places and
they start to interact?
 - should RecursiveASTVisitor, ASTMatchers etc skip these nodes by default,
for compatibility with external code?
 - what are the important use cases that I haven't thought about?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/clangd-dev/attachments/20190509/067a4094/attachment.html>