[cfe-dev] [clangd-dev] RFC: add RecoveryExpr to represent invalid constructs in AST

Fri May 10 01:01:49 PDT 2019

Thanks, Jan. I'm not familiar with all of the relevant parts of
Parser/Sema, so these are useful pointers.

On Thu, May 9, 2019 at 9:32 PM Jan Korous <jkorous at apple.com> wrote:

> Hi Sam,
>
> This seems really useful indeed.
>
> I am just wondering about how this would play with speculative actions.
>
> I am more familiar with speculative stuff in Parser
> (peculativeEvaluationRAII, TentativeParsingAction,
> Parser::TryParseLambdaIntroducer
>
I'm not sure if we'd want to use this to represent errors at the parser
level, I don't know enough yet about the parser to really have an opinion.

> but it seems that sometimes we also use the same approach in Sema:
> - SFINAETrap (Which IIUC isn't used just for SFINAE - for example in
> RebuildForRangeWithDereference.)
>
SFINAETrap is diagnostic-based, so we need to ensure when emitting
RecoveryExpr:
 - we must also emit a diagnostic that causes substitution failure
 - we must not cause a diagnostic that is emitted even on substitution
failure
Other than that, I expect this to interact OK: when substitution fails, the
subtrees with RecoveryExprs will be discarded.
It may hurt the *performance* of substitution failure though, as we create
more nodes rather than just cascading the failure immediately.
(RebuildForRangeWithDereference is indeed not SFINAE but should work the
same way)

- ContextRAII
>
I'm not familiar with this class or how it's used for speculative actions -
can you elaborate?

> - TypoExpr
>
These are somewhat complementary in that TypoExpr is used when an
identifier can't be resolved, and RecoveryExpr is useful to preserve
subexpressions that *can* be resolved.

That said there are cases where these intersect like
`invalid_function(valid_parameter)` - it would be nice to generate
something sensible here: a RecoveryExpr(CallExpr) where invalid_function is
dropped/represented by a TypoExpr/represented by a
RecoveryExpr(DeclRefExpr)...
I'm not sure exactly how this should work mechanically, and also don't
think it needs to work immediately.

> Any thoughts on that?
>
> Cheers,
>
> Jan
>
> On May 9, 2019, at 8:13 AM, Tom Honermann via clangd-dev <
> clangd-dev at lists.llvm.org> wrote:
>
> This would be helpful.  Within Coverity, we hacked together our own error
> recovery mechanism (mostly involving demoting errors to warnings and
> discarding function bodies and variable initializers with errors), but our
> mechanism sometimes leads to other down stream errors (I investigated a
> spurious "enumerator 'x' does not exist in instantiation of 'y' just this
> week).  The ability to explicitly indicate errors in the AST would be
> helpful.
>
> Ideally, I'd like to have ErrorDecl, ErrorStmt, ErrorExpr, and ErrorType
> nodes.  But there is no reason they would all have to be introduced at the
> same time.
>
> Tom.
> On 5/9/2019 7:31 AM, Sam McCall via cfe-dev wrote:
>
> In the presence of broken code, the AST clang generates is often very
> limited.
> Often Sema diagnoses errors and doesn't create AST nodes. For example, if
> the arguments to a function call are wrong, we get no CallExpr, and all
> subexpressions are discarded.
> IDE tools like clangd see a lot of broken code, and rely on the resulting
> AST, so I'd like to improve this situation.
>
> If the user writes the following erroneous code:
>   a(b, c).d()  // error: a only takes one parameter
> then no AST is emitted for this expression, and:
> 1) clangd's "go to definition" doesn't work on a, even though there's no
> overloading
> 2) "go to definition" also doesn't work on b and c or their
> subexpressions, even though these expressions are valid
> 3) code completion doesn't work properly for d(), even though we know what
> type a() returns. (go to definition doesn't either).
> There are lots of similar cases: (foo->bar when bar doesn't exist, etc).
>
> My proposal is to add a fairly generic RecoveryExpr that the AST can emit
> in these situations. It would capture:
>  - the approximate StmtClass we were trying to create (e.g. CallExpr)
>  - the subexpressions - (e.g. an overloadexpr for the callee, and the
> arguments)
>  - the type, if we can make a reasonable guess (e.g. all overloads have
> the same type)
>
> I also considered other approaches, and think they don't work so well:
>  - just generate the e.g. CallExpr etc anyway, and maybe add an "invalid"
> flag. This means weakening many invariants in many different node types.
> It's subtle and error-prone, and adds conceptual and implementation
> complexity to many nodes that now have to be able to represent various
> error cases. Even within clang it's hard to find all the analyses that need
> to be updated.  (
> http://lists.llvm.org/pipermail/cfe-dev/2019-April/062151.html
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_pipermail_cfe-2Ddev_2019-2DApril_062151.html&d=DwMFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=oOJsFtgkFCCcDlfLWKmfrVDtpXLihXJTBGJfSapXCV0&m=k3m3ocpqnJDQ1DBLtX6DNqMbG51U4RJ9VyH8qj2SfSc&s=jddDcS9V5Twt16Gxg9n1WTXzsxqOYx-KQIOIvXEovEc&e=>
> )
>  - extract the extra information we need from Sema, rather than the AST.
> e.g. Sema should invoke a callback whenever an identifier is resolved to a
> Decl. This solves go-to-definition pretty well, but isn't a very flexible
> approach. The AST is rich, and provides context, can be dumped etc, a Sema
> callback would be an ad-hoc solution that is less convenient to use and
> understand.
>  - add a hierarchy of new nodes to represent errors (RecoveryCallExpr
> etc). It seems like a lot of work to model all the different ways code can
> be invalid, and I don't think the extra structure would be useful enough to
> justify it.
>
> I have a very rough prototype of this idea:
> https://reviews.llvm.org/D61722
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D61722&d=DwMFaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=oOJsFtgkFCCcDlfLWKmfrVDtpXLihXJTBGJfSapXCV0&m=k3m3ocpqnJDQ1DBLtX6DNqMbG51U4RJ9VyH8qj2SfSc&s=x4VyJNvrS0sFh72jDS_z6ox6SL4zxZLE4lIP3i7tnLc&e=> (emits
> RecoveryExpr for failed overload resolution).
> It does create a form of recovery that affects diagnostics. But the
> changes exposed by tests are few and seem mostly good (especially compared
> to emitting CallExpr).
> Some open questions:
>  - what's the type of the expression when we can't guess? (IntTy in the
> patch)
>  - will there be surprises if RecoveryExpr is emitted in more places and
> they start to interact?
>  - should RecursiveASTVisitor, ASTMatchers etc skip these nodes by
> default, for compatibility with external code?
>  - what are the important use cases that I haven't thought about?
>
>
> _______________________________________________
> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_cfe-2Ddev&d=DwIGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=oOJsFtgkFCCcDlfLWKmfrVDtpXLihXJTBGJfSapXCV0&m=k3m3ocpqnJDQ1DBLtX6DNqMbG51U4RJ9VyH8qj2SfSc&s=pBwBCrRoYbCV6AZvGdWTEDd5gKfDDajyTQ8VRoQ38LY&e=
>
> _______________________________________________
> clangd-dev mailing list
> clangd-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/clangd-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190510/d5601a11/attachment.html>