r194002 - Try to correct a mistyped "-" or ">" to "->" for some C++ cases.

Tue Nov 5 12:31:39 PST 2013

On Tue, Nov 5, 2013 at 12:14 PM, Richard Smith <richard at metafoo.co.uk>wrote:

> On Tue, Nov 5, 2013 at 11:54 AM, Kaelyn Uhrain <rikka at google.com> wrote:
>
>>
>>
>>
>> On Tue, Nov 5, 2013 at 11:37 AM, Richard Smith <richard at metafoo.co.uk>wrote:
>>
>>> On Tue, Nov 5, 2013 at 10:40 AM, Kaelyn Uhrain <rikka at google.com> wrote:
>>>
>>>> On Tue, Nov 5, 2013 at 10:21 AM, Richard Smith <richard at metafoo.co.uk>wrote:
>>>>
>>>>> On Tue, Nov 5, 2013 at 9:56 AM, Kaelyn Uhrain <rikka at google.com>wrote:
>>>>>
>>>>>> On Mon, Nov 4, 2013 at 6:37 PM, Richard Smith <richard at metafoo.co.uk>wrote:
>>>>>>
>>>>>>>
>>>>>>> On 4 Nov 2013 11:05, "Kaelyn Uhrain" <rikka at google.com> wrote:
>>>>>>> >
>>>>>>> > Author: rikka
>>>>>>> > Date: Mon Nov  4 12:59:34 2013
>>>>>>> > New Revision: 194002
>>>>>>> >
>>>>>>> > URL: http://llvm.org/viewvc/llvm-project?rev=194002&view=rev
>>>>>>> > Log:
>>>>>>> > Try to correct a mistyped "-" or ">" to "->" for some C++ cases.
>>>>>>> >
>>>>>>> > Similar C code isn't caught as it seems to hit a different code
>>>>>>> path.
>>>>>>> > Also, as the check is only done for record pointers, cases
>>>>>>> involving
>>>>>>> > an overloaded operator-> are not handled either. Note that the
>>>>>>> reason
>>>>>>> > this check is done in the parser instead of Sema is not related to
>>>>>>> > having enough knowledge about the current state as it is about
>>>>>>> being
>>>>>>> > able to fix up the parser's state to be able to recover and
>>>>>>> traverse the
>>>>>>> > correct code paths.
>>>>>>> >
>>>>>>> > Modified:
>>>>>>> >     cfe/trunk/include/clang/Basic/DiagnosticParseKinds.td
>>>>>>> >     cfe/trunk/lib/Parse/ParseExpr.cpp
>>>>>>> >     cfe/trunk/test/SemaCXX/member-expr.cpp
>>>>>>> >
>>>>>>> > Modified: cfe/trunk/include/clang/Basic/DiagnosticParseKinds.td
>>>>>>> > URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticParseKinds.td?rev=194002&r1=194001&r2=194002&view=diff
>>>>>>> >
>>>>>>> ==============================================================================
>>>>>>> > --- cfe/trunk/include/clang/Basic/DiagnosticParseKinds.td
>>>>>>> (original)
>>>>>>> > +++ cfe/trunk/include/clang/Basic/DiagnosticParseKinds.td Mon Nov
>>>>>>>  4 12:59:34 2013
>>>>>>> > @@ -499,6 +499,9 @@ def ext_abstract_pack_declarator_parens
>>>>>>> >  def err_function_is_not_record : Error<
>>>>>>> >    "unexpected '%select{.|->}0' in function call; perhaps remove
>>>>>>> the "
>>>>>>> >    "'%select{.|->}0'?">;
>>>>>>> > +def err_mistyped_arrow_in_member_access : Error<
>>>>>>> > +  "use of undeclared identifier %0; did you mean '->' instead of "
>>>>>>> > +  "'%select{-|>}1'?">;
>>>>>>> >
>>>>>>> >  // C++ derived classes
>>>>>>> >  def err_dup_virtual : Error<"duplicate 'virtual' in base
>>>>>>> specifier">;
>>>>>>> >
>>>>>>> > Modified: cfe/trunk/lib/Parse/ParseExpr.cpp
>>>>>>> > URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Parse/ParseExpr.cpp?rev=194002&r1=194001&r2=194002&view=diff
>>>>>>> >
>>>>>>> ==============================================================================
>>>>>>> > --- cfe/trunk/lib/Parse/ParseExpr.cpp (original)
>>>>>>> > +++ cfe/trunk/lib/Parse/ParseExpr.cpp Mon Nov  4 12:59:34 2013
>>>>>>> > @@ -166,6 +166,46 @@ ExprResult Parser::ParseAssignmentExpres
>>>>>>> >    ExprResult LHS =
>>>>>>> ParseCastExpression(/*isUnaryExpression=*/false,
>>>>>>> >
>>>>>>> /*isAddressOfOperand=*/false,
>>>>>>> >                                         isTypeCast);
>>>>>>> > +
>>>>>>> > +  // Check for a possible typo of "-" or ">" instead of "->"
>>>>>>> after a
>>>>>>> > +  // pointer to a struct or class, while recovery is still
>>>>>>> possible.
>>>>>>> > +  if (LHS.isUsable() && (Tok.is(tok::minus) ||
>>>>>>> Tok.is(tok::greater))) {
>>>>>>> > +    QualType LHSType = LHS.get()->getType();
>>>>>>> > +    const RecordType *Pointee =
>>>>>>> > +        LHSType->isPointerType()
>>>>>>> > +            ? LHSType->getPointeeType()->getAsStructureType()
>>>>>>> > +            : 0;
>>>>>>> > +    const RecordDecl *RD = Pointee ? Pointee->getDecl() : 0;
>>>>>>> > +    const Token &NextTok = NextToken();
>>>>>>> > +    if (RD && NextTok.is(tok::identifier)) {
>>>>>>> > +      UnqualifiedId Name;
>>>>>>> > +      CXXScopeSpec ScopeSpec;
>>>>>>> > +      SourceLocation TemplateKWLoc;
>>>>>>> > +      NoTypoCorrectionCCC NoTCValidator;
>>>>>>> > +      Name.setIdentifier(NextTok.getIdentifierInfo(),
>>>>>>> NextTok.getLocation());
>>>>>>> > +      Sema::SFINAETrap Trap(Actions);
>>>>>>> > +      ExprResult Res =
>>>>>>> > +          Actions.ActOnIdExpression(getCurScope(), ScopeSpec,
>>>>>>> TemplateKWLoc,
>>>>>>> > +                                    Name, false, false,
>>>>>>> &NoTCValidator);
>>>>>>> > +      if (Res.isInvalid()) {
>>>>>>>
>>>>>>> What happens if the next token is ( or :: or similar, and this
>>>>>>> identifier is not an id-expression by itself? For instance, p-f(x) might
>>>>>>> find f by adl.
>>>>>>>
>>>>>> Assuming (1) "f" is both the name of a member of "p" and something
>>>>>> not found through a simple lookup but through adl or similar, and (2) the
>>>>>> pointer arithmetic/comparison on an object pointer is intended, then yes
>>>>>> this will probably do the wrong thing. And while it's not great, working
>>>>>> around that should be pretty trivial (e.g. adding parentheses around the
>>>>>> rhs of the "-" or ">").
>>>>>>
>>>>>
>>>>> OK, so this means we reject valid code; that's not OK -- please fix or
>>>>> revert. Here's one example of valid code this patch rejects:
>>>>>
>>>>> struct S { int f(...); };
>>>>> namespace X { struct T {}; int f(T); }
>>>>> S *g(S *p) { return p - f(X::T()); }
>>>>>
>>>>> This also produces a diagnostic with garbage in it:
>>>>>
>>>>>  error: use of undeclared identifier '������'; did you mean '->'
>>>>> instead of '-'?
>>>>>
>>>>
>>>> Reverted the commit until a way to handle cases like these can be
>>>> figured out. You wouldn't happen to have any suggestions for how to detect
>>>> cases like the one you gave, do you? The help would be much appreciated.
>>>>
>>>
>>> The easy thing to do would be to form a blacklist of problematic next
>>> tokens. This would need to include at least '(' and '::' -- though
>>> blacklisting '(' seems to remove a lot of the value from this diagnostic,
>>> sadly.
>>>
>>> I wonder if you can move this into ParseRHSOfBinaryExpression and wrap
>>> some kind of diagnostic trap/buffer around the ParseAssignmentExpression
>>> call there. Then, if that fails, try to interpret the - or > as a ->
>>> instead. You'd need to make sure that the diagnostic trap is cheap enough
>>> to not cause problems in such a hot code path, though.
>>>
>>> Maybe you could introduce a mechanism to register a callback to be
>>> invoked if an error is produced; from that mechanism, you could check
>>> whether interpreting the expression as a member expression would work. If
>>> it would, produce your new diagnostic from there and suppress all further
>>> diagnostics until you get back to ParseRHSOfBinaryExpr.
>>>
>>
>> Right now I'm looking into having a way to (optionally) capture multiple
>> diagnostics for later emission, with the intent of calling
>> ParseRHSOfBinaryExpr as if the "-" or ">" is intended, and if diagnostics
>> are captured then retry with an arrow instead similar to what my original
>> patch does (including the SFINAETrap). Then if that fails, emit the
>> captured diagnostics and pretend that the attempted arrow substitution
>> never happened. Plus I remember encountering other situations where I
>> wished I could capture multiple diagnostics for possible later emission if
>> an alternate recovery attempt failed, so I think that part may prove quite
>> useful outside of this particular situation. If I understand your
>> suggestion correctly, I think my approach is quite similar but without
>> having to ensure the callback is invoked prior to emitting a diagnostic
>> wherever that diagnostic might occur and without having the callback be
>> able to temporarily reset the state of the parser and without the callback
>> needing a way to revert/abort part of the current call chain.
>>
>
> My idea was not to revert / abort the current call chain, but simply to
> suppress all the diagnostics from it. That way we can avoid needing to
> buffer up diagnostics in the more common case that the '-' or '>' was not a
> typo. And my intent was that the callback would be invoked by Sema's
> diagnostic machinery, so it shouldn't require any changes outside that --
> and you're going to need changes there either way, if you want to implement
> a buffering diagnostics mechanism. I don't think we would need to touch the
> parser state, but we might possibly need to restore Sema's context, or at
> least check that building a member expression doesn't look at the context.
>
> If you think that implementing a diagnostic buffering system will work out
> better, I'm OK with that, but I'm more concerned about the possible
> performance cost with that approach.
>

I'd only do the buffering in the case where the pointer is for a record and
the next token is "-" or ">" and the buffering would just be to capturing
the diagnostic messages instead of letting them be printed (and hence most
of the overhead is only in the error case). Also, I'm not sure how the
callback approach would handle backing up the parser to traverse a
different code path to effectively call ParsePostfixExpressionSuffix then
make the call to ParseRHSOfBinaryExpr on a different set of tokens, in
order to both determine whether the substitution of an arrow will work and
to recover if it does
.

>
>    Also, in the case where this succeeds, we should annotate the produced
>>>>>>> expression onto the token stream to avoid reparsing it later.
>>>>>>>
>>>>>> I'm not sure what you're saying here... care to explain?
>>>>>>
>>>>>
>>>>> If you're going to the effort to produce an Expr from the identifier,
>>>>> please replace the tokens you used with a tok::annot_primary_expr token so
>>>>> that we won't need to redo the lookup and recreate the Expr when we
>>>>> actually parse the RHS.
>>>>>
>>>>
>>>>  From my understanding, when the correction is performed, the RHS
>>>> parsed by ParseRHSOfBinaryExpression is for what comes after the member
>>>> expr (so after "p->f" which "p-f" was corrected to)--and the parsing has no
>>>> more overlap than if the arrow hadn't been mistyped. Or are you referring
>>>> to the RHS of the arrow and the call to ParsePostfixExpressionSuffix, in
>>>> which case the lookup of the member expr has to be done anyway since the
>>>> identifier lookup had failed?
>>>>
>>>
>>> The case I'm referring to is when the correction is *not* performed, and
>>> we've already correctly parsed the id-expression that follows. In that
>>> case, the current approach would throw away its Expr*, and then immediately
>>> re-parse the exact same subexpression. But if the approach I suggested
>>> above works, this is moot because there's no reparsing involved.
>>>
>>
>> Ah, I see what you're saying now. I misunderstood which part you were
>> talking about "succeeding" (thanks a lot gmail...).
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>>  > +        Token OpTok = Tok;
>>>>>>> > +        Tok.setKind(tok::arrow);
>>>>>>> > +        PP.EnableBacktrackAtThisPos();
>>>>>>> > +        Res = ParsePostfixExpressionSuffix(LHS);
>>>>>>> > +        if (Res.isUsable()) {
>>>>>>> > +          LHS = Res;
>>>>>>> > +          PP.CommitBacktrackedTokens();
>>>>>>> > +          Diag(OpTok, diag::err_mistyped_arrow_in_member_access)
>>>>>>> > +              << NextTok.getIdentifierInfo() <<
>>>>>>> OpTok.is(tok::greater)
>>>>>>> > +              <<
>>>>>>> FixItHint::CreateReplacement(OpTok.getLocation(), "->");
>>>>>>> > +        } else {
>>>>>>> > +          Tok = OpTok;
>>>>>>> > +          PP.Backtrack();
>>>>>>> > +        }
>>>>>>> > +      }
>>>>>>> > +    }
>>>>>>> > +  }
>>>>>>> > +
>>>>>>> >    return ParseRHSOfBinaryExpression(LHS, prec::Assignment);
>>>>>>> >  }
>>>>>>> >
>>>>>>> >
>>>>>>> > Modified: cfe/trunk/test/SemaCXX/member-expr.cpp
>>>>>>> > URL:
>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/SemaCXX/member-expr.cpp?rev=194002&r1=194001&r2=194002&view=diff
>>>>>>> >
>>>>>>> ==============================================================================
>>>>>>> > --- cfe/trunk/test/SemaCXX/member-expr.cpp (original)
>>>>>>> > +++ cfe/trunk/test/SemaCXX/member-expr.cpp Mon Nov  4 12:59:34 2013
>>>>>>> > @@ -224,3 +224,16 @@ namespace pr16676 {
>>>>>>> >          .i;  // expected-error {{member reference type
>>>>>>> 'pr16676::S *' is a pointer; maybe you meant to use '->'}}
>>>>>>> >    }
>>>>>>> >  }
>>>>>>> > +
>>>>>>> > +namespace PR9054 {
>>>>>>> > +struct Foo {
>>>>>>> > +  void bar(int);
>>>>>>> > +  int fiz;
>>>>>>> > +};
>>>>>>> > +
>>>>>>> > +int test(struct Foo *foo) {
>>>>>>> > +  foo-bar(5);  // expected-error {{use of undeclared identifier
>>>>>>> 'bar'; did you mean '->' instead of '-'?}}
>>>>>>> > +  foo>baz(4);  // expected-error-re {{use of undeclared
>>>>>>> identifier 'baz'$}}
>>>>>>> > +  return foo>fiz;  // expected-error {{use of undeclared
>>>>>>> identifier 'fiz'; did you mean '->' instead of '>'?}}
>>>>>>> > +}
>>>>>>> > +}
>>>>>>> >
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > cfe-commits mailing list
>>>>>>> > cfe-commits at cs.uiuc.edu
>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131105/bda207e7/attachment.html>