[PATCH] Basic correction of "-" or ">" to "->" (PR9054)

Kaelyn Uhrain rikka at google.com
Fri Nov 1 09:42:55 PDT 2013


The trouble I was having wasn't so much giving Sema enough context to
make the correction as hitting the right code paths at the right times
to 1) have some confidence that the correction is even semi-reasonable,
and 2) suppress duplicate/extraneous errors and ideally recover from the
typo.

For "foo->bar", the parser handles "foo" as the LHS by calling
Parser::ParseCastExpression, which eventually calls down into Sema and
comes back. ParseCastExpression then sees the arrow and calls
ParsePostfixExpressionSuffix, which handles the member lookup of "bar" with
Sema's help as the final part of building the LHS. The parser then goes
back out to Parser::ParseAssignmentExpression (which called
ParseCastExpression to create the LHS expression) and calls
Parser::ParseRHSOfBinaryExpression for the pieces of the expression that
come after "foo->bar".
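
As a rough illustration of that call order, here is a toy model. It is not
Clang's code: ToyParser, its token vector, and the trace strings are all
hypothetical, and only the method names echo the Parser functions mentioned
above. It records which step consumes each token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: a hypothetical parser that logs which step sees each token.
struct ToyParser {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    std::vector<std::string> trace;

    bool atEnd() const { return pos >= tokens.size(); }

    // Consumes any "->member" suffixes as part of the LHS operand.
    void parsePostfixExpressionSuffix() {
        while (pos + 1 < tokens.size() && tokens[pos] == "->") {
            trace.push_back("ParsePostfixExpressionSuffix: member " +
                            tokens[pos + 1]);
            pos += 2;
        }
    }

    // Consumes one operand, then hands off to the suffix step.
    void parseCastExpression() {
        if (atEnd())
            return;
        trace.push_back("ParseCastExpression: " + tokens[pos]);
        ++pos;
        parsePostfixExpressionSuffix();
    }

    // Consumes "<op> operand" pairs that follow the LHS.
    void parseRHSOfBinaryExpression() {
        while (!atEnd() && (tokens[pos] == "-" || tokens[pos] == ">")) {
            trace.push_back("ParseRHSOfBinaryExpression: operator " +
                            tokens[pos]);
            ++pos;
            parseCastExpression();
        }
    }

    // Entry point: build the LHS, then parse whatever follows it.
    void parseAssignmentExpression() {
        parseCastExpression();
        parseRHSOfBinaryExpression();
    }
};
```

With tokens {"foo", "->", "bar"}, the trace shows "bar" consumed by the suffix
step while the LHS is still being built. With {"foo", "-", "bar"}, the operator
is consumed by the binary-expression step and "bar" is parsed as a fresh
operand.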

If "foo->bar" is mistyped as "foo-bar" or "foo>bar", the parser handles
"foo" as above, but then returns to ParseAssignmentExpression and calls
ParseRHSOfBinaryExpression to handle "-bar"/">bar". It isn't until
after the "-" or ">" has been parsed and the parser is calling into Sema to
figure out what "bar" is that an error is encountered. To recover from the
error at the point Sema encounters it, Sema would have to be able to tell
the parser to undo the parsing of the RHS of a binary expression and the
operator that triggered it, redo the parsing of the LHS enough to call
ParsePostfixExpressionSuffix for the post-recovery "->bar", and go on to
re-invoke ParseRHSOfBinaryExpression on whatever comes after "bar". Or, as
in my patch, the parser can preemptively check for the conditions under
which the error may occur: it sees whether "bar" by itself refers to
anything, and if that lookup fails and treating "bar" as a member works,
it assumes the "-" or ">" was intended to be "->". (The assumption is that
minus and greater-than are rarely applied to pointers to record objects
in valid code, so the overhead in that situation is acceptable.)
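
The preemptive check can be sketched roughly as follows. This is a toy model
under stated assumptions, not the patch itself: ToyScope and
looksLikeMistypedArrow are hypothetical names, and plain maps and sets stand
in for Sema's name lookup.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the lookup information Sema would provide.
struct ToyScope {
    // Variables that are pointers to a record, mapped to the record's name.
    std::map<std::string, std::string> pointerToRecord;
    // Names that ordinary (unqualified) lookup can find.
    std::set<std::string> ordinaryNames;
    // Members of each record, keyed by record name.
    std::map<std::string, std::set<std::string>> recordMembers;
};

// Returns true when "lhs <op> rhs" looks like a mistyped "lhs->rhs".
bool looksLikeMistypedArrow(const ToyScope &scope, const std::string &lhs,
                            char op, const std::string &rhs) {
    if (op != '-' && op != '>')
        return false;  // only these two operators are one character off "->"
    auto ptr = scope.pointerToRecord.find(lhs);
    if (ptr == scope.pointerToRecord.end())
        return false;  // LHS is not a pointer to a record
    if (scope.ordinaryNames.count(rhs) > 0)
        return false;  // RHS refers to something by itself; leave it alone
    auto rec = scope.recordMembers.find(ptr->second);
    // Suggest "->" only if treating the RHS as a member works.
    return rec != scope.recordMembers.end() && rec->second.count(rhs) > 0;
}
```

For a scope where "foo" points to a record with a member "bar" and no
standalone "bar" is visible, the check accepts "foo - bar" as a likely typo.
It rejects "foo - n" when "n" is an ordinary variable, so valid pointer
arithmetic is left untouched.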


On Fri, Nov 1, 2013 at 12:55 AM, Serge Pavlov <sepavloff at gmail.com> wrote:

> Another approach is to inform Sema about the context in which the unknown
> name occurs. That would allow the typo correction code to be gathered in
> one place in Sema. Such typos could be processed by the same machinery in
> ActOnIdExpression, which now tries to correct misspelled names.
> There are other typos that would be nice to handle ("." vs ".*", dot
> instead of comma, etc.). Handling them in the Parser could make the latter
> bulky.
> Thanks,
> --Serge
> 2013/11/1 Kaelyn Uhrain <rikka at google.com>
>> Attached is an initial patch for trying to correct a missing "-" or ">"
>> to "->" when accessing a member through an object pointer. This patch also
>> doesn't work for C code as C seems to hit a different code path. I'm
>> sending the patch out for pre-commit review even though it is a small and
>> fairly unobtrusive (code-wise) patch because I'm a bit iffy on whether it's
>> a good way to perform the diagnostic.
>> For a bit of context, what makes this diagnostic tricky is that the
>> original error about the unknown identifier after the "-" or ">" occurs
>> well within Sema as the parser is handling the RHS of a binary operator,
>> but the recovery would require following a code path in the parser that was
>> part of the construction of the LHS. And since Sema cannot tell the parser
>> to back up a few steps....
>> Cheers,
>> Kaelyn