[PATCH v2 6/6] Wire up delayed typo correction to DiagnoseEmptyLookup and set up Sema::ActOnIdExpression to use the new functionality.

David Blaikie dblaikie at gmail.com
Wed Nov 19 16:24:29 PST 2014


OK, I think I've wrapped my head around enough of this to sign off - thanks
for your patience.

One random vague idea: if we find plausible typo corrections in a binary
operator (such that the binary operator is no longer ExprError) - should we
avoid actually applying the typos and wait until we get to the top of the
expression to see if we could get better typo corrections by using more
context? Not sure if that makes sense/would work.

On Wed, Nov 19, 2014 at 3:37 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Wed, Nov 19, 2014 at 3:36 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Wed, Nov 19, 2014 at 11:03 AM, Kaelyn Takata <rikka at google.com> wrote:
>>
>>>
>>>
>>> On Tue, Nov 18, 2014 at 10:50 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>> +cfe-commits
>>>>
>>>> Though I've had trouble applying the summary patch you posted - could
>>>> you rebase/repost? Sorry for the delays :/
>>>>
>>>
>>> No worries.
>>>
>>>>
>>>> On Tue, Nov 18, 2014 at 10:49 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 5, 2014 at 1:10 PM, Kaelyn Takata <rikka at google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 4, 2014 at 4:54 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Could you post/send (on/off list, etc) an aggregate patch so I can
>>>>>>> play around with the behavior of this patch?
>>>>>>>
>>>>>>> Some superficial patch review
>>>>>>>
>>>>>>> lambda passed to CorrectDelayedTyposInExpr has 'else-after-return' -
>>>>>>> you could move the short block up to the top:
>>>>>>>
>>>>>>> if (!...CPlusPlus11)
>>>>>>>   return this->VerifyIntegerConstantExpression(E);
>>>>>>> /* unindented the longer part of the code */
>>>>>>>
>>>>>>
>>>>>> Fixed. Thanks for catching that!
>>>>>>
>>>>>>>
>>>>>>> I'm slightly surprised at the indentation there too - I think
>>>>>>> clang-format has a special case for lambdas as the last function argument,
>>>>>>> where it won't indent them extra (so the "});" will be at the same level as
>>>>>>> the expression - maybe it only does that if you're not initializing a
>>>>>>> variable? Dunno.
>>>>>>>
>>>>>>
>>>>>> I think it was a quirk in the version of clang-format used to
>>>>>> originally format this several months ago; clang-format now formats it a
>>>>>> bit differently.
>>>>>>
>>>>>>>
>>>>>>> Do you need the 'this->' qualification in that lambda? I assume not.
>>>>>>> (& I usually favor just capture everything by reference [&] - but up to you
>>>>>>> on that, we're all still certainly feeling out how best to write C++11)
>>>>>>>
>>>>>>
>>>>>> I didn't think the implicit "this" could be used for accessing member
>>>>>> functions... I just learned something new. :) (Though "this" still needs to
>>>>>> be in the capture list--I didn't capture everything by reference because
>>>>>> the only variable that needed to be captured was "this".)
>>>>>>
>>>>>
>>>>> Sure - up to you. I usually don't even think about what's being
>>>>> captured and just write [&] by default. (this applies really well to
>>>>> functions that will only call the lambda within the function (not storing
>>>>> it away to call later - at which point you get into interesting lifetime
>>>>> situations, etc) - for those, it's trickier and depends on the lifetime of
>>>>> the things being captured as to whether to capture by ref or value, etc).
>>>>>
>>>>>
>>>>>>
>>>>>>> EmptyLookupTypoDiags::operator() - some {} on single-statement
>>>>>>> blocks (if (Ctx)/else)
>>>>>>> Might be worth inverting the if/following block - the if block is
>>>>>>> longer/nested than the tail block, so switching them around should reduce
>>>>>>> indentation, etc.
>>>>>>>
>>>>>>
>>>>>> Done and done.
>>>>>>
>>>>>>>
>>>>>>> std::string CorrectedStr(TC.getAsString(SemaRef.getLangOpts()));
>>>>>>>
>>>>>>> Prefer copy/move init rather than direct init:
>>>>>>>
>>>>>>> std::string CorrectedStr = TC.getAsString(SemaRef.getLangOpts());
>>>>>>>
>>>>>>
>>>>>> Changed.
>>>>>>
>>>>>>>
>>>>>>> Hmm - it might be nice to write EmptyLookupTypoDiags just as a
>>>>>>> single free function (taking all those members as parameters instead) then
>>>>>>> call that in a lambda - that way you don't have to write all the member
>>>>>>> state/ctor/etc logic?
>>>>>>>
>>>>>>
>>>>>> I had to capture variables by value instead of by reference to avoid
>>>>>> lifetime issues (and having 74 tests fail) and create a local variable so
>>>>>> the LookupResult didn't have to be captured since it cannot be copied, but
>>>>>> I did get it working as a static function and a lambda and the switch
>>>>>> resulted in a net change of 17 fewer lines (37 added, 54 deleted).
>>>>>>
>>>>>
>>>>> Cool cool.
>>>>>
>>>>>
>>>>>>
>>>>>>> lambda in ParseCaseStatement: I'd probably write "return
>>>>>>> ExprResult(...)" rather than explicitly specifying the return type. (do you
>>>>>>> even need to do either? If CorrectDelayedTyposInExpr is a template, then
>>>>>>> its return result should just implicitly convert to ExprResult in the
>>>>>>> caller, if it takes a std::function, that should still work too... - I can
>>>>>>> explain this more in person, perhaps.
>>>>>>>
>>>>>>
>>>>>> You're right, neither seem to be needed--though I thought I had added
>>>>>> the return type because of clang complaining without it.
>>>>>>
>>>>>>>
>>>>>>> Rather than goto in ParseCXXIdExpression, could that be a loop?
>>>>>>>
>>>>>>
>>>>>> The cleanest way I can see to write it as a loop would be to wrap it
>>>>>> in a "while (true) {...}" loop, and have an "if (Result.isInvalid() ||
>>>>>> Result.get()) break;" right before the (now unconditional) call to
>>>>>> UnconsumeToken. The only other way I can see to do the loop would require
>>>>>> checking "!Result.isInvalid() && !Result.get()" twice in a row, once to
>>>>>> know whether to call UnconsumeToken, and once in the do...while loop to
>>>>>> know whether to continue looping.
>>>>>>
>>>>>
>>>>> Hmm - fair point. I'd probably at least still write it using the
>>>>> while(true) loop rather than goto, just to demonstrate to the reader what
>>>>> kind of control flow is happening here more easily.
>>>>>
>>>>> As for other ways to write this... one alternative, which I wouldn't
>>>>> necessarily espouse as better than your suggested loop:
>>>>>
>>>>>   bool first = true;
>>>>>   do {
>>>>>     if (!first)
>>>>>       UnconsumeToken()
>>>>>     first = false;
>>>>>     ...
>>>>>   } while (!Result.isInvalid() && !Result.get())
>>>>>
>>>>> Though calling '!Result.isInvalid() && !Result.get()' twice probably
>>>>> isn't the end of the world either, though certainly feels a bit off. Your
>>>>> suggestion of:
>>>>>
>>>>>   while (true) {
>>>>>     ...
>>>>>     if (...)
>>>>>       break;
>>>>>     UnconsumeToken();
>>>>>   }
>>>>>
>>>>> seems good. It's probably about as obvious as its going to get, I'd
>>>>> imagine. If the body of the loop is long, it might be more clear to pull it
>>>>> into a function and that'd make the loop's scope and purpose clearer.
>>>>>
>>>>
>>> I just remembered the other reason why I wrote this as a goto instead of
>>> a loop (and why the goto label is "retry"): the body of the "loop" should
>>> in most cases only be executed once, would only be executed a second time
>>> if there is an error and typo correction suggests a keyword, and should
>>> *never* be executed more than twice (i.e. the goto should only ever be
>>> called at most once) unless something has gone horribly, impossibly wrong
>>> ("wrong" being a typo was encountered, a keyword was suggested,
>>> substituting the keyword causes typo correction to be triggered again in
>>> the same spot and suggest a keyword again... a situation that shouldn't be
>>> possible and for which I'd be willing to add an assertion to prevent it
>>> from happening).
>>>
>>
>> An assertion sounds good, in any case.
>>
>> If it's only a single retry - perhaps breaking out the body of the loop
>> into a function that you call twice would make sense?
>>
>> func()
>> if (failed)
>>   func(other stuff)
>> assert(!failed)
>>
>> ?
>>
>>
>>>
>>>>>
>>>>>>>
>>>>>>> Hmm - why is the correction applied to the LHS/RHS of the binary
>>>>>>> expression, rather than waiting until the full expression is parsed?
>>>>>>>
>>>>>>
>>>>>> If you are referring to within ParseRHSOfBinaryExpression, it's
>>>>>> because any error in the RHS will cause the LHS to be replaced with an
>>>>>> ExprError without any typos in the LHS being diagnosed, so waiting until
>>>>>> the full expression is parsed causes typos to go undiagnosed if there are
>>>>>> unrelated parse errors in the full expression. This had been a bit of a
>>>>>> pain to figure out and track down why typo correction on certain full
>>>>>> expressions wasn't happening for part of the expression...
>>>>>>
>>>>>
>>>>> Hmm, that's interesting - I'd be curious to understand why that is,
>>>>> but I guess it's got a good reason.
>>>>>
>>>>
>>> I think it's because errors in the expression are communicated through
>>> the same channel as the expression itself: the ExprResult. So for something
>>> that combines multiple expressions into one (which it seems that
>>> ParseRHSOfBinaryExpression handles both parsing the RHS and combining the
>>> LHS and RHS), it can either return the subexpression tree or an
>>> ExprError.... which is fine up until something more needs to be done with
>>> the subexpression tree once an ExprError has been generated.
>>>
>>
>> Hmm, OK - that makes sense. Could/should we then do the correction on the
>> LHS/RHS only when a failure is about to cause the expression to be thrown
>> out? (that way if there's no error... oh, but of course there'
>>
>
> (fat fingers...)
>
> but of course there's an error, because we're correcting typos)
>
> So, yeah... not sure there's a nicer way to handle that, then.... *wonders*
>
> (still reviewing things)
>
>
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 29, 2014 at 12:49 PM, Kaelyn Takata <rikka at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Among other things, this eliminates at least one redundant error
>>>>>>>> case
>>>>>>>> and allows recovery in several cases where it wasn't possible before
>>>>>>>> (e.g. correcting a mistyped static_cast<>).
>>>>>>>> ---
>>>>>>>>  include/clang/Parse/Parser.h         |   9 +++
>>>>>>>>  include/clang/Sema/Sema.h            |   4 +-
>>>>>>>>  lib/Parse/ParseCXXInlineMethods.cpp  |   1 +
>>>>>>>>  lib/Parse/ParseDecl.cpp              |   4 +-
>>>>>>>>  lib/Parse/ParseDeclCXX.cpp           |   2 +-
>>>>>>>>  lib/Parse/ParseExpr.cpp              |  30 +++++++---
>>>>>>>>  lib/Parse/ParseExprCXX.cpp           |  13 ++++-
>>>>>>>>  lib/Parse/ParseObjc.cpp              |   5 +-
>>>>>>>>  lib/Parse/ParseOpenMP.cpp            |   3 +-
>>>>>>>>  lib/Parse/ParseStmt.cpp              |   8 ++-
>>>>>>>>  lib/Sema/SemaExpr.cpp                | 110
>>>>>>>> ++++++++++++++++++++++++++++++++---
>>>>>>>>  lib/Sema/SemaStmt.cpp                |  19 ++++++
>>>>>>>>  test/FixIt/fixit-unrecoverable.cpp   |   4 +-
>>>>>>>>  test/SemaCXX/typo-correction.cpp     |   7 ++-
>>>>>>>>  test/SemaTemplate/crash-10438657.cpp |   2 +-
>>>>>>>>  15 files changed, 189 insertions(+), 32 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> cfe-commits mailing list
>>>>>>>> cfe-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141119/391d21ec/attachment.html>


More information about the cfe-commits mailing list