[PATCH v2 6/6] Wire up delayed typo correction to DiagnoseEmptyLookup and set up Sema::ActOnIdExpression to use the new functionality.

Thu Nov 20 13:47:55 PST 2014

On Thu, Nov 20, 2014 at 1:31 PM, Kaelyn Takata <rikka at google.com> wrote:

>
>
> On Wed, Nov 19, 2014 at 3:36 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Wed, Nov 19, 2014 at 11:03 AM, Kaelyn Takata <rikka at google.com> wrote:
>>
>>>
>>>
>>> On Tue, Nov 18, 2014 at 10:50 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>> +cfe-commits
>>>>
>>>> Though I've had trouble applying the summary patch you posted - could
>>>> you rebase/repost? Sorry for the delays :/
>>>>
>>>
>>> No worries.
>>>
>>>>
>>>> On Tue, Nov 18, 2014 at 10:49 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 5, 2014 at 1:10 PM, Kaelyn Takata <rikka at google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 4, 2014 at 4:54 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Could you post/send (on/off list, etc) an aggregate patch so I can
>>>>>>> play around with the behavior of this patch?
>>>>>>>
>>>>>>> Some superficial patch review
>>>>>>>
>>>>>>> lambda passed to CorrectDelayedTyposInExpr has 'else-after-return' -
>>>>>>> you could move the short block up to the top:
>>>>>>>
>>>>>>> if (!...CPlusPlus11)
>>>>>>>   return this->VerifyIntegerConstantExpression(E);
>>>>>>> /* unindented the longer part of the code */
>>>>>>>
>>>>>>
>>>>>> Fixed. Thanks for catching that!
>>>>>>
>>>>>>>
>>>>>>> I'm slightly surprised at the indentation there too - I think
>>>>>>> clang-format has a special case for lambdas as the last function argument,
>>>>>>> where it won't indent them extra (so the "});" will be at the same level as
>>>>>>> the expression - maybe it only does that if you're not initializing a
>>>>>>> variable? Dunno.
>>>>>>>
>>>>>>
>>>>>> I think it was a quirk in the version of clang-format used to
>>>>>> originally format this several months ago; clang-format now formats it a
>>>>>> bit differently.
>>>>>>
>>>>>>>
>>>>>>> Do you need the 'this->' qualification in that lambda? I assume not.
>>>>>>> (& I usually favor just capture everything by reference [&] - but up to you
>>>>>>> on that, we're all still certainly feeling out how best to write C++11)
>>>>>>>
>>>>>>
>>>>>> I didn't think the implicit "this" could be used for accessing member
>>>>>> functions... I just learned something new. :) (Though "this" still needs to
>>>>>> be in the capture list--I didn't capture everything by reference because
>>>>>> the only variable that needed to be captured was "this".)
>>>>>>
>>>>>
>>>>> Sure - up to you. I usually don't even think about what's being
>>>>> captured and just write [&] by default. (this applies really well to
>>>>> functions that will only call the lambda within the function (not storing
>>>>> it away to call later - at which point you get into interesting lifetime
>>>>> situations, etc) - for those, it's trickier and depends on the lifetime of
>>>>> the things being captured as to whether to capture by ref or value, etc).
>>>>>
>>>>>
>>>>>>
>>>>>>> EmptyLookupTypoDiags::operator() - some {} on single-statement
>>>>>>> blocks (if (Ctx)/else)
>>>>>>> Might be worth inverting the if/following block - the if block is
>>>>>>> longer/nested than the tail block, so switching them around should reduce
>>>>>>> indentation, etc.
>>>>>>>
>>>>>>
>>>>>> Done and done.
>>>>>>
>>>>>>>
>>>>>>> std::string CorrectedStr(TC.getAsString(SemaRef.getLangOpts()));
>>>>>>>
>>>>>>> Prefer copy/move init rather than direct init:
>>>>>>>
>>>>>>> std::string CorrectedStr = TC.getAsString(SemaRef.getLangOpts());
>>>>>>>
>>>>>>
>>>>>> Changed.
>>>>>>
>>>>>>>
>>>>>>> Hmm - it might be nice to write EmptyLookupTypoDiags just as a
>>>>>>> single free function (taking all those members as parameters instead) then
>>>>>>> call that in a lambda - that way you don't have to write all the member
>>>>>>> state/ctor/etc logic?
>>>>>>>
>>>>>>
>>>>>> I had to capture variables by value instead of by reference to avoid
>>>>>> lifetime issues (and having 74 tests fail) and create a local variable so
>>>>>> the LookupResult didn't have to be captured since it cannot be copied, but
>>>>>> I did get it working as a static function and a lambda and the switch
>>>>>> resulted in a net change of 17 fewer lines (37 added, 54 deleted).
>>>>>>
>>>>>
>>>>> Cool cool.
>>>>>
>>>>>
>>>>>>
>>>>>>> lambda in ParseCaseStatement: I'd probably write "return
>>>>>>> ExprResult(...)" rather than explicitly specifying the return type. (do you
>>>>>>> even need to do either? If CorrectDelayedTyposInExpr is a template, then
>>>>>>> its return result should just implicitly convert to ExprResult in the
>>>>>>> caller, if it takes a std::function, that should still work too... - I can
>>>>>>> explain this more in person, perhaps.
>>>>>>>
>>>>>>
>>>>>> You're right, neither seem to be needed--though I thought I had added
>>>>>> the return type because of clang complaining without it.
>>>>>>
>>>>>>>
>>>>>>> Rather than goto in ParseCXXIdExpression, could that be a loop?
>>>>>>>
>>>>>>
>>>>>> The cleanest way I can see to write it as a loop would be to wrap it
>>>>>> in a "while (true) {...}" loop, and have an "if (Result.isInvalid() ||
>>>>>> Result.get()) break;" right before the (now unconditional) call to
>>>>>> UnconsumeToken. The only other way I can see to do the loop would require
>>>>>> checking "!Result.isInvalid() && !Result.get()" twice in a row, once to
>>>>>> know whether to call UnconsumeToken, and once in the do...while loop to
>>>>>> know whether to continue looping.
>>>>>>
>>>>>
>>>>> Hmm - fair point. I'd probably at least still write it using the
>>>>> while(true) loop rather than goto, just to demonstrate to the reader what
>>>>> kind of control flow is happening here more easily.
>>>>>
>>>>> As for other ways to write this... one alternative, which I wouldn't
>>>>> necessarily espouse as better than your suggested loop:
>>>>>
>>>>>   bool first = true;
>>>>>   do {
>>>>>     if (!first)
>>>>>       UnconsumeToken()
>>>>>     first = false;
>>>>>     ...
>>>>>   } while (!Result.isInvalid() && !Result.get())
>>>>>
>>>>> Though calling '!Result.isInvalid() && !Result.get()' twice probably
>>>>> isn't the end of the world either, though certainly feels a bit off. Your
>>>>> suggestion of:
>>>>>
>>>>>   while (true) {
>>>>>     ...
>>>>>     if (...)
>>>>>       break;
>>>>>     UnconsumeToken();
>>>>>   }
>>>>>
>>>>> seems good. It's probably about as obvious as its going to get, I'd
>>>>> imagine. If the body of the loop is long, it might be more clear to pull it
>>>>> into a function and that'd make the loop's scope and purpose clearer.
>>>>>
>>>>
>>> I just remembered the other reason why I wrote this as a goto instead of
>>> a loop (and why the goto label is "retry"): the body of the "loop" should
>>> in most cases only be executed once, would only be executed a second time
>>> if there is an error and typo correction suggests a keyword, and should
>>> *never* be executed more than twice (i.e. the goto should only ever be
>>> called at most once) unless something has gone horribly, impossibly wrong
>>> ("wrong" being a typo was encountered, a keyword was suggested,
>>> substituting the keyword causes typo correction to be triggered again in
>>> the same spot and suggest a keyword again... a situation that shouldn't be
>>> possible and for which I'd be willing to add an assertion to prevent it
>>> from happening).
>>>
>>
>> An assertion sounds good, in any case.
>>
>> If it's only a single retry - perhaps breaking out the body of the loop
>> into a function that you call twice would make sense?
>>
>> func()
>> if (failed)
>>   func(other stuff)
>> assert(!failed)
>>
>> ?
>>
>
> I've attached a patch that replaces the goto in favor of a helper method
> for the part of ParseCXXIdExpression that has to be retried when correcting
> a keyword typo (it had to be a method instead of a static function because
> almost all of the Parser members being accessed are private). I'm fine with
> committing either version.
>

Yep, that patch looks fine - thanks!

>
>>
>>>
>>>>>
>>>>>>>
>>>>>>> Hmm - why is the correction applied to the LHS/RHS of the binary
>>>>>>> expression, rather than waiting until the full expression is parsed?
>>>>>>>
>>>>>>
>>>>>> If you are referring to within ParseRHSOfBinaryExpression, it's
>>>>>> because any error in the RHS will cause the LHS to be replaced with an
>>>>>> ExprError without any typos in the LHS being diagnosed, so waiting until
>>>>>> the full expression is parsed causes typos to go undiagnosed if there are
>>>>>> unrelated parse errors in the full expression. This had been a bit of a
>>>>>> pain to figure out and track down why typo correction on certain full
>>>>>> expressions wasn't happening for part of the expression...
>>>>>>
>>>>>
>>>>> Hmm, that's interesting - I'd be curious to understand why that is,
>>>>> but I guess it's got a good reason.
>>>>>
>>>>
>>> I think it's because errors in the expression are communicated through
>>> the same channel as the expression itself: the ExprResult. So for something
>>> that combines multiple expressions into one (which it seems that
>>> ParseRHSOfBinaryExpression handles both parsing the RHS and combining the
>>> LHS and RHS), it can either return the subexpression tree or an
>>> ExprError.... which is fine up until something more needs to be done with
>>> the subexpression tree once an ExprError has been generated.
>>>
>>
>> Hmm, OK - that makes sense. Could/should we then do the correction on the
>> LHS/RHS only when a failure is about to cause the expression to be thrown
>> out? (that way if there's no error... oh, but of course there'
>>
>
> Yup, typo correction is being attempted in the places like
> ParseRHSOfBinaryExpression only when there is already an error and the
> original expression (including any TypoExprs within it) are about to be
> lost. ;)
>
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 29, 2014 at 12:49 PM, Kaelyn Takata <rikka at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Among other things, this eliminates at least one redundant error
>>>>>>>> case
>>>>>>>> and allows recovery in several cases where it wasn't possible before
>>>>>>>> (e.g. correcting a mistyped static_cast<>).
>>>>>>>> ---
>>>>>>>>  include/clang/Parse/Parser.h         |   9 +++
>>>>>>>>  include/clang/Sema/Sema.h            |   4 +-
>>>>>>>>  lib/Parse/ParseCXXInlineMethods.cpp  |   1 +
>>>>>>>>  lib/Parse/ParseDecl.cpp              |   4 +-
>>>>>>>>  lib/Parse/ParseDeclCXX.cpp           |   2 +-
>>>>>>>>  lib/Parse/ParseExpr.cpp              |  30 +++++++---
>>>>>>>>  lib/Parse/ParseExprCXX.cpp           |  13 ++++-
>>>>>>>>  lib/Parse/ParseObjc.cpp              |   5 +-
>>>>>>>>  lib/Parse/ParseOpenMP.cpp            |   3 +-
>>>>>>>>  lib/Parse/ParseStmt.cpp              |   8 ++-
>>>>>>>>  lib/Sema/SemaExpr.cpp                | 110
>>>>>>>> ++++++++++++++++++++++++++++++++---
>>>>>>>>  lib/Sema/SemaStmt.cpp                |  19 ++++++
>>>>>>>>  test/FixIt/fixit-unrecoverable.cpp   |   4 +-
>>>>>>>>  test/SemaCXX/typo-correction.cpp     |   7 ++-
>>>>>>>>  test/SemaTemplate/crash-10438657.cpp |   2 +-
>>>>>>>>  15 files changed, 189 insertions(+), 32 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> cfe-commits mailing list
>>>>>>>> cfe-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141120/07188637/attachment.html>