[PATCH v4 5/8] Start adding the infrastructure for handling TypoExprs.

Kaelyn Takata rikka at google.com
Thu Oct 23 11:14:00 PDT 2014


Well, I've found a test case that would trigger the assertion but in the
code's current form does the right thing:

class Foo {
public:
  template <> void testIt();
  void textIt();
};

void test(Foo *f) {
  f->TestIt();
}


With the patchset in it's current incarnation (including the fix I
mentioned in my last email):

fail.cc:3:20: error: no function template matches function template
specialization 'testIt'
  template <> void testIt();
                   ^
fail.cc:8:6: error: no member named 'TestIt' in 'Foo'; did you mean
'textIt'?
  f->TestIt();
     ^~~~~~
     textIt
fail.cc:4:8: note: 'textIt' declared here
  void textIt();
       ^
2 errors generated.


With the "if (!NE.isInvalid())" as an assertion:

fail.cc:3:20: error: no function template matches function template
specialization 'testIt'
  template <> void testIt();
                   ^
clang-3.6: ../tools/clang/lib/Sema/SemaExprCXX.cpp:6055: ExprResult
(anonymous namespace)::TransformTypos::TransformTypoExpr(clang::TypoExpr
*): Assertion `!NE.isInvalid()' failed.


Note that the (invalid) function template "testIt" has an edit distance of
1 from "TestIt" and the proper suggestion, "textIt", has an edit distance
of 2. I've added the above test case to typo-correction-delayed.cpp.


On Thu, Oct 23, 2014 at 11:05 AM, David Blaikie <dblaikie at gmail.com> wrote:

> (sorry for the repetition - I'm failing at 'reply all' this morning)
>
> On Thu, Oct 23, 2014 at 10:53 AM, Kaelyn Takata <rikka at google.com> wrote:
>
>> On Wed, Oct 22, 2014 at 2:19 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 21, 2014 at 1:37 PM, Kaelyn Takata <rikka at google.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Oct 17, 2014 at 9:52 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 9, 2014 at 10:12 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 9, 2014 at 9:58 AM, Kaelyn Takata <rikka at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 7, 2014 at 5:03 PM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 7, 2014 at 2:29 PM, Kaelyn Takata <rikka at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 7, 2014 at 11:43 AM, David Blaikie <dblaikie at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 6, 2014 at 3:46 PM, Kaelyn Takata <rikka at google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 3, 2014 at 11:45 AM, David Blaikie <
>>>>>>>>>>> dblaikie at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Given that these are parts of a patch series, some of which
>>>>>>>>>>>> have already been signed off on/I haven't looked at, it's a bit hard to
>>>>>>>>>>>> review the overall design - but I believe you & Richard talked that over,
>>>>>>>>>>>> so I'm not too worried about that (just explaining why my review feedback's
>>>>>>>>>>>> going to be a bit narrow)
>>>>>>>>>>>>
>>>>>>>>>>>> +  if (NumTypos > 0 && !ExprEvalContexts.empty())
>>>>>>>>>>>> +    ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>>>>>> +  else
>>>>>>>>>>>> +    assert(NumTypos == 0 && "There are outstanding typos after popping the "
>>>>>>>>>>>> +                            "last ExpressionEvaluationContextRecord");
>>>>>>>>>>>>
>>>>>>>>>>>> Could this be simplified? I think it's equivalent to:
>>>>>>>>>>>>
>>>>>>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>>>>>>     ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>>>>>>   else
>>>>>>>>>>>>     assert(NumTypos == 0 ... );
>>>>>>>>>>>>
>>>>>>>>>>>> or:
>>>>>>>>>>>>
>>>>>>>>>>>>   assert(ExprEvalContexts.empty() || NumTypos != 0 && ...)
>>>>>>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>>>>>>     ...
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, the "NumTypos > 0" was unnecessary and has been removed.
>>>>>>>>>>> (FYI, the second form you suggested doesn't quite work because the assert
>>>>>>>>>>> would fail on the common case where ExprEvalContexts is not empty and there
>>>>>>>>>>> are no typos.)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hmm, right, maybe:
>>>>>>>>>>
>>>>>>>>>>   assert(ExprEvalContexts.empty() ^ NumTypos != 0 && ... )
>>>>>>>>>>   if (!ExprEvalContexts.empty())
>>>>>>>>>>     ExprEvalContexts.back().NumTypos += NumTypos;
>>>>>>>>>>
>>>>>>>>>> but perhaps that's just too confusing. I just find an assertion
>>>>>>>>>> as the only thing inside a block of code to be a bit off... but it's
>>>>>>>>>> certainly not unprecedented in LLVM and is probably more legible - just my
>>>>>>>>>> own hangup to get over :)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> There's an unnecessary semicolon at the end of the "while (true)" loop in TransformTypos::Transform
>>>>>>>>>>>>
>>>>>>>>>>>> Removed.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'd roll this up a bit:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +      auto &State = SemaRef.getTypoExprState(TE);
>>>>>>>>>>>> +      if (State.DiagHandler)
>>>>>>>>>>>> +        (*State.DiagHandler)(State.Consumer->getCurrentCorrection());
>>>>>>>>>>>>
>>>>>>>>>>>> into:
>>>>>>>>>>>>
>>>>>>>>>>>>   if (auto *Handler = SemaRef.getTypoExprState(TE).DiagHandler)
>>>>>>>>>>>>     (*DiagHandler)(State.Consumer->getCurrentCorrection());
>>>>>>>>>>>>
>>>>>>>>>>>> That rollup doesn't work because State would be unknown when
>>>>>>>>>>> accessing State.Consumer in the argument to the DiagHandler.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oh, right!
>>>>>>>>>>
>>>>>>>>>>> count in the condition, plus insert in the body of the if in TransformTypos::TransformTypoExpr would be more efficient as a single insert-and-check (by avoiding two separate lookups into the set):
>>>>>>>>>>>>   if (TypoExprs.insert(E).second) // means this element previously didn't exist in the set, and now does
>>>>>>>>>>>>
>>>>>>>>>>>> I've changed the count-then-insert to just an insert since
>>>>>>>>>>> SmallPtrSetImpl::insert returns true when the item wasn't already in the
>>>>>>>>>>> set.
>>>>>>>>>>>
>>>>>>>>>>>> This:
>>>>>>>>>>>>
>>>>>>>>>>>>   TransformCache.find(E) != TransformCache.end()
>>>>>>>>>>>>
>>>>>>>>>>>> might commonly be written as:
>>>>>>>>>>>>
>>>>>>>>>>>>   TransformCache.count(E)
>>>>>>>>>>>>
>>>>>>>>>>>> Optionally with "!= 0", but often count is used directly as a boolean test, especially when it's a single set/map (not a multi set/map).
>>>>>>>>>>>>
>>>>>>>>>>>> Changed.
>>>>>>>>>>>
>>>>>>>>>>>> This comment didn't quite make sense to me:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +      // If corrections for the first TypoExpr have been exhausted, retry them
>>>>>>>>>>>> +      // against the next combination of substitutions for all of the other
>>>>>>>>>>>> +      // TypoExprs.
>>>>>>>>>>>>
>>>>>>>>>>>> When you say "retry them" which them do you mean? The corrections for the first TypoExpr? But what does it mean for them to be exhausted then, if they can still be considered?
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe I need a bit of help with the higher level algorithm here - I assume the idea is:
>>>>>>>>>>>>
>>>>>>>>>>>> Call transform.
>>>>>>>>>>>> As it visits TypoExpr nodes, it tries their first typo candidate available.
>>>>>>>>>>>> If we get to the end and have invalid code, reset the state of all the typo corrections except the first one (on its next query, it'll give back its second alternative).
>>>>>>>>>>>>
>>>>>>>>>>>> Here is where the confusion is at. As the transform visits each
>>>>>>>>>>> TypoExpr node it tries the first candidate available. If the first
>>>>>>>>>>> candidates of all the TypoExprs don't work, it then tries the next (and
>>>>>>>>>>> subsequent) candidates of only the first TypoExpr that was seen. When it
>>>>>>>>>>> exhausts the candidates from the first TypoExpr, it then resets the stream
>>>>>>>>>>> of candidates for the first TypoExpr and fetches the next candidate of the
>>>>>>>>>>> TypoExpr, continuing in this way until all combinations of candidates of
>>>>>>>>>>> all TypoExprs have been tried.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So I'm still a bit confused about which parts of the retry logic
>>>>>>>>>> are in which parts of the code.
>>>>>>>>>>
>>>>>>>>>> OK, I think I've got it. Transform is called and we don't even
>>>>>>>>>> know which TypoExprs are in the Expr. So we transform.
>>>>>>>>>>
>>>>>>>>>> As transform runs it comes across TypoExprs, and attempts to
>>>>>>>>>> transform them. For each one, it tries candidates until it finds a good one
>>>>>>>>>> (based on the immediate context - what are the cases where
>>>>>>>>>> BuildDeclarationNameExpr fails here? (when does NE.isInvalid fail?)).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think there is a bit of confusion on what a "retry" is. Within
>>>>>>>>> TransformTypoExpr, for either the first TypoExpr ever encountered by this
>>>>>>>>> TreeTransform or for a TypoExpr without a cached substitution, it tries to
>>>>>>>>> build a valid subexpression from the TypoCorrection candidates (e.g. a
>>>>>>>>> DeclRefExpr) and continues trying candidates until it finds one for which a
>>>>>>>>> valid subexpression can be formed, which it caches and returns. If none of
>>>>>>>>> the candidates worked, it will cache and return an invalid ExprResult. The
>>>>>>>>> TreeTransform then takes the returned subexpression and tries to rebuild
>>>>>>>>> the full expression (possibly trying to transform multiple TypoExprs along
>>>>>>>>> with CallExprs, etc).
>>>>>>>>>
>>>>>>>>> I don't remember off the top of my head without looking at the
>>>>>>>>> body, but I think BuildDeclarationNameExpr will fail in cases where the
>>>>>>>>> correction is e.g. a type name, or a member reference name--basically,
>>>>>>>>> BuildDeclarationNameExpr returns an ExprResult and the correct behavior in
>>>>>>>>> principle is to check for an invalid result and try the next correction
>>>>>>>>> candidate when one is seen--even if the Consumer never returns a candidate
>>>>>>>>> for which BuildDeclarationNameExpr would fail.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So, in your example below, maybe the right answer is BEH, but AF
>>>>>>>>>> is valid, with no correct candidate for the 3 choice.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, AF with no correction for the 3rd TypoExpr would result in the
>>>>>>>>> entire Expr result being invalid within the main Transform method.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Consumers start at "before the first" element, calling
>>>>>>>>>> "getNextCorrection" on a new consumer gives you the first correction.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That is correct. And the "before the first" element is also an
>>>>>>>>> empty correction for use with getCurrentCorrection. ;)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the first call to TransformExpr, the first call to
>>>>>>>>>> TransformTypoExpr chooses A, records it, continues.
>>>>>>>>>> Gets to the second TransformTypoExpr, tries D and E, succeeds at
>>>>>>>>>> F.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> it would only try D then E then F within a single call to
>>>>>>>>> TransformTypoExpr if BuildDeclarationNameExpr failed for D and E.
>>>>>>>>>
>>>>>>>>>> On the third TransformTypoExpr, it tries H and I, both fail, so
>>>>>>>>>> it returns from the last line ("return TransformCache[E] = ExprError]).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Same case here--that last line would only be reached if H and I
>>>>>>>>> both failed within the call to BuildDeclarationNameExpr.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We get back to Transform with a failure. So we walk the TypoExprs
>>>>>>>>>> (in CheckAndAdvanceTypoExprCorrectionStreams) we reset the 3rd consumer
>>>>>>>>>> (because it was finished).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, CheckAndAdvanceTypoExprCorrectionStreams would do nothing if
>>>>>>>>> the first TypoExpr is not finished. If the first TypoExpr is finished, then
>>>>>>>>> it would reset it and clear the cached result for the second TypoExpr (so
>>>>>>>>> that the next call to TransformTypoExpr on the second TypoExpr would try a
>>>>>>>>> new candidate instead of using the cached result). Only if both the first
>>>>>>>>> and second TypoExprs were *both* finished would the cached value of the
>>>>>>>>> third TypoExpr be cleared (and the candidate streams for both the first and
>>>>>>>>> second TypoExprs would be reset in addition to their cached results being
>>>>>>>>> removed from the cache). The only way
>>>>>>>>> CheckAndAdvanceTypoExprCorrectionStreams would see if the third TypoExpr
>>>>>>>>> was finished would be if both the first and second TypoExprs were also
>>>>>>>>> finished, and upon resetting it the tree transform would only be attempted
>>>>>>>>> again if there were at least four known TypoExprs (i.e. the number of known
>>>>>>>>> TypoExprs is greater than the number of TypoExprs that were finished and
>>>>>>>>> had to be reset by CheckAndAdvanceTypoExprCorrectionStreams).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We transform again:
>>>>>>>>>>
>>>>>>>>>> First typo) We are at A so we call next and get B
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes. The first TypoExpr would always have it's next candidate
>>>>>>>>> tried by a call to TransformTypoExpr.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Second typo) From F we continue and try G, fail, then we're
>>>>>>>>>> finished for the second typo.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, G would only be tried if the result from transforming F was
>>>>>>>>> removed from the cache.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Third typo) Perhaps we find some solution for this, or not, it
>>>>>>>>>> doesn't matter.
>>>>>>>>>>
>>>>>>>>>> We get back to Transform with a failure (second typo), So we walk
>>>>>>>>>> the TypoExprs again, resetting the second (and maybe the third).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If there are only three TypoExprs, then resetting the third
>>>>>>>>> TypoExpr would mean there are no valid combinations of corrections that
>>>>>>>>> yield a valid expression (where the final result of the transform is
>>>>>>>>> neither an invalid ExprResult nor caused any errors to be reported by the
>>>>>>>>> SFINAETrap).
>>>>>>>>>
>>>>>>>>
>>>>>>>>>> First typo) We are at B so we call next and get to C, let's say
>>>>>>>>>> we fail here - C isn't even remotely valid.
>>>>>>>>>> Now it really matters whether we find/fail on the second and
>>>>>>>>>> third typo. If we fail on the second and third... we're going to give up.
>>>>>>>>>> We'll find that all the corrections are finished and we'll conclude that
>>>>>>>>>> we've tried everything.
>>>>>>>>>>
>>>>>>>>>> Am I following this right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Not quite. I think you are forgetting about the fact that, for all
>>>>>>>>> TypoExprs except the first, a cached result is used if it exists instead of
>>>>>>>>> fetching a new candidate from the Consumer and trying to build a valid Expr
>>>>>>>>> with it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ah, looking at the code again I see what you mean. The 'break' is
>>>>>>>> the bit that I missed in the reset/advance loop.
>>>>>>>>
>>>>>>>> Which patches would I have to apply to ToT to be able to exercise
>>>>>>>> this code? It might help for me to step through some of these scenarios in
>>>>>>>> a debugger.
>>>>>>>>
>>>>>>>
>>>>>>> To exercise this code, you'd need the full patch series (all 8) to
>>>>>>> exercise this code because it's not until patch #8 that anything actually
>>>>>>> generates TypoExprs in the AST. I can send you a combined patch rebased on
>>>>>>> ToT off-list if you'd like (it's ~112KB uncompressed), and the new test
>>>>>>> cases in test/SemaCXX/typo-correction-delayed.cpp explicitly test this
>>>>>>> functionality.
>>>>>>>
>>>>>>
>>>>>> That'd be great!
>>>>>>
>>>>>
>>>>> With your total patch applied I've been able to reproduce a few typo
>>>>> correction scenarios - thanks!
>>>>>
>>>>> But I can't seem to reach/exercise the loop in
>>>>> TransformTypos::TransformTypoExpr - if I change the "if (!NE.isInvalid())"
>>>>> condition at the end of the loop (the only way to repeat the body of the
>>>>> loop) to "assert(!NE.isInvalid())" it doesn't fire on any of the test cases
>>>>> (nor on my own modified versions). Do you have some tests that cover this?
>>>>> Could you show/describe a simple situation where it occurs so I can play
>>>>> around with that part of the logic?
>>>>>
>>>>
>>>> I'm not sure it can be triggered with CorrectTypoDelayed only hooked up
>>>> to LookupMemberExprInRecord (a static function called by
>>>> Sema::BuildMemberReferenceExpr directly and indirectly via
>>>> LookupMemberExpr), but in the second patch set which hooks
>>>> CorrectTypoDelayed up to DiagnoseEmptyLookup, the assert is triggered by
>>>> the tests SemaCXX/type-definition-in-specifier.cpp
>>>> and SemaCXX/warn-thread-safety-parsing.cpp (in addition to the loop being
>>>> the correct/principled behavior since there is no guarantee that the
>>>> recovery callback will generate a valid ExprResult from a non-empty
>>>> TypoCorrection). For this patch set, to trigger the assert a code snippet
>>>> would have to correct a class member lookup and have a candidate that is a
>>>> member function, constant, variable, or template in the current class or
>>>> one of its base classes (per the RecordMemberExprValidatorCCC) yet which
>>>> yields causes MemberExprTypoRecovery to return an ExprError when trying to
>>>> build the member reference. Basically, it would require having the typo
>>>> correction try a candidate that is an accessible class member that cannot
>>>> be reference--something that I'm not sure is even possible in C++.
>>>>
>>>
>>> Hrm, OK - could you change it to an assertion (& remove the outer
>>> loop/replace it with a conditional) until that later patch where it becomes
>>> necessary, so it'll be easier to review it in the context of it being
>>> executed/used code?
>>>
>>
>> I understand your suggestion, however I don't think changing the code to
>> do the wrong thing and assert if a false assumption is violated is the
>> right thing to do. If the only source of an ExprResult in the loop was the
>> call to Sema::BuildDeclarationNameExpr, I'd be a bit more willing to do
>> so... but if a recovery handler callback is provided and invoked, there's
>> no guarantee it won't return an invalid ExprResult, and no documentation of
>> the assumption that the callback must return a valid ExprResult (outside of
>> the one assertion you are suggesting, buried within the TypoExpr handling
>> machinery). Additionally, changing the code to do the wrong thing by
>> changing the "while (...)" to "if (...)" and replacing "if
>> (!NE.isInvalid())" with a bogus assertion does nothing to simplify the
>> actual code block... particularly since it would immediately be changed
>> right back to the way it is now in the next patch set (a version of which I
>> had already mailed out two months ago).
>>
>
> My concern is that this is untested (& actually untestable/dead in this
> revision) code - if someone were to sync to this revision and try to
> understand what the code did, why it was there, and how it should be
> modified to do other things, they'd be confused & probably get it wrong
> (because there are no tests to demonstrate the motivation). It's nice for
> code to go along with the use - the usual rules of providing tests along
> with code.
>
> But I'll muddle through & try to remember to figure out the tests & the
> design when you commit/review the next patches if necessary.
>
>
>>
>> (FYI, I also just noticed I had a mistake in the full patch I sent you...
>> specifically I seem to have forgotten to delete a return statement when I
>> changed the code to look up the TypoExpr in the TransformCache only one and
>> assign the result to the created entry and rebased the rest of the patch
>> set:
>>
>> diff --git a/lib/Sema/SemaExprCXX.cpp b/lib/Sema/SemaExprCXX.cpp
>> index 81b3b01..9e05e82 100644
>> --- a/lib/Sema/SemaExprCXX.cpp
>> +++ b/lib/Sema/SemaExprCXX.cpp
>> @@ -6049,8 +6049,6 @@ public:
>>          if (!TC.isKeyword())
>>            R.addDecl(TC.getCorrectionDecl());
>>          NE = SemaRef.BuildDeclarationNameExpr(CXXScopeSpec(), R, false);
>> -        return TransformCache[E] =
>> -                   SemaRef.BuildDeclarationNameExpr(CXXScopeSpec(), R,
>> false);
>>        }
>>        assert(!NE.isUnset() &&
>>               "Typo was transformed into a valid-but-null ExprResult");
>>
>> So the "if (!NE.isUnset())" is intended to be the only early return
>> within the loop, though it still does not fail as an assert within the
>> patchset.)
>>
>>>
>>> Without that wrinkle, I think I've followed the code enough to sign off
>>> on it. Please commit.
>>>
>>> [I do find the way the algorithm visits the corrections to be a little
>>> hard to follow, in that its split between the TypoExprs cache, consumer
>>> state, TransformTypoExpr, and CheckAndAdvanceTypoExprCorrectionStreams - in
>>> some very abstract theory I would like having each sequence of typo
>>> candidates (the Consumers represent this) expose an iterator pair, then a
>>> generic utility to do combinations of a list of ranges and it would expose
>>> an iterator over those combinations which you could just increment to get
>>> the next one - it'd separate out the combination logic into a reusable
>>> component that could be easily examined/tested, then composed with the typo
>>> correction stuff to keep the logic there relatively simple/clearer.
>>>
>>> I might try to work up an example of what I mean]
>>>
>>>
>>>
>>>
>>>>
>>>>> - David
>>>>>
>>>>> (micro review comment I saw along the way: I /think/ CheckAndAdvanceTypoExprCorrectionStreams
>>>>> could be simplified as follows:
>>>>>
>>>>
>>>> You're right, it can be simplified now that the loop is in its own
>>>> function. Changed.
>>>>
>>>>>
>>>>>   bool CheckAndAdvanceTypoExprCorrectionStreams() {
>>>>>     for (auto TE : TypoExprs) {
>>>>>       TransformCache.erase(TE);
>>>>>       auto &State = SemaRef.getTypoExprState(TE);
>>>>>       if (!State.Consumer->finished())
>>>>>         return true;
>>>>>       State.Consumer->resetCorrectionStream();
>>>>>     }
>>>>>     return false;
>>>>>   }
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> (I'm sort of wondering if this might be better expressed by some
>>>>>>>> iterator scheme, but I'm not sure - there's no pretty next_combination
>>>>>>>> algorithm (someone proposed one to boost at some point, it seems, but
>>>>>>>> that's about as close as it's got to anything common/generic) anyway)
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> So for 3 TypoExprs... let's say 1 has the candidates (A, B, C),
>>>>>>>>>>> 2 has the candidates (D, E, F, G) and 3 has the candidates (H, I) the
>>>>>>>>>>> iterations would test the combinations in order:
>>>>>>>>>>>
>>>>>>>>>>> A D H
>>>>>>>>>>> B D H
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I suppose a simpler form of my above example is: if, on the first
>>>>>>>>>> pass, we get ADH, on the next pass the code as-is would result in BCI,
>>>>>>>>>> wouldn't it? It'll "next" each of the corrections.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (I think you meant BEI, and C is only a candidate of the first
>>>>>>>>> TypoExpr, not the second one.) No it won't because of the caching. The
>>>>>>>>> second TypoExpr is only effectively "next"ed once all of the candidates for
>>>>>>>>> the first TypoExpr have been tried with the current candidate of the second
>>>>>>>>> TypoExpr, and the third TypoExpr would only be "next"ed if all of the
>>>>>>>>> candidates of both the first and second TypoExpr have been tried with the
>>>>>>>>> current candidate of the third TypoExpr.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And I'm still not quite understanding why there's the  E !=
>>>>>>>>>> TypoExprs[0] condition... maybe that's part of what I'm missing. It looks
>>>>>>>>>> like that code would cause, in a single Transform, multiple instances of
>>>>>>>>>> the first typo to be computed with different/subsequent typo results...
>>>>>>>>>> which seems confusing/strange. (though I don't fully understand how/where
>>>>>>>>>> multiple instances of the same TypoExpr will be visited in a single
>>>>>>>>>> Transform)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That condition ensures that the cached result is never used for
>>>>>>>>> the first TypoExpr (instead of having an extra conditional for all of the
>>>>>>>>> return statements below here to not cache the result for the first
>>>>>>>>> TypoExpr).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (some nitpicks of the code - there's a duplicate lookup of
>>>>>>>>>> TransformCache (count then [], or count then insert further down the
>>>>>>>>>> algorithm - it should probably just be:
>>>>>>>>>>
>>>>>>>>>> ExprResult &CacheEntry = TransformCache[E];
>>>>>>>>>> if (CacheEntry)
>>>>>>>>>>   return CacheEntry;
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This wouldn't work because "TransformCache[E]" would add a cache
>>>>>>>>> entry for E, and either the "if (CacheEntry)" would always be true, or an
>>>>>>>>> ExprError() would be considered false and an ExprError cache entry would
>>>>>>>>> always be ignored, both of which would be bad.
>>>>>>>>>
>>>>>>>>
>>>>>>>> you're right that ExprResult isn't boolean testable - but it is
>>>>>>>> possible to distinguish ExprError from a default-constructed ExprResult. A
>>>>>>>> default constructed ExprResult is in the valid-but-unset state, so far as I
>>>>>>>> can tell, so it would look like:
>>>>>>>>
>>>>>>>> if (!CacheEntry.isUnset())
>>>>>>>>   return CacheEntry
>>>>>>>>
>>>>>>>> (which is, admittedly, not the /most/ legible thing ever)
>>>>>>>>
>>>>>>>
>>>>>>> I'm worried this won't work, or at least would be (potentially)
>>>>>>> fragile, because .isUnset() is defined as "valid-but-null" and Sema sets
>>>>>>> valid-but-null as an explicit state in the return values of various
>>>>>>> functions--at least a few of which I saw while looking at users of the typo
>>>>>>> correction code.
>>>>>>>
>>>>>>
>>>>>> Ooh... I could totally see that happening (& thanks for the
>>>>>> explanation).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Of course, a new local variable could be created containing the
>>>>>>>>> result of TransformCache.find(E) and then compared to TransformCache.end()
>>>>>>>>> and returning as appropriate, but I find it to be more verbose and less
>>>>>>>>> clear:
>>>>>>>>>
>>>>>>>>> auto CacheEntry = TransformCache.find(E);
>>>>>>>>> if (!TypoExprs.insert(E) && CacheEntry != TransformCache.end())
>>>>>>>>>   return CacheEntry->second;
>>>>>>>>>
>>>>>>>>> And neither of the subsequent return statements would be changed
>>>>>>>>> because the cache entry would still need to be created...
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you were using the iterator version, they could be hinted
>>>>>>>> insertions which might be better - but I agree, if you're just using the
>>>>>>>> iterators it's probably not terribly helpful. But as shown above, you can
>>>>>>>> actually just use the op[] at the start, check !isUnset, then initialize
>>>>>>>> the elements. This is a fairly common idiom in map related code, though
>>>>>>>> more often it looks like
>>>>>>>>
>>>>>>>
>>>>>>> After a bit more consideration, I've taken your suggested approach
>>>>>>> of using .isUnset() and added an assertion on trying to cache an "unset"
>>>>>>> value ("assert(!NE.isUnset() && ...);").
>>>>>>>
>>>>>>
>>>>>> Sounds like a fair tradeoff - if the assertion does fire we can add a
>>>>>> test case for it & switch back to something like your original formulation.
>>>>>>
>>>>>> The other gotcha to be aware of is that since it's a reference into a
>>>>>> DenseMap are invalidated by insertions - so if any of the work to
>>>>>> initialize the TransformCache (the while nextCorrection loop, and probably
>>>>>> most importantly the "BuildDeclarationNameExpr" call) can also modify the
>>>>>> TransformCache (by transforming another typo expr) then the reference will
>>>>>> be invalidated and the approach I'd suggested won't be applicable (you
>>>>>> could still keep just one lookup in the cache hit case, but in the cache
>>>>>> miss you'd have to do a second lookup after the work)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>>     return CacheEntry = NE;
>>>>>>>>>> }
>>>>>>>>>> return CacheEntry = ExprError();
>>>>>>>>>>
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> C D H
>>>>>>>>>>> A E H
>>>>>>>>>>> B E H
>>>>>>>>>>> C E H
>>>>>>>>>>> A F H
>>>>>>>>>>> B F H
>>>>>>>>>>> C F H
>>>>>>>>>>> A G H
>>>>>>>>>>> B G H
>>>>>>>>>>> C G H
>>>>>>>>>>> A D I
>>>>>>>>>>> B D I
>>>>>>>>>>> C D I
>>>>>>>>>>> A E I
>>>>>>>>>>> B E I
>>>>>>>>>>> C E I
>>>>>>>>>>> A F I
>>>>>>>>>>> B F I
>>>>>>>>>>> C F I
>>>>>>>>>>> A G I
>>>>>>>>>>> B G I
>>>>>>>>>>> C G I
>>>>>>>>>>>
>>>>>>>>>>>> If we get to the end, have invalid code, and the first typo has no alternatives left to try - I guess we just want to print errors for everything, without typo corrections?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> If all of the combinations of typo corrections failed to create
>>>>>>>>>>> an acceptable Expr, then the current correction for all of TypoExprs would
>>>>>>>>>>> be an empty TypoCorrection which would cause the corresponding diagnostic
>>>>>>>>>>> handlers to print out errors without suggestions.
>>>>>>>>>>>
>>>>>>>>>>>> It might be helpful to split out some of these chunks (the state resetting code and maybe the diagnostic printing at the end) into separate functions to give them logical names & make the higher level algorithm more clear?
>>>>>>>>>>>>
>>>>>>>>>>>> I've split the diagnostics emission and the state handling into
>>>>>>>>>>> separate helpers, and expanded their description comments. I'm not keen on
>>>>>>>>>>> the name for the state handler but it is the best name I could come up with
>>>>>>>>>>> that was even remotely descriptive.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Roughly works, though I'm still quite confused, it seems.
>>>>>>>>>> (alternative name for that function would be "reset" rather than "advance"
>>>>>>>>>> - but once I understand the algorithm better I might have better
>>>>>>>>>> suggestions)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The function removes the cached result for the first TypoExpr then
>>>>>>>>> checks the TypoExpr's correction candidate stream; if it has reached the
>>>>>>>>> end of the stream, it will reset the stream then perform the same check for
>>>>>>>>> the second TypoExpr, continuing until it either has reset the correction
>>>>>>>>> streams for all of the TypoExprs or has found a TypoExpr that hadn't
>>>>>>>>> reached the end of its correction stream. And in the former case, it would
>>>>>>>>> signal that all combinations of corrections had been retried and the
>>>>>>>>> TreeTransform should stop searching for a valid combination of corrections.
>>>>>>>>> And now that I think about it, the "E != TypoExprs[0]" is probably now a
>>>>>>>>> superfluous check since calling CheckAndAdvanceTypoExprCorrectionStreams
>>>>>>>>> would always remove the first TypoExpr from the result cache.
>>>>>>>>>
>>>>>>>>> The loop at the end ("for (auto E : TypoExprs) {") is over a hashing data structure, right? Is that going to cause unstable output of diagnostics? (eg: different machines, different memory layout, the same program might get the typos printed in a different order) or is there something else that will fix that?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> If you have to change the set to be a SetVector, or similar (to stabilize output) - then you won't need the FirstTypoExpr anymore - it'll just be the first one in the set, since you iterate in insertion order at that point.
>>>>>>>>>>>>
>>>>>>>>>>>> Switched it from a SmallPtrSet to a SmallSetVector like it
>>>>>>>>>>> should have been. Doing so also allowed me to simplify the state handling
>>>>>>>>>>> code by being able to assume a deterministic order, and eliminating
>>>>>>>>>>> FirstTypoExpr.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> "while (TypoCorrection TC = State.Consumer->getNextCorrection()) {" doesn't seem to be a loop - it has an unconditional return at the end of the body. Should it just be an 'if'?
>>>>>>>>>>>>
>>>>>>>>>>>> There was actually a missing check on whether the result of
>>>>>>>>>>> Sema::BuildDeclarationNameExpr was valid.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Were there missing/added test cases to cover this situation?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No because at this point in the patch series, there is nothing
>>>>>>>>> that actually generates TypoExprs.
>>>>>>>>>
>>>>>>>>>> "+ (FullExpr.get()->isTypeDependent() ||
>>>>>>>>>>>>
>>>>>>>>>>>> +       FullExpr.get()->isValueDependent() ||
>>>>>>>>>>>> +       FullExpr.get()->isInstantiationDependent())) {"
>>>>>>>>>>>>
>>>>>>>>>>>> What are those checks for? I don't know what condition we would be in if we had typo corrections but it wasn't type, value, or instantiation dependent. Perhaps a comment could explain that?
>>>>>>>>>>>>
>>>>>>>>>>>> The check is to avoid running the tree transform on Expr trees
>>>>>>>>>>> that are known to not have a TypoExpr in them, regardless of whether the
>>>>>>>>>>> current expression evaluation context still indicates there are uncorrected
>>>>>>>>>>> typos. I've added a comment to that effect.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hmm - I suppose my question might then be: When would we have
>>>>>>>>>> NumTypos > 0 but be in a situation where the full expression cannot have
>>>>>>>>>> any typos in it?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't remember off the top of my head because it's been a few
>>>>>>>>> months now since I wrote the original patches... but basically any
>>>>>>>>> situation where more than one FullExpr shares an expression evaluation
>>>>>>>>> context or when the current expression evaluation context had its NumTypos
>>>>>>>>> increased by an expression evaluation context that was popped off the
>>>>>>>>> stack. And TreeTransforms aren't cheap enough to blindly run on every Expr
>>>>>>>>> when NumTypos is non-zero.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> To simplify Sema::clearDelayedTypo you could add MapVector::erase(const KeyType&) (given that std::map has such an operation, it seems like a reasonable one for MapVector to have).
>>>>>>>>>>>>
>>>>>>>>>>>> Done (simple patch attached).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Great - if you'd like to commit this separately, it should have a
>>>>>>>>>> unit test. Otherwise it's probably simple enough to commit along with your
>>>>>>>>>> new usage if you prefer.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Committed with unit test as r219240.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sema::getTypoExprState looks like it could be a const function
>>>>>>>>>>>> (it looks like it's never meant to be called when the element isn't already
>>>>>>>>>>>> in the map) - using 'lookup' instead of operator[] will be a const
>>>>>>>>>>>> operation on the map (& assert if the element doesn't exist in the map).
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sadly I cannot do that because the operator[] cannot be used if
>>>>>>>>>>> the method is made const, and .lookup() can't be used because it returns
>>>>>>>>>>> by-value instead of by-reference and TypoExprState is not copyable:
>>>>>>>>>>>
>>>>>>>>>>> ../tools/clang/lib/Sema/SemaLookup.cpp:4600:10: warning:
>>>>>>>>>>> returning reference to local temporary object [-Wreturn-stack-address]
>>>>>>>>>>>   return DelayedTypos.lookup(TE);
>>>>>>>>>>>
>>>>>>>>>>> ../include/llvm/ADT/MapVector.h:88:30: error: call to
>>>>>>>>>>> implicitly-deleted copy constructor of 'const clang::Sema::TypoExprState'
>>>>>>>>>>>     return Pos == Map.end()? ValueT() :
>>>>>>>>>>> Vector[Pos->second].second;
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oh, weird - my mistake in assuming/misremembering what lookup
>>>>>>>>>> actually does. I'd still make getTypoExprState const and use find+assert
>>>>>>>>>> just to constrain the interface a bit more rather than leaving it possible
>>>>>>>>>> to accidentally create a new entry in the map when the API is only intended
>>>>>>>>>> to query existing entries.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Done. The new changes I've made overall are:
>>>>>>>>>
>>>>>>>>> diff --git a/include/clang/Sema/Sema.h b/include/clang/Sema/Sema.h
>>>>>>>>> index b30bf5e..f644fdd 100644
>>>>>>>>> --- a/include/clang/Sema/Sema.h
>>>>>>>>> +++ b/include/clang/Sema/Sema.h
>>>>>>>>> @@ -2628,7 +2628,7 @@ private:
>>>>>>>>>                               bool ErrorRecovery, bool
>>>>>>>>> &IsUnqualifiedLookup);
>>>>>>>>>
>>>>>>>>>  public:
>>>>>>>>> -  const TypoExprState &getTypoExprState(TypoExpr *TE);
>>>>>>>>> +  const TypoExprState &getTypoExprState(TypoExpr *TE) const;
>>>>>>>>>
>>>>>>>>>    /// \brief Clears the state of the given TypoExpr.
>>>>>>>>>    void clearDelayedTypo(TypoExpr *TE);
>>>>>>>>> diff --git a/lib/Sema/SemaExprCXX.cpp b/lib/Sema/SemaExprCXX.cpp
>>>>>>>>> index 73aa997..1f08276 100644
>>>>>>>>> --- a/lib/Sema/SemaExprCXX.cpp
>>>>>>>>> +++ b/lib/Sema/SemaExprCXX.cpp
>>>>>>>>> @@ -6028,7 +6028,7 @@ public:
>>>>>>>>>      // If the TypoExpr hasn't been seen before, record it.
>>>>>>>>> Otherwise, return the
>>>>>>>>>      // cached transformation result if there is one and the
>>>>>>>>> TypoExpr isn't the
>>>>>>>>>      // first one that was encountered.
>>>>>>>>> -    if (!TypoExprs.insert(E) && E != TypoExprs[0] &&
>>>>>>>>> TransformCache.count(E)) {
>>>>>>>>> +    if (!TypoExprs.insert(E) && TransformCache.count(E)) {
>>>>>>>>>        return TransformCache[E];
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> diff --git a/lib/Sema/SemaLookup.cpp b/lib/Sema/SemaLookup.cpp
>>>>>>>>> index 8ddb5fd..bb6c76a 100644
>>>>>>>>> --- a/lib/Sema/SemaLookup.cpp
>>>>>>>>> +++ b/lib/Sema/SemaLookup.cpp
>>>>>>>>> @@ -4601,8 +4601,11 @@ TypoExpr
>>>>>>>>> *Sema::createDelayedTypo(std::unique_ptr<TypoCorrectionConsumer> TCC,
>>>>>>>>>    return TE;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> -const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE) {
>>>>>>>>> -  return DelayedTypos[TE];
>>>>>>>>> +const Sema::TypoExprState &Sema::getTypoExprState(TypoExpr *TE)
>>>>>>>>> const {
>>>>>>>>> +  auto Entry = DelayedTypos.find(TE);
>>>>>>>>> +  assert(Entry != DelayedTypos.end() &&
>>>>>>>>> +         "Failed to get the state for a TypoExpr!");
>>>>>>>>> +  return Entry->second;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  void Sema::clearDelayedTypo(TypoExpr *TE) {
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Otherwise this all looks like rather neat stuff - though I'm
>>>>>>>>>>>> not as able to review in detail some of the refactoring of existing typo
>>>>>>>>>>>> correction stuff. It looks plausible.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I've attached the updated patch. Thanks for reviewing this,
>>>>>>>>>>> David!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 25, 2014 at 2:00 PM, Kaelyn Takata <rikka at google.com
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> ping
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 26, 2014 at 11:05 AM, Kaelyn Takata <
>>>>>>>>>>>>> rikka at google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +dblaikie
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 31, 2014 at 1:20 PM, Kaelyn Takata <
>>>>>>>>>>>>>> rikka at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Part of the infrastructure is a map from a TypoExpr to the
>>>>>>>>>>>>>>> Sema-specific
>>>>>>>>>>>>>>> state needed to correct it, along with helpers to ease
>>>>>>>>>>>>>>> dealing with the
>>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The the typo count is propagated up the stack of
>>>>>>>>>>>>>>> ExpressionEvaluationContextRecords when one is popped off of
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> avoid accidentally dropping TypoExprs on the floor. For
>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>> the attempted correction of g() in
>>>>>>>>>>>>>>> test/CXX/class/class.mem/p5-0x.cpp
>>>>>>>>>>>>>>> happens with an ExpressionEvaluationContextRecord that is
>>>>>>>>>>>>>>> popped off
>>>>>>>>>>>>>>> the stack prior to ActOnFinishFullExpr being called and the
>>>>>>>>>>>>>>> tree
>>>>>>>>>>>>>>> transform for TypoExprs being run.
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>  include/clang/Sema/Sema.h           |  44 +++++
>>>>>>>>>>>>>>>  include/clang/Sema/SemaInternal.h   |  15 +-
>>>>>>>>>>>>>>>  include/clang/Sema/TypoCorrection.h |   2 +-
>>>>>>>>>>>>>>>  lib/Sema/SemaExpr.cpp               |   7 +
>>>>>>>>>>>>>>>  lib/Sema/SemaExprCXX.cpp            | 108 ++++++++++++
>>>>>>>>>>>>>>>  lib/Sema/SemaLookup.cpp             | 316
>>>>>>>>>>>>>>> ++++++++++++++++++++++++------------
>>>>>>>>>>>>>>>  6 files changed, 384 insertions(+), 108 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141023/3a1c93c5/attachment.html>


More information about the cfe-commits mailing list