[PATCH v4 5/8] Start adding the infrastructure for handling TypoExprs.

Kaelyn Takata rikka at google.com
Mon Oct 6 15:46:18 PDT 2014


On Fri, Oct 3, 2014 at 11:45 AM, David Blaikie <dblaikie at gmail.com> wrote:

> Given that these are parts of a patch series, some of which have already
> been signed off on/I haven't looked at, it's a bit hard to review the
> overall design - but I believe you & Richard talked that over, so I'm not
> too worried about that (just explaining why my review feedback's going to
> be a bit narrow)
>
> +  if (NumTypos > 0 && !ExprEvalContexts.empty())
> +    ExprEvalContexts.back().NumTypos += NumTypos;
> +  else
> +    assert(NumTypos == 0 && "There are outstanding typos after popping the "
> +                            "last ExpressionEvaluationContextRecord");
>
> Could this be simplified? I think it's equivalent to:
>
>   if (!ExprEvalContexts.empty())
>     ExprEvalContexts.back().NumTypos += NumTypos;
>   else
>     assert(NumTypos == 0 ... );
>
> or:
>
>   assert(ExprEvalContexts.empty() || NumTypos != 0 && ...)
>   if (!ExprEvalContexts.empty())
>     ...
>
> Yes, the "NumTypos > 0" was unnecessary and has been removed. (FYI, the
second form you suggested doesn't quite work because the assert would fail
on the common case where ExprEvalContexts is not empty and there are no
typos.)

> There's an unnecessary semicolon at the end of the "while (true)" loop in TransformTypos::Transform
>
> Removed.

>
>
>
> I'd roll this up a bit:
>
>
> +      auto &State = SemaRef.getTypoExprState(TE);
> +      if (State.DiagHandler)
> +        (*State.DiagHandler)(State.Consumer->getCurrentCorrection());
>
> into:
>
>   if (auto *Handler = SemaRef.getTypoExprState(TE).DiagHandler)
>     (*DiagHandler)(State.Consumer->getCurrentCorrection());
>
> That rollup doesn't work because State would be unknown when accessing
State.Consumer in the argument to the DiagHandler.

> count in the condition, plus insert in the body of the if in TransformTypos::TransformTypoExpr would be more efficient as a single insert-and-check (by avoiding two separate lookups into the set):
>   if (TypoExprs.insert(E).second) // means this element previously didn't exist in the set, and now does
>
> I've changed the count-then-insert to just an insert since
SmallPtrSetImpl::insert returns true when the item wasn't already in the
set.

> This:
>
>   TransformCache.find(E) != TransformCache.end()
>
> might commonly be written as:
>
>   TransformCache.count(E)
>
> Optionally with "!= 0", but often count is used directly as a boolean test, especially when it's a single set/map (not a multi set/map).
>
> Changed.

> This comment didn't quite make sense to me:
>
>
> +      // If corrections for the first TypoExpr have been exhausted, retry them
> +      // against the next combination of substitutions for all of the other
> +      // TypoExprs.
>
> When you say "retry them" which them do you mean? The corrections for the first TypoExpr? But what does it mean for them to be exhausted then, if they can still be considered?
>
> Maybe I need a bit of help with the higher level algorithm here - I assume the idea is:
>
> Call transform.
> As it visits TypoExpr nodes, it tries their first typo candidate available.
> If we get to the end and have invalid code, reset the state of all the typo corrections except the first one (on its next query, it'll give back its second alternative).
>
> Here is where the confusion is at. As the transform visits each TypoExpr
node it tries the first candidate available. If the first candidates of all
the TypoExprs don't work, it then tries the next (and subsequent)
candidates of only the first TypoExpr that was seen. When it exhausts the
candidates from the first TypoExpr, it then resets the stream of candidates
for the first TypoExpr and fetches the next candidate of the TypoExpr,
continuing in this way until all combinations of candidates of all
TypoExprs have been tried. So for 3 TypoExprs... let's say 1 has the
candidates (A, B, C), 2 has the candidates (D, E, F, G) and 3 has the
candidates (H, I) the iterations would test the combinations in order:

A D H
B D H
C D H
A E H
B E H
C E H
A F H
B F H
C F H
A G H
B G H
C G H
A D I
B D I
C D I
A E I
B E I
C E I
A F I
B F I
C F I
A G I
B G I
C G I

> If we get to the end, have invalid code, and the first typo has no alternatives left to try - I guess we just want to print errors for everything, without typo corrections?
>
>
If all of the combinations of typo corrections failed to create an
acceptable Expr, then the current correction for all of TypoExprs would be
an empty TypoCorrection which would cause the corresponding diagnostic
handlers to print out errors without suggestions.

> It might be helpful to split out some of these chunks (the state resetting code and maybe the diagnostic printing at the end) into separate functions to give them logical names & make the higher level algorithm more clear?
>
> I've split the diagnostics emission and the state handling into separate
helpers, and expanded their description comments. I'm not keen on the name
for the state handler but it is the best name I could come up with that was
even remotely descriptive.

> The loop at the end ("for (auto E : TypoExprs) {") is over a hashing data structure, right? Is that going to cause unstable output of diagnostics? (eg: different machines, different memory layout, the same program might get the typos printed in a different order) or is there something else that will fix that?
>
>
> If you have to change the set to be a SetVector, or similar (to stabilize output) - then you won't need the FirstTypoExpr anymore - it'll just be the first one in the set, since you iterate in insertion order at that point.
>
> Switched it from a SmallPtrSet to a SmallSetVector like it should have
been. Doing so also allowed me to simplify the state handling code by being
able to assume a deterministic order, and eliminating FirstTypoExpr.

>
>
> "while (TypoCorrection TC = State.Consumer->getNextCorrection()) {" doesn't seem to be a loop - it has an unconditional return at the end of the body. Should it just be an 'if'?
>
> There was actually a missing check on whether the result of
Sema::BuildDeclarationNameExpr was valid.

> "+ (FullExpr.get()->isTypeDependent() ||
>
> +       FullExpr.get()->isValueDependent() ||
> +       FullExpr.get()->isInstantiationDependent())) {"
>
> What are those checks for? I don't know what condition we would be in if we had typo corrections but it wasn't type, value, or instantiation dependent. Perhaps a comment could explain that?
>
> The check is to avoid running the tree transform on Expr trees that are
known to not have a TypoExpr in them, regardless of whether the current
expression evaluation context still indicates there are uncorrected typos.
I've added a comment to that effect.

>
> To simplify Sema::clearDelayedTypo you could add MapVector::erase(const KeyType&) (given that std::map has such an operation, it seems like a reasonable one for MapVector to have).
>
> Done (simple patch attached).


> Sema::getTypoExprState looks like it could be a const function (it looks
> like it's never meant to be called when the element isn't already in the
> map) - using 'lookup' instead of operator[] will be a const operation on
> the map (& assert if the element doesn't exist in the map).
>

Sadly I cannot do that because the operator[] cannot be used if the method
is made const, and .lookup() can't be used because it returns by-value
instead of by-reference and TypoExprState is not copyable:

../tools/clang/lib/Sema/SemaLookup.cpp:4600:10: warning: returning
reference to local temporary object [-Wreturn-stack-address]
  return DelayedTypos.lookup(TE);

../include/llvm/ADT/MapVector.h:88:30: error: call to implicitly-deleted
copy constructor of 'const clang::Sema::TypoExprState'
    return Pos == Map.end()? ValueT() : Vector[Pos->second].second;


> Otherwise this all looks like rather neat stuff - though I'm not as able
> to review in detail some of the refactoring of existing typo correction
> stuff. It looks plausible.
>

I've attached the updated patch. Thanks for reviewing this, David!


>

>
On Thu, Sep 25, 2014 at 2:00 PM, Kaelyn Takata <rikka at google.com> wrote:
>
>> ping
>>
>> On Tue, Aug 26, 2014 at 11:05 AM, Kaelyn Takata <rikka at google.com> wrote:
>>
>>>
>>> +dblaikie
>>>
>>>
>>> On Thu, Jul 31, 2014 at 1:20 PM, Kaelyn Takata <rikka at google.com> wrote:
>>>
>>>>
>>>> Part of the infrastructure is a map from a TypoExpr to the Sema-specific
>>>> state needed to correct it, along with helpers to ease dealing with the
>>>> state.
>>>>
>>>> The the typo count is propagated up the stack of
>>>> ExpressionEvaluationContextRecords when one is popped off of to
>>>> avoid accidentally dropping TypoExprs on the floor. For example,
>>>> the attempted correction of g() in test/CXX/class/class.mem/p5-0x.cpp
>>>> happens with an ExpressionEvaluationContextRecord that is popped off
>>>> the stack prior to ActOnFinishFullExpr being called and the tree
>>>> transform for TypoExprs being run.
>>>> ---
>>>>  include/clang/Sema/Sema.h           |  44 +++++
>>>>  include/clang/Sema/SemaInternal.h   |  15 +-
>>>>  include/clang/Sema/TypoCorrection.h |   2 +-
>>>>  lib/Sema/SemaExpr.cpp               |   7 +
>>>>  lib/Sema/SemaExprCXX.cpp            | 108 ++++++++++++
>>>>  lib/Sema/SemaLookup.cpp             | 316
>>>> ++++++++++++++++++++++++------------
>>>>  6 files changed, 384 insertions(+), 108 deletions(-)
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141006/cb6777a1/attachment.html>
-------------- next part --------------
diff --git a/include/llvm/ADT/MapVector.h b/include/llvm/ADT/MapVector.h
index 4e1fc15..d82c579 100644
--- a/include/llvm/ADT/MapVector.h
+++ b/include/llvm/ADT/MapVector.h
@@ -147,6 +147,17 @@ public:
     return Next;
   }
 
+  /// \brief Remove all elements with the key value Key.
+  ///
+  /// Returns the number of elements removed.
+  size_type erase(const KeyT &Key) {
+    auto Iterator = find(Key);
+    if (Iterator == end())
+      return 0;
+    erase(Iterator);
+    return 1;
+  }
+
   /// \brief Remove the elements that match the predicate.
   ///
   /// Erase all elements that match \c Pred in a single pass.  Takes linear
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v5-0005-Start-adding-the-infrastructure-for-handling-Typo.patch
Type: text/x-patch
Size: 33590 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141006/cb6777a1/attachment.bin>


More information about the cfe-commits mailing list