[PATCH v3 5/9] Start adding the infrastructure for handling TypoExprs.

Tue Jul 29 10:51:59 PDT 2014

On Fri, Jul 25, 2014 at 2:27 PM, Richard Smith <richard at metafoo.co.uk>
wrote:

> On Mon, Jul 14, 2014 at 4:55 PM, Kaelyn Takata <rikka at google.com> wrote:
>
>>
>> The the typo count is propagated up the stack of
>> ExpressionEvaluationContextRecords when one is popped off of to
>> avoid accidentally dropping TypoExprs on the floor. For example,
>> the attempted correction of g() in test/CXX/class/class.mem/p5-0x.cpp
>> happens with an ExpressionEvaluationContextRecord that is popped off
>> the stack prior to ActOnFinishFullExpr being called and the tree
>> transform for TypoExprs being run.
>>
>
> FWIW, that case sounds like a bug: finalizing the full-expression should
> happen within the relevant ExpressionEvaluationContext.
>

I don't know enough about what the intended mechanics/interactions of these
within clang is intended to be to say for sure, but my recent experiences
with them suggests it isn't a bug. As I understand it, the declaration of b
and its initialization based on the call to g() is the full expression
within an expression evaluation context, but the the initializer (the call
to g()) would not and should not include the declaration of b and so is
given its own (sub-)expression evaluation context doesn't span the full
expression or include the declaration of b.

Is a simple count of TypoExprs sufficient here? If we leave the local
> expression without diagnosing a typo (or if our RecursiveASTVisitor fails
> to diagnose it), I think we still need to track it so that we can produce
> some kind of diagnostic for it. In particular:
>
> +    assert(NumTypos == 0 && "There are outstanding typos after popping
> the "
> +                            "last ExpressionEvaluationContextRecord");
>
> ... I think there are probably cases where this assertion can fire, and
> the right thing to do seems to be to issue a basic 'unknown identifier'
> diagnostic for any remaining typos.
>

IIRC, the assertion has actually been useful when hooking up the delayed
typo correction to a primary caller of DiagnoseEmptyLookup in that it has
helped me find cases where ActOnFinishFullExpr aren't ever called. However,
now that Sema will be keeping track of all of the undiagnosed TypoExprs and
their associated state to remove the layering violations in TypoExpr, this
tracking of whether an Expr in a given evaluation context might need the
tree transform run on it needs to be reworked.

+  llvm::SmallVector<Expr *, 4> ExprStack;
>
> This seems rather small; I expect we'd walk through more Exprs than this.
> But... you don't actually use this. Can you remove it?
>

It is used in patch #8 for passing the parent Expr of the TypoExpr to the
recovery callback, so that e.g. the number of arguments the TypoExpr is
being called with can be known.

>
> +      // If corrections for the first TypoExpr have been exhausted, retry
> them
> +      // against the next combination of substitutions for all of the
> other
> +      // TypoExprs.
> +      if (FirstTypoExpr->Consumer->finished()) {
> +        unsigned numFinished = 1;
> +        for (auto TE : TypoExprs) {
> +          if (TE != FirstTypoExpr) {
> +            TransformCache.erase(TE);
> +            if (TE->Consumer->finished()) {
> +              ++numFinished;
> +              TE->Consumer->resetCorrectionStream();
> +            } else {
> +              break;
> +            }
> +          }
> +        }
> +        FirstTypoExpr->Consumer->resetCorrectionStream();
> +        if (numFinished >= TypoExprs.size())
> +          break;
> +      }
>
> It looks like the net result of this is that you'll try:
>
>  * Typo 1 correction 1, typo 2 correction 1, typo 3 correction 1
>  * Typo 1 correction 2, typo 2 correction 1, typo 3 correction 1
>  * ...
>  * Typo 1 correction 1, typo 2 correction 2, typo 3 correction 2
>  * Typo 1 correction 2, typo 2 correction 2, typo 3 correction 2
>  * ...
>
> ... and you'll never try typo 2 correction 2 with typo 3 correction 1.
> (Also, if typo 2 has 2 corrections and typo 3 has 3, you'll never consider
> the third correction for typo 3).
>

I believe you are forgetting to account for the TransformCache which keeps
the cached TypoExprs from advancing Unless I screwed up the logic, the net
result should be e.g. (assuming typo 1 has 3 candidates, typo 2 has 2, and
typo 3 has 4):

 * Typo 1 correction 1, typo 2 correction 1, typo 3 correction 1
 * Typo 1 correction 2, typo 2 correction 1, typo 3 correction 1
 * Typo 1 correction 3, typo 2 correction 1, typo 3 correction 1
 * Typo 1 correction 1, typo 2 correction 2, typo 3 correction 1
 * Typo 1 correction 2, typo 2 correction 2, typo 3 correction 1
 * Typo 1 correction 3, typo 2 correction 2, typo 3 correction 1
 * Typo 1 correction 1, typo 2 correction 1, typo 3 correction 2
 * Typo 1 correction 2, typo 2 correction 1, typo 3 correction 2
 * Typo 1 correction 3, typo 2 correction 1, typo 3 correction 2
 * Typo 1 correction 1, typo 2 correction 2, typo 3 correction 2
 * Typo 1 correction 2, typo 2 correction 2, typo 3 correction 2
 * Typo 1 correction 3, typo 2 correction 2, typo 3 correction 2
 * Typo 1 correction 1, typo 2 correction 1, typo 3 correction 3
 * Typo 1 correction 2, typo 2 correction 1, typo 3 correction 3
 * Typo 1 correction 3, typo 2 correction 1, typo 3 correction 3
 * Typo 1 correction 1, typo 2 correction 2, typo 3 correction 3
 * Typo 1 correction 2, typo 2 correction 2, typo 3 correction 3
 * Typo 1 correction 3, typo 2 correction 2, typo 3 correction 3
 * Typo 1 correction 1, typo 2 correction 1, typo 3 correction 4
 * Typo 1 correction 2, typo 2 correction 1, typo 3 correction 4
 * Typo 1 correction 3, typo 2 correction 1, typo 3 correction 4
 * Typo 1 correction 1, typo 2 correction 2, typo 3 correction 4
 * Typo 1 correction 2, typo 2 correction 2, typo 3 correction 4
 * Typo 1 correction 3, typo 2 correction 2, typo 3 correction 4

The code:
 1) notes the first TypoExpr encountered
 2) for each TypoExpr encountered, if is the first TypoExpr or it isn't in
the cache, goes through the standard logic for choosing a new candidate and
caches the result, otherwise returns the cached result
 3) then once the first TypoExpr is finished, resets it and removes the
next TypoExpr from the cache so a new result will be generated for that
TypoExpr, and if that TypoExpr is also finished resets it and removes the
following TypoExpr from the cache, repeating until a TypoExpr that wasn't
finished is removed from the cache or all TypoExprs were seen to be
finished.

Since this seems to not be as clear/obvious as I thought, shall I add a
brief explanation about how the retrying interacts with the caching to the
comment on the loop?

> I think we need to decide:
>  * how many different combinations of typo corrections we're prepared to
> try, and
>  * which order to search them in
>

Right now the code tries all possible combinations that passed validation,
in a depth-first manner.

>
> Due to the tree structure of the AST, we should be able to do better than
> trying all combinations; for instance, given:
>
>  f(typo1, g(typo2, typo3))
>
> ... we can figure out which combinations work for the call to g, and only
> try those combinations in the call to f.
>

Unless the tree transform code doesn't short-circuit the transform once an
ExprError is encountered, the existing code already does this to a large
degree (and only for the subset of possible corrections that CorrectTypo
would have returned anyway). Given that having more than one or two typos
in a single full-expression seems to be quite uncommon, I suspect that
trying to be any more clever about which sub-combinations do or don't work
to avoid trying all combinations in this error path will not yield
sufficient gains to warrant the additional design effort and code
complexity.

> @@ -5964,6 +6063,16 @@ ExprResult Sema::ActOnFinishFullExpr(Expr *FE,
> SourceLocation CC,
>        return ExprError();
>    }
>
> +  if (ExprEvalContexts.back().NumTypos &&
> +      (FullExpr.get()->isTypeDependent() ||
> +       FullExpr.get()->isValueDependent() ||
> +       FullExpr.get()->isInstantiationDependent())) {
> +    FullExpr = TransformTypos(*this).Transform(FullExpr.get());
> +    ExprEvalContexts.back().NumTypos = 0;
> +    if (FullExpr.isInvalid())
> +      return ExprError();
> +  }
>
> This is assuming that any call to TransformTypos::Transform will remove
> all TypoExprs within the entire ExpressionEvaluationContext. I am not sure
> that this will necessarily be true in general. It'd be more robust for
> TransformTypos to specify which TypoExprs it transformed, and to remove
> those from the ExpressionEvaluationContext or from a list of currently
> undiagnosed TypoExprs.
>

As mentioned, the count tracking needs to be reworked now that Sema will be
maintaining the set of undiagnosed TypoExprs anyway, and what I have in
mind for fixing it will also solve this problem (I knew the current count
tracking was quite fragile, but it was sufficient for hooking the delayed
typo correction to member lookup and I didn't want to put a lot of effort
into wiring up precise tracking of the count when large parts of the code
may be changing anyway--and with Sema now maintaining a map of undiagnosed
TypoExprs to their associated Sema-specific state, I'm glad I hadn't tried
to add a bunch of fancy typo count tracking).

>
> +/// \returns a new \c TypoExpr that will later be replaced in the AST
> with an
> +/// Expr representing the result of performing typo correction.
> +TypoExpr *Sema::CorrectTypoDelayed(
>
> It would be useful to document that the caller still needs to produce a
> diagnostic if this returns nullptr.
>

Actually, I think it may make more sense to invoke the
TypoDiagnosticGenerator with an empty correction before returning nullptr
instead of foisting that responsibility on the caller.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140729/b70c7e6a/attachment.html>