[cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Thu Mar 24 14:37:17 PDT 2011

Still digesting your reply as time permits (and thank you for the feedback!)
but have one small question/issue regarding your suggestion of using a
SmallPtrSet for storing the BestResults: creating a SmallPtrSet requires
setting a fixed size for the set--how would one determine a reasonable
bounding size for the set?

Thanks,
Kaelyn

On Mon, Mar 14, 2011 at 10:28 AM, Douglas Gregor <dgregor at apple.com> wrote:

>
> On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:
>
> > I just finished up a new version of my patch earlier this week, which no
> longer uses the RecursiveASTVisitor (per chandlerc's suggestion).  I've
> rebased the patch against the current SVN and have attached it for you to
> test.  With the new version only the FixIt/typo.cpp and
> SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm
> not entirely sure which parts of the behavior are correct/acceptable and
> which might be a bug in my changes.
>
> Definitely looking better! I don't have the time at the moment to re-run my
> performance tests, but I will comment on some aspects of your patch...
>
>
> Index: lib/Sema/SemaDecl.cpp
> ===================================================================
> --- lib/Sema/SemaDecl.cpp       (revision 127346)
> +++ lib/Sema/SemaDecl.cpp       (working copy)
> @@ -264,24 +264,24 @@
>
>   if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0,
> CTC_Type)) {
>     if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
> +      std::string Suggestion = Result->getQualifiedNameAsString();
> +      // FIXME: Silly formatting rules make us have to quote strings
> ourselves
> +      // to keep quoting consistent between strings and IdentiferInfos.
> +      std::string QuotedSug = "'" + Suggestion + "'";
>
> getQualifiedNameAsString() isn't necessarily what we want here, because it
> always produces the fully-qualified name. That means it will overqualify
> when we're already in a namespace, e.g.,
>
>        namespace N {
>                namespace inner {
>                        class vector { /* ... */ };
>                }
>
>                void f() {
>                        vector v; // will suggest N::inner::vector, when
> inner::vector would suffice
>                }
>        }
>
> A similar issue is inline namespaces. For example, if one has:
>
>        namespace std {
>                inline namespace __1 {
>                        class vector { /* ... */ };
>                }
>        }
>
>        void f() {
>                vector v; // will suggest std::__1::vector, when std::vector
> would suffice
>        }
>
> It would be preferable to add some kind of function that computes the
> minimum qualification needed to get from the specified context (e.g.,
> CurContext, for unqualified name lookup) to the entity, since that's what
> the user would want to type.
>
> Incidentally, when testing this, I actually used
>
>        template<typename T> class vector { /* ... */ };
>
> in my examples, and got incorrect Fix-Its and diagnostics:
>
> t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
>                        vector<int> v; // will suggest ...
>                        ^~~~~~
>                        vector
>
>
> +void TypoCorrectionConsumer::CheckEditDistanceAndStore(
> +    llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
> +  using namespace std;
> +
>   // Compute an upper bound on the allowable edit distance, so that the
>   // edit-distance algorithm can short-circuit.
>   unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3),
> BestEditDistance);
> @@ -2872,7 +2884,7 @@
>   // Compute the edit distance between the typo and the name of this
>   // entity. If this edit distance is not worse than the best edit
>   // distance we've seen so far, add it to the list of results.
> -  unsigned ED = Typo.edit_distance(Name, true, UpperBound);
> +  unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
>   if (ED == 0)
>     return;
>
> The DistOffset should be added into the UpperBound, so that the
> edit_distance implementation can abort early if the resulting edit distance
> is going to be too high.
>
> +  // Only consider entities that haven't already been found.
> +  DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
> +  iterator N = BestResults.find(ShortName);
> +  if (N != BestResults.end()) {
> +    for (DeclSpecList::iterator D = N->second.begin(), DEnd =
> N->second.end();
> +         D != DEnd; ++D) {
> +      if (NDContext == D->first) return;
> +    }
> +  }
>
> Should we use a SmallPtrSet for this, so that we get faster lookup if the
> BestResults set happens to get large?
>
> +void ScanNamespaceForCorrections(DeclContext *DC,
> +                                 TypoCorrectionConsumer &Consumer,
> +                                 unsigned DistanceOffset, bool DownOnly) {
> +
> +  if (DistanceOffset > Consumer.getBestEditDistance())
> +    return;
> +
> +  DeclContext *ParentDC = DC->getLookupParent();
> +
> +  for (DC = DC->getPrimaryContext(); DC != NULL; DC =
> DC->getNextContext()) {
> +    for (DeclContext::decl_iterator D = DC->decls_begin(),
> +         DEnd = DC->decls_end(); D != DEnd; ++D) {
> +      NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
> +      NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
> +      if (ChildDecl)
> +        Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
> +      if (ChildND)
> +        ScanNamespaceForCorrections(ChildND, Consumer,
> +                                    DistanceOffset + 1, true);
> +    }
> +  }
> +
> +  if (DownOnly || !ParentDC) return;
> +
> +  ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
> +}
>
> This will end up walking most of the translation unit. It's better than the
> RecursiveASTVisitor, since it won't walk into function bodies or the
> definitions of classes, but there's still some redundant work. I'll make a
> concrete suggestion at the end of this reply.
>
> +    for (DeclSpecList::iterator DSI = I->second.begin(),
> +                                DSIEnd = I->second.end();
> +         DSI != DSIEnd; /* Increment in loop, */) {
> +      // Perform name lookup on this name.
> +      IdentifierInfo *Name = &Context.Idents.get(I->getKey());
> +      LookupPotentialTypoResult(*this, Res, Name, S, SS,
> +                                DSI->second ? DSI->first : MemberContext,
> +                                EnteringContext, CTC);
>
> Using DSI->first as the member context isn't quite what we want, because we
> don't want to turn an unqualified lookup into a member lookup. Rather, we
> want to turn  an unqualified lookup into a qualified lookup. So, I suggest
> that you form a new nested-name-specifier (e.g., a new "SS") that provides
> the minimum qualification needed to get from the current context over to the
> result we found (this is related to my first comment). If that works, you
> could write the results back to SS, so that our caller can go ahead and use
> that nested-name-specifier for the rest of semantic analysis as if the user
> had written it explicitly. (Naturally, callers would need some updating to
> understand that this could happen). The key here is that, on successfully
> returning an alternative result from typo correction, the name-lookup state
> looks *exactly* as it would had the user typed the right thing.
>
> Incidentally, I think this is the source of the
> SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into
> an Objective-C class even though one can never qualify the name of an entity
> in an Objective-C class. We shouldn't ever do qualifier checks in
> (Objective-)C, although it's happening here.
>
> Back to my comment about repeated work. The typo correction code for
> unqualified lookup makes a pass over all of the identifiers known in the
> translation unit, which includes all of the names of entities buried in
> namespaces. So, when we also go walk the AST to find all of the namespaces
> (and the members of those namespaces), so we end up hitting many of those
> identifiers a second time, and traversing most of the AST in the process
> (which is more expensive than just looking at the identifiers). So, here's a
> concrete recommendation:
>
>        1) Don't walk the namespaces at all; let the check against all of
> the identifiers in the translation unit winnow the list down to the best
> candidates based on name alone. We don't know which of these candidates is
> visible, or how to get there.
>        2) When we're performing lookup for each candidate (e.g., the calls
> to LookupPotentialTypoResult), first try unqualified lookup. If that fails,
> save the candidate in a separate "try qualified name lookup" set.
>        3) If none of the candidates can be found by unqualified lookup,
> start looking in each of the namespaces we know about to see if qualified
> name lookup would work to find each of the candidates. You'll likely need a
> side structure containing all of the (unique, canonicalized) NamespaceDecls
> in the translation unit to make this efficient.
>
> Implemented this way, there's no additional penalty for typo correction of
> visible names. Moreover, we skip the AST walk entirely, and only perform
> targeted lookups for names that we know exist (because they were in the
> identifier table). This also permits a few more optimizations:
>
>        - We could (roughly) order namespaces based on their depth, so that
> we'll tend to visit the more top-level (and, therefore, more likely to have
> short qualifiers) namespaces first. If the qualifier for a namespace would
> be longer than the shortest qualifier we've found so far, we don't have to
> look in that namespace.
>
>        - For a given namespace, we could look for all of the candidates in
> that namespace at the same time, improving locality (which is particularly
> import when those lookups hit the disk in the PCH case).
>
> Sorry for the long e-mail; you've got me thinking a lot about this :)
>
>        - Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110324/ea6ba81a/attachment.html>