[cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting
Kaelyn Uhrain
rikka at google.com
Thu Mar 24 14:37:17 PDT 2011
Still digesting your reply as time permits (and thank you for the feedback!)
but have one small question/issue regarding your suggestion of using a
SmallPtrSet for storing the BestResults: creating a SmallPtrSet requires
setting a fixed size for the set--how would one determine a reasonable
bounding size for the set?
Thanks,
Kaelyn
On Mon, Mar 14, 2011 at 10:28 AM, Douglas Gregor <dgregor at apple.com> wrote:
>
> On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:
>
> > I just finished up a new version of my patch earlier this week, which no
> longer uses the RecursiveASTVisitor (per chandlerc's suggestion). I've
> rebased the patch against the current SVN and have attached it for you to
> test. With the new version only the FixIt/typo.cpp and
> SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm
> not entirely sure which parts of the behavior are correct/acceptable and
> which might be a bug in my changes.
>
> Definitely looking better! I don't have the time at the moment to re-run my
> performance tests, but I will comment on some aspects of your patch...
>
>
> Index: lib/Sema/SemaDecl.cpp
> ===================================================================
> --- lib/Sema/SemaDecl.cpp (revision 127346)
> +++ lib/Sema/SemaDecl.cpp (working copy)
> @@ -264,24 +264,24 @@
>
> if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0,
> CTC_Type)) {
> if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
> + std::string Suggestion = Result->getQualifiedNameAsString();
> + // FIXME: Silly formatting rules make us have to quote strings
> ourselves
> + // to keep quoting consistent between strings and IdentiferInfos.
> + std::string QuotedSug = "'" + Suggestion + "'";
>
> getQualifiedNameAsString() isn't necessarily what we want here, because it
> always produces the fully-qualified name. That means it will overqualify
> when we're already in a namespace, e.g.,
>
> namespace N {
> namespace inner {
> class vector { /* ... */ };
> }
>
> void f() {
> vector v; // will suggest N::inner::vector, when
> inner::vector would suffice
> }
> }
>
> A similar issue is inline namespaces. For example, if one has:
>
> namespace std {
> inline namespace __1 {
> class vector { /* ... */ };
> }
> }
>
> void f() {
> vector v; // will suggest std::__1::vector, when std::vector
> would suffice
> }
>
> It would be preferable to add some kind of function that computes the
> minimum qualification needed to get from the specified context (e.g.,
> CurContext, for unqualified name lookup) to the entity, since that's what
> the user would want to type.
>
> Incidentally, when testing this, I actually used
>
> template<typename T> class vector { /* ... */ };
>
> in my examples, and got incorrect Fix-Its and diagnostics:
>
> t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
> vector<int> v; // will suggest ...
> ^~~~~~
> vector
>
>
> +void TypoCorrectionConsumer::CheckEditDistanceAndStore(
> + llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
> + using namespace std;
> +
> // Compute an upper bound on the allowable edit distance, so that the
> // edit-distance algorithm can short-circuit.
> unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3),
> BestEditDistance);
> @@ -2872,7 +2884,7 @@
> // Compute the edit distance between the typo and the name of this
> // entity. If this edit distance is not worse than the best edit
> // distance we've seen so far, add it to the list of results.
> - unsigned ED = Typo.edit_distance(Name, true, UpperBound);
> + unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
> if (ED == 0)
> return;
>
> The DistOffset should be added into the UpperBound, so that the
> edit_distance implementation can abort early if the resulting edit distance
> is going to be too high.
>
> + // Only consider entities that haven't already been found.
> + DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
> + iterator N = BestResults.find(ShortName);
> + if (N != BestResults.end()) {
> + for (DeclSpecList::iterator D = N->second.begin(), DEnd =
> N->second.end();
> + D != DEnd; ++D) {
> + if (NDContext == D->first) return;
> + }
> + }
>
> Should we use a SmallPtrSet for this, so that we get faster lookup if the
> BestResults set happens to get large?
>
> +void ScanNamespaceForCorrections(DeclContext *DC,
> + TypoCorrectionConsumer &Consumer,
> + unsigned DistanceOffset, bool DownOnly) {
> +
> + if (DistanceOffset > Consumer.getBestEditDistance())
> + return;
> +
> + DeclContext *ParentDC = DC->getLookupParent();
> +
> + for (DC = DC->getPrimaryContext(); DC != NULL; DC =
> DC->getNextContext()) {
> + for (DeclContext::decl_iterator D = DC->decls_begin(),
> + DEnd = DC->decls_end(); D != DEnd; ++D) {
> + NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
> + NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
> + if (ChildDecl)
> + Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
> + if (ChildND)
> + ScanNamespaceForCorrections(ChildND, Consumer,
> + DistanceOffset + 1, true);
> + }
> + }
> +
> + if (DownOnly || !ParentDC) return;
> +
> + ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
> +}
>
> This will end up walking most of the translation unit. It's better than the
> RecursiveASTVisitor, since it won't walk into function bodies or the
> definitions of classes, but there's still some redundant work. I'll make a
> concrete suggestion at the end of this reply.
>
> + for (DeclSpecList::iterator DSI = I->second.begin(),
> + DSIEnd = I->second.end();
> + DSI != DSIEnd; /* Increment in loop, */) {
> + // Perform name lookup on this name.
> + IdentifierInfo *Name = &Context.Idents.get(I->getKey());
> + LookupPotentialTypoResult(*this, Res, Name, S, SS,
> + DSI->second ? DSI->first : MemberContext,
> + EnteringContext, CTC);
>
> Using DSI->first as the member context isn't quite what we want, because we
> don't want to turn an unqualified lookup into a member lookup. Rather, we
> want to turn an unqualified lookup into a qualified lookup. So, I suggest
> that you form a new nested-name-specifier (e.g., a new "SS") that provides
> the minimum qualification needed to get from the current context over to the
> result we found (this is related to my first comment). If that works, you
> could write the results back to SS, so that our caller can go ahead and use
> that nested-name-specifier for the rest of semantic analysis as if the user
> had written it explicitly. (Naturally, callers would need some updating to
> understand that this could happen). The key here is that, on successfully
> returning an alternative result from typo correction, the name-lookup state
> looks *exactly* as it would had the user typed the right thing.
>
> Incidentally, I think this is the source of the
> SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into
> an Objective-C class even though one can never qualify the name of an entity
> in an Objective-C class. We shouldn't ever do qualifier checks in
> (Objective-)C, although it's happening here.
>
> Back to my comment about repeated work. The typo correction code for
> unqualified lookup makes a pass over all of the identifiers known in the
> translation unit, which includes all of the names of entities buried in
> namespaces. So, when we also go walk the AST to find all of the namespaces
> (and the members of those namespaces), so we end up hitting many of those
> identifiers a second time, and traversing most of the AST in the process
> (which is more expensive than just looking at the identifiers). So, here's a
> concrete recommendation:
>
> 1) Don't walk the namespaces at all; let the check against all of
> the identifiers in the translation unit winnow the list down to the best
> candidates based on name alone. We don't know which of these candidates is
> visible, or how to get there.
> 2) When we're performing lookup for each candidate (e.g., the calls
> to LookupPotentialTypoResult), first try unqualified lookup. If that fails,
> save the candidate in a separate "try qualified name lookup" set.
> 3) If none of the candidates can be found by unqualified lookup,
> start looking in each of the namespaces we know about to see if qualified
> name lookup would work to find each of the candidates. You'll likely need a
> side structure containing all of the (unique, canonicalized) NamespaceDecls
> in the translation unit to make this efficient.
>
> Implemented this way, there's no additional penalty for typo correction of
> visible names. Moreover, we skip the AST walk entirely, and only perform
> targeted lookups for names that we know exist (because they were in the
> identifier table). This also permits a few more optimizations:
>
> - We could (roughly) order namespaces based on their depth, so that
> we'll tend to visit the more top-level (and, therefore, more likely to have
> short qualifiers) namespaces first. If the qualifier for a namespace would
> be longer than the shortest qualifier we've found so far, we don't have to
> look in that namespace.
>
> - For a given namespace, we could look for all of the candidates in
> that namespace at the same time, improving locality (which is particularly
> import when those lookups hit the disk in the PCH case).
>
> Sorry for the long e-mail; you've got me thinking a lot about this :)
>
> - Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110324/ea6ba81a/attachment.html>
More information about the cfe-commits
mailing list