[cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Mon Mar 14 10:28:24 PDT 2011

On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:

> I just finished up a new version of my patch earlier this week, which no longer uses the RecursiveASTVisitor (per chandlerc's suggestion).  I've rebased the patch against the current SVN and have attached it for you to test.  With the new version only the FixIt/typo.cpp and SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm not entirely sure which parts of the behavior are correct/acceptable and which might be a bug in my changes.

Definitely looking better! I don't have the time at the moment to re-run my performance tests, but I will comment on some aspects of your patch...


Index: lib/Sema/SemaDecl.cpp
===================================================================

--- lib/Sema/SemaDecl.cpp	(revision 127346)
+++ lib/Sema/SemaDecl.cpp	(working copy)
@@ -264,24 +264,24 @@
 
   if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0, CTC_Type)) {
     if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
+      std::string Suggestion = Result->getQualifiedNameAsString();
+      // FIXME: Silly formatting rules make us have to quote strings ourselves
+      // to keep quoting consistent between strings and IdentiferInfos.
+      std::string QuotedSug = "'" + Suggestion + "'";

getQualifiedNameAsString() isn't necessarily what we want here, because it always produces the fully-qualified name. That means it will overqualify when we're already in a namespace, e.g.,

	namespace N {
		namespace inner {
			class vector { /* ... */ };
		}

		void f() {
			vector v; // will suggest N::inner::vector, when inner::vector would suffice
		}
	}

A similar issue is inline namespaces. For example, if one has:

	namespace std {
		inline namespace __1 {
			class vector { /* ... */ };
		}
	}

	void f() {
		vector v; // will suggest std::__1::vector, when std::vector would suffice
	}

It would be preferable to add some kind of function that computes the minimum qualification needed to get from the specified context (e.g., CurContext, for unqualified name lookup) to the entity, since that's what the user would want to type. 

Incidentally, when testing this, I actually used

	template<typename T> class vector { /* ... */ };

in my examples, and got incorrect Fix-Its and diagnostics:

t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
                        vector<int> v; // will suggest ...
                        ^~~~~~
                        vector


+void TypoCorrectionConsumer::CheckEditDistanceAndStore(
+    llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
+  using namespace std;
+
   // Compute an upper bound on the allowable edit distance, so that the
   // edit-distance algorithm can short-circuit.
   unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3), BestEditDistance);
@@ -2872,7 +2884,7 @@
   // Compute the edit distance between the typo and the name of this
   // entity. If this edit distance is not worse than the best edit
   // distance we've seen so far, add it to the list of results.
-  unsigned ED = Typo.edit_distance(Name, true, UpperBound);
+  unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
   if (ED == 0)
     return;

The DistOffset should be added into the UpperBound, so that the edit_distance implementation can abort early if the resulting edit distance is going to be too high.

+  // Only consider entities that haven't already been found.
+  DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
+  iterator N = BestResults.find(ShortName);
+  if (N != BestResults.end()) {
+    for (DeclSpecList::iterator D = N->second.begin(), DEnd = N->second.end();
+         D != DEnd; ++D) {
+      if (NDContext == D->first) return;
+    }
+  }

Should we use a SmallPtrSet for this, so that we get faster lookup if the BestResults set happens to get large?

+void ScanNamespaceForCorrections(DeclContext *DC,
+                                 TypoCorrectionConsumer &Consumer,
+                                 unsigned DistanceOffset, bool DownOnly) {
+
+  if (DistanceOffset > Consumer.getBestEditDistance())
+    return;
+
+  DeclContext *ParentDC = DC->getLookupParent();
+
+  for (DC = DC->getPrimaryContext(); DC != NULL; DC = DC->getNextContext()) {
+    for (DeclContext::decl_iterator D = DC->decls_begin(),
+         DEnd = DC->decls_end(); D != DEnd; ++D) {
+      NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
+      NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
+      if (ChildDecl)
+        Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
+      if (ChildND)
+        ScanNamespaceForCorrections(ChildND, Consumer,
+                                    DistanceOffset + 1, true);
+    }
+  }
+
+  if (DownOnly || !ParentDC) return;
+
+  ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
+}

This will end up walking most of the translation unit. It's better than the RecursiveASTVisitor, since it won't walk into function bodies or the definitions of classes, but there's still some redundant work. I'll make a concrete suggestion at the end of this reply.

+    for (DeclSpecList::iterator DSI = I->second.begin(),
+                                DSIEnd = I->second.end();
+         DSI != DSIEnd; /* Increment in loop, */) {
+      // Perform name lookup on this name.
+      IdentifierInfo *Name = &Context.Idents.get(I->getKey());
+      LookupPotentialTypoResult(*this, Res, Name, S, SS,
+                                DSI->second ? DSI->first : MemberContext,
+                                EnteringContext, CTC);

Using DSI->first as the member context isn't quite what we want, because we don't want to turn an unqualified lookup into a member lookup. Rather, we want to turn  an unqualified lookup into a qualified lookup. So, I suggest that you form a new nested-name-specifier (e.g., a new "SS") that provides the minimum qualification needed to get from the current context over to the result we found (this is related to my first comment). If that works, you could write the results back to SS, so that our caller can go ahead and use that nested-name-specifier for the rest of semantic analysis as if the user had written it explicitly. (Naturally, callers would need some updating to understand that this could happen). The key here is that, on successfully returning an alternative result from typo correction, the name-lookup state looks *exactly* as it would had the user typed the right thing.

Incidentally, I think this is the source of the SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into an Objective-C class even though one can never qualify the name of an entity in an Objective-C class. We shouldn't ever do qualifier checks in (Objective-)C, although it's happening here.

Back to my comment about repeated work. The typo correction code for unqualified lookup makes a pass over all of the identifiers known in the translation unit, which includes all of the names of entities buried in namespaces. So, when we also go walk the AST to find all of the namespaces (and the members of those namespaces), so we end up hitting many of those identifiers a second time, and traversing most of the AST in the process (which is more expensive than just looking at the identifiers). So, here's a concrete recommendation:

	1) Don't walk the namespaces at all; let the check against all of the identifiers in the translation unit winnow the list down to the best candidates based on name alone. We don't know which of these candidates is visible, or how to get there.
	2) When we're performing lookup for each candidate (e.g., the calls to LookupPotentialTypoResult), first try unqualified lookup. If that fails, save the candidate in a separate "try qualified name lookup" set.
	3) If none of the candidates can be found by unqualified lookup, start looking in each of the namespaces we know about to see if qualified name lookup would work to find each of the candidates. You'll likely need a side structure containing all of the (unique, canonicalized) NamespaceDecls in the translation unit to make this efficient.

Implemented this way, there's no additional penalty for typo correction of visible names. Moreover, we skip the AST walk entirely, and only perform targeted lookups for names that we know exist (because they were in the identifier table). This also permits a few more optimizations:

	- We could (roughly) order namespaces based on their depth, so that we'll tend to visit the more top-level (and, therefore, more likely to have short qualifiers) namespaces first. If the qualifier for a namespace would be longer than the shortest qualifier we've found so far, we don't have to look in that namespace.

	- For a given namespace, we could look for all of the candidates in that namespace at the same time, improving locality (which is particularly import when those lookups hit the disk in the PCH case).

Sorry for the long e-mail; you've got me thinking a lot about this :)

	- Doug