[cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Kaelyn Uhrain rikka at google.com
Fri Mar 25 14:15:42 PDT 2011


On Mon, Mar 14, 2011 at 10:28 AM, Douglas Gregor <dgregor at apple.com> wrote:

>
>        1) Don't walk the namespaces at all; let the check against all of
> the identifiers in the translation unit winnow the list down to the best
> candidates based on name alone. We don't know which of these candidates is
> visible, or how to get there.
>

The problem with first checking by name alone is that you can mistakenly
discard the best match--I had that problem in early iterations of my patch.
Take for example:

namespace foo {
  namespace bar {
     void tree_branches() { /* ... */ }
  }
}

void three_branches() { /* ... */ }

int main() {
  tree_branches();
}

Currently my patch will suggest three_branches as it has an edit distance of
1 versus an edit distance of 2 for foo::bar::tree_branches.  Checking
against all identifiers first would cause the TypoCorrectionConsumer to
discard three_branches in favor of tree_branches since it would think
tree_branches has an edit distance of 0... but will later realize
tree_branches is unreachable from the current context and discard it (as the
code would do now), or possibly suggest a sub-optimal correction (if the
code waited to compute the qualifier).  So the trick would be to figure out
the qualifier needed for each identifier before making a judgment about edit
distance without doing an unqualified lookup and potentially trying to
figure out a valid qualified lookup for each identifier.  The trick is
trying to get the right info at the right time without walking through the
namespaces in the AST given that the IdentifierInfo objects stored in
Context.Idents lack any reference to a declaration context.

        2) When we're performing lookup for each candidate (e.g., the calls
> to LookupPotentialTypoResult), first try unqualified lookup. If that fails,
> save the candidate in a separate "try qualified name lookup" set.
>

With my above comments in mind, the unqualified lookup would have to be done
for potentially every identifier in the translation unit (any subsequent
identifier that has a greater edit distance than the closest correction so
far that was successfully lookup can be skipped, but the worst-case scenario
is for all identifiers to need to be looked up), followed by qualifier
computation (involving walking the decls in the AST for child namespaces,
unless I'm missing some handy method for building qualified identifiers) on
every identifier that failed the unqualified lookup but does not have a
greater edit distance than the closest valid unqualified identifier.

       3) If none of the candidates can be found by unqualified lookup,
> start looking in each of the namespaces we know about to see if qualified
> name lookup would work to find each of the candidates. You'll likely need a
> side structure containing all of the (unique, canonicalized) NamespaceDecls
> in the translation unit to make this efficient.
>

How would building the side structure containing all of those NamespaceDecls
be more efficient than  ScanNamespaceForCorrections(), which stops walking
through namespaces once the distance between a namespace and the current
context is greater than the best edit distance known to the
TypoCorrectionConsumer?  Or walk any less of the AST than
ScanNamespaceForCorrections?


> Implemented this way, there's no additional penalty for typo correction of
> visible names. Moreover, we skip the AST walk entirely, and only perform
> targeted lookups for names that we know exist (because they were in the
> identifier table). This also permits a few more optimizations:
>
>        - We could (roughly) order namespaces based on their depth, so that
> we'll tend to visit the more top-level (and, therefore, more likely to have
> short qualifiers) namespaces first. If the qualifier for a namespace would
> be longer than the shortest qualifier we've found so far, we don't have to
> look in that namespace.
>

 ScanNamespaceForCorrections already skips namespaces that would require
overly-long qualifiers.

       - For a given namespace, we could look for all of the candidates in
> that namespace at the same time, improving locality (which is particularly
> import when those lookups hit the disk in the PCH case).
>

ScanNamespaceForCorrections currently feeds candidates to the
TypoCorrectionConsumer by namespace.  Admittedly, it could store child
NamespaceDecls to be searched after searching the current DeclContext
instead of searching them as they are found, so that the overall search for
candidates is breadth-first instead depth-first.  (And now that I've thought
of this, I'm going to be updating ScanNamespaceForCorrections to do so since
it will further limit the number of namespaces searched).

Cheers,
Kaelyn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110325/81854933/attachment.html>


More information about the cfe-commits mailing list