[cfe-commits] cfe-commits Digest, Vol 40, Issue 115

Matthieu Monrocq matthieu.monrocq at gmail.com
Wed Oct 20 13:13:21 PDT 2010


>
> Date: Tue, 19 Oct 2010 19:39:10 -0000
> From: Douglas Gregor <dgregor at apple.com>
> Subject: [cfe-commits] r116849 - /cfe/trunk/lib/Sema/SemaLookup.cpp
> To: cfe-commits at cs.uiuc.edu
> Message-ID: <20101019193911.0338D2A6C12C at llvm.org>
> Content-Type: text/plain; charset="utf-8"
>
> Author: dgregor
> Date: Tue Oct 19 14:39:10 2010
> New Revision: 116849
>
> URL: http://llvm.org/viewvc/llvm-project?rev=116849&view=rev
> Log:
> Improve the performance of typo correction, by using a simple
> computation to compute the lower bound of the edit distance, so that
> we can avoid computing the edit distance for names that will clearly
> be rejected later. Since edit distance is such an expensive algorithm
> (M x N), this leads to a 7.5x speedup when correcting NSstring ->
> NSString in the presence of a Cocoa PCH.
>
> Modified:
>    cfe/trunk/lib/Sema/SemaLookup.cpp
>
> Modified: cfe/trunk/lib/Sema/SemaLookup.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaLookup.cpp?rev=116849&r1=116848&r2=116849&view=diff
>
> ==============================================================================
> --- cfe/trunk/lib/Sema/SemaLookup.cpp (original)
> +++ cfe/trunk/lib/Sema/SemaLookup.cpp Tue Oct 19 14:39:10 2010
> @@ -2730,6 +2730,12 @@
>  }
>
>  void TypoCorrectionConsumer::FoundName(llvm::StringRef Name) {
> +  // Use a simple length-based heuristic to determine the minimum possible
> +  // edit distance. If the minimum isn't good enough, bail out early.
> +  unsigned MinED = abs((int)Name.size() - (int)Typo.size());
> +  if (MinED > BestEditDistance || (MinED && Typo.size() / MinED < 3))
> +    return;
> +
>   // Compute the edit distance between the typo and the name of this
>   // entity. If this edit distance is not worse than the best edit
>   // distance we've seen so far, add it to the list of results.
>
Hi Doug,

Another simple optimization could be to count the number of occurrences of
each character in both names, then add up the absolute differences for each
character. Each addition or deletion changes that sum by one and each
substitution by at most two, so if the sum is greater than twice the best
edit distance so far, then no combination of addition / deletion /
substitution (within this limit) can morph one string into the other.
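
To make the idea concrete, here is a minimal, unmeasured sketch of the character-count bound. It uses std::string rather than llvm::StringRef, and the name `charCountLowerBound` is hypothetical, not something in the tree. Since a single substitution can fix two counts at once (one surplus character and one missing character), the sketch halves the sum, rounding up, to stay a safe lower bound when substitutions are allowed.

```cpp
#include <array>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Clang/LLVM): a lower bound on the edit
// distance between A and B, derived only from per-character counts.
unsigned charCountLowerBound(const std::string &A, const std::string &B) {
  std::array<int, 256> Diff{}; // zero-initialized count differences

  for (unsigned char C : A)
    ++Diff[C];
  for (unsigned char C : B)
    --Diff[C];

  unsigned Sum = 0;
  for (int D : Diff)
    Sum += static_cast<unsigned>(std::abs(D));

  // An insertion or deletion changes Sum by one; a substitution changes it
  // by at most two. So at least ceil(Sum / 2) edits are needed.
  return (Sum + 1) / 2;
}
```

A caller could skip the full edit distance whenever `charCountLowerBound(Name, Typo)` is already greater than the best distance seen so far. Whether the extra pass over both strings pays for itself would need measuring.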

I've not measured it though, so it might slow down the general case.

I was also wondering whether this optimization would be better placed in the
`StringRef::edit_distance` method (so that all users may benefit from it)?
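
As a rough illustration of what folding the early exit into the distance routine itself might look like, here is a standalone sketch. It does not use or reproduce the real `StringRef::edit_distance`; `boundedEditDistance` and its `MaxDistance` parameter are hypothetical names, and the body is a textbook single-row Levenshtein computation over std::string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function (not the LLVM API): returns the edit distance
// between A and B, capped at MaxDistance + 1 when the true distance is
// larger. Assumes MaxDistance + 1 does not overflow.
unsigned boundedEditDistance(const std::string &A, const std::string &B,
                             unsigned MaxDistance) {
  // The length difference is a lower bound on the edit distance, so bail
  // out before doing any M x N work if it is already too large.
  std::size_t LenDiff =
      A.size() > B.size() ? A.size() - B.size() : B.size() - A.size();
  if (LenDiff > MaxDistance)
    return MaxDistance + 1;

  // Row[J] holds the distance between the current prefix of A and B[0..J).
  std::vector<unsigned> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = static_cast<unsigned>(J);

  for (std::size_t I = 1; I <= A.size(); ++I) {
    unsigned Prev = Row[0]; // diagonal value from the previous row
    Row[0] = static_cast<unsigned>(I);
    for (std::size_t J = 1; J <= B.size(); ++J) {
      unsigned Old = Row[J]; // value directly above in the previous row
      unsigned Subst = Prev + (A[I - 1] == B[J - 1] ? 0u : 1u);
      Row[J] = std::min({Subst, Row[J - 1] + 1, Old + 1});
      Prev = Old;
    }
  }
  return std::min(Row[B.size()], MaxDistance + 1);
}
```

With a signature along these lines, every caller that passes a limit gets the cheap rejection without having to repeat the check itself.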

Thanks for your work :)
Matthieu