[cfe-dev] Spell Correction Efficiency

Sun Nov 21 17:53:28 PST 2010

On Sun, Nov 21, 2010 at 7:58 PM, Matthieu Monrocq
<matthieu.monrocq at gmail.com> wrote
>> We had a discussion a month ago I think about the Spell Correction
>> algorithm which was used in CLang, notably for auto-completion, based on the
>> Levenstein distance.

I presume you've thought about whether suffix-tries and prefix-tries
would work for the case of typos on computer identifiers?

> - a good Trie implementation, or hints in how to implement one. A Trie seems
> to be the best index (quite compact yet ordered), however I only have a
> naive implementation at the moment.

I've got a hash-trie implementation (the datastructure that is used,
eg, to store the dictionary of hypenation exceptions in TeX) in C++
that's heavily designed for small alphabets and integer "payloads", so
it's unlikely to be directly suitable, but you can have  a look at it
if you like. (Drop me a personal email.)

-- 
cheers, dave tweed__________________________
computer vision reasearcher: david.tweed at gmail.com
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot