[llvm-bugs] [Bug 28780] New: "edit distance" algorithm is wrong
via llvm-bugs
llvm-bugs at lists.llvm.org
Sat Jul 30 09:02:57 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=28780
Bug ID: 28780
Summary: "edit distance" algorithm is wrong
Product: compiler-rt
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: compiler-rt
Assignee: unassignedbugs at nondot.org
Reporter: jiangg at mail.ustc.edu.cn
CC: llvm-bugs at lists.llvm.org, nicolasweber at gmx.de
Classification: Unclassified
In "llvm/include/llvm/ADT/edit_distance.h:ComputeEditDistance()" function, if
two strings(or arrays) are both empty, the result would be a random value, but
expected zero.
The current version on 2016.07.29 is:
template<typename T>
unsigned ComputeEditDistance(ArrayRef<T> FromArray, ArrayRef<T> ToArray,
bool AllowReplacements = true,
unsigned MaxEditDistance = 0) {
// The algorithm implemented below is the "classic"
// dynamic-programming algorithm for computing the Levenshtein
// distance, which is described here:
//
// http://en.wikipedia.org/wiki/Levenshtein_distance
//
// Although the algorithm is typically described using an m x n
// array, only one row plus one element are used at a time, so this
// implementation just keeps one vector for the row. To update one
entry,
// only the entries to the left, top, and top-left are needed. The left
// entry is in Row[x-1], the top entry is what's in Row[x] from the last
// iteration, and the top-left entry is stored in Previous.
typename ArrayRef<T>::size_type m = FromArray.size();
typename ArrayRef<T>::size_type n = ToArray.size();
const unsigned SmallBufferSize = 64;
unsigned SmallBuffer[SmallBufferSize];
std::unique_ptr<unsigned[]> Allocated;
unsigned *Row = SmallBuffer;
if (n + 1 > SmallBufferSize) {
Row = new unsigned[n + 1];
Allocated.reset(Row);
}
for (unsigned i = 1; i <= n; ++i)
Row[i] = i;
for (typename ArrayRef<T>::size_type y = 1; y <= m; ++y) {
Row[0] = y;
unsigned BestThisRow = Row[0];
unsigned Previous = y - 1;
for (typename ArrayRef<T>::size_type x = 1; x <= n; ++x) {
int OldRow = Row[x];
if (AllowReplacements) {
Row[x] = std::min(
Previous + (FromArray[y-1] == ToArray[x-1] ? 0u : 1u),
std::min(Row[x-1], Row[x])+1);
}
else {
if (FromArray[y-1] == ToArray[x-1]) Row[x] = Previous;
else Row[x] = std::min(Row[x-1], Row[x]) + 1;
}
Previous = OldRow;
BestThisRow = std::min(BestThisRow, Row[x]);
}
if (MaxEditDistance && BestThisRow > MaxEditDistance)
return MaxEditDistance + 1;
}
unsigned Result = Row[n];
return Result;
}
The above algorithm was first introduced on 2015.07.13: git-svn-id:
https://llvm.org/svn/llvm-project/llvm/trunk@242069
91177308-0d34-0410-b5e6-96231b3b80d8. And the last right version is git-svn-id:
https://llvm.org/svn/llvm-project/llvm/trunk@240390
91177308-0d34-0410-b5e6-96231b3b80d8.
The above `ComputeEditDistance` function is used in
"llvm/lib/Support/StringRef.cpp:StringRef::edit_distance()". So you can
reproduce the bug via calling `StringRef::edit_distance()`. NOTE that the
returned random value may be zero.
How to fix it? The simple solution is to initialize the stack-based array
explictly: `unsigned SmallBuffer[SmallBufferSize]{};`. The safer solution is to
use RAII-style container, however, at the cost for allocating memory dynamicly
even if small number of elements.
Accordingly, unit test can be strengthened in
"llvm/unittests/ADT/StringRefTest.cpp":
diff --git a/unittests/ADT/StringRefTest.cpp
b/unittests/ADT/StringRefTest.cpp
index 66e5944..306dc23 100644
--- a/unittests/ADT/StringRefTest.cpp
+++ b/unittests/ADT/StringRefTest.cpp
@@ -390,6 +390,11 @@ TEST(StringRefTest, Count) {
}
TEST(StringRefTest, EditDistance) {
+ StringRef StrNull;
+ StringRef StrEmpty("");
+ EXPECT_EQ(0U, StrNull.edit_distance(""));
+ EXPECT_EQ(0U, StrEmpty.edit_distance(""));
+
StringRef Str("hello");
EXPECT_EQ(2U, Str.edit_distance("hill"));
}
That's all.
Gang JIANG
jiangg at mail.ustc.edu.cn
http://justme0.com/
University of Science and Technology of China
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20160730/194f1424/attachment-0001.html>
More information about the llvm-bugs
mailing list