<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - "edit distance" algorithm is wrong"
href="https://llvm.org/bugs/show_bug.cgi?id=28780">28780</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>"edit distance" algorithm is wrong
</td>
</tr>
<tr>
<th>Product</th>
<td>compiler-rt
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>All
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>compiler-rt
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>jiangg@mail.ustc.edu.cn
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, nicolasweber@gmx.de
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>In "llvm/include/llvm/ADT/edit_distance.h:ComputeEditDistance()" function, if
two strings(or arrays) are both empty, the result would be a random value, but
expected zero.
The current version on 2016.07.29 is:
template<typename T>
unsigned ComputeEditDistance(ArrayRef<T> FromArray, ArrayRef<T> ToArray,
bool AllowReplacements = true,
unsigned MaxEditDistance = 0) {
// The algorithm implemented below is the "classic"
// dynamic-programming algorithm for computing the Levenshtein
// distance, which is described here:
//
// <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">http://en.wikipedia.org/wiki/Levenshtein_distance</a>
//
// Although the algorithm is typically described using an m x n
// array, only one row plus one element are used at a time, so this
// implementation just keeps one vector for the row. To update one
entry,
// only the entries to the left, top, and top-left are needed. The left
// entry is in Row[x-1], the top entry is what's in Row[x] from the last
// iteration, and the top-left entry is stored in Previous.
typename ArrayRef<T>::size_type m = FromArray.size();
typename ArrayRef<T>::size_type n = ToArray.size();
const unsigned SmallBufferSize = 64;
unsigned SmallBuffer[SmallBufferSize];
std::unique_ptr<unsigned[]> Allocated;
unsigned *Row = SmallBuffer;
if (n + 1 > SmallBufferSize) {
Row = new unsigned[n + 1];
Allocated.reset(Row);
}
for (unsigned i = 1; i <= n; ++i)
Row[i] = i;
for (typename ArrayRef<T>::size_type y = 1; y <= m; ++y) {
Row[0] = y;
unsigned BestThisRow = Row[0];
unsigned Previous = y - 1;
for (typename ArrayRef<T>::size_type x = 1; x <= n; ++x) {
int OldRow = Row[x];
if (AllowReplacements) {
Row[x] = std::min(
Previous + (FromArray[y-1] == ToArray[x-1] ? 0u : 1u),
std::min(Row[x-1], Row[x])+1);
}
else {
if (FromArray[y-1] == ToArray[x-1]) Row[x] = Previous;
else Row[x] = std::min(Row[x-1], Row[x]) + 1;
}
Previous = OldRow;
BestThisRow = std::min(BestThisRow, Row[x]);
}
if (MaxEditDistance && BestThisRow > MaxEditDistance)
return MaxEditDistance + 1;
}
unsigned Result = Row[n];
return Result;
}
The above algorithm was first introduced on 2015.07.13: git-svn-id:
<a href="https://llvm.org/svn/llvm-project/llvm/trunk@242069">https://llvm.org/svn/llvm-project/llvm/trunk@242069</a>
91177308-0d34-0410-b5e6-96231b3b80d8. And the last right version is git-svn-id:
<a href="https://llvm.org/svn/llvm-project/llvm/trunk@240390">https://llvm.org/svn/llvm-project/llvm/trunk@240390</a>
91177308-0d34-0410-b5e6-96231b3b80d8.
The above `ComputeEditDistance` function is used in
"llvm/lib/Support/StringRef.cpp:StringRef::edit_distance()". So you can
reproduce the bug via calling `StringRef::edit_distance()`. NOTE that the
returned random value may be zero.
How to fix it? The simple solution is to initialize the stack-based array
explictly: `unsigned SmallBuffer[SmallBufferSize]{};`. The safer solution is to
use RAII-style container, however, at the cost for allocating memory dynamicly
even if small number of elements.
Accordingly, unit test can be strengthened in
"llvm/unittests/ADT/StringRefTest.cpp":
diff --git a/unittests/ADT/StringRefTest.cpp
b/unittests/ADT/StringRefTest.cpp
index 66e5944..306dc23 100644
--- a/unittests/ADT/StringRefTest.cpp
+++ b/unittests/ADT/StringRefTest.cpp
@@ -390,6 +390,11 @@ TEST(StringRefTest, Count) {
}
TEST(StringRefTest, EditDistance) {
+ StringRef StrNull;
+ StringRef StrEmpty("");
+ EXPECT_EQ(0U, StrNull.edit_distance(""));
+ EXPECT_EQ(0U, StrEmpty.edit_distance(""));
+
StringRef Str("hello");
EXPECT_EQ(2U, Str.edit_distance("hill"));
}
That's all.
Gang JIANG
<a href="mailto:jiangg@mail.ustc.edu.cn">jiangg@mail.ustc.edu.cn</a>
<a href="http://justme0.com/">http://justme0.com/</a>
University of Science and Technology of China</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>