<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - "edit distance" algorithm is wrong"
   href="https://llvm.org/bugs/show_bug.cgi?id=28780">28780</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>"edit distance" algorithm is wrong
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>compiler-rt
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>unspecified
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>compiler-rt
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>jiangg@mail.ustc.edu.cn
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org, nicolasweber@gmx.de
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>In "llvm/include/llvm/ADT/edit_distance.h:ComputeEditDistance()" function, if
two strings(or arrays) are both empty, the result would be a random value, but
expected zero.
The current version on 2016.07.29 is:

    template<typename T>
    unsigned ComputeEditDistance(ArrayRef<T> FromArray, ArrayRef<T> ToArray,
                                 bool AllowReplacements = true,
                                 unsigned MaxEditDistance = 0) {
      // The algorithm implemented below is the "classic"
      // dynamic-programming algorithm for computing the Levenshtein
      // distance, which is described here:
      //
      //   <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">http://en.wikipedia.org/wiki/Levenshtein_distance</a>
      //
      // Although the algorithm is typically described using an m x n
      // array, only one row plus one element are used at a time, so this
      // implementation just keeps one vector for the row.  To update one
entry,
      // only the entries to the left, top, and top-left are needed.  The left
      // entry is in Row[x-1], the top entry is what's in Row[x] from the last
      // iteration, and the top-left entry is stored in Previous.
      typename ArrayRef<T>::size_type m = FromArray.size();
      typename ArrayRef<T>::size_type n = ToArray.size();

      const unsigned SmallBufferSize = 64;
      unsigned SmallBuffer[SmallBufferSize];
      std::unique_ptr<unsigned[]> Allocated;
      unsigned *Row = SmallBuffer;
      if (n + 1 > SmallBufferSize) {
        Row = new unsigned[n + 1];
        Allocated.reset(Row);
      }

      for (unsigned i = 1; i <= n; ++i)
        Row[i] = i;

      for (typename ArrayRef<T>::size_type y = 1; y <= m; ++y) {
        Row[0] = y;
        unsigned BestThisRow = Row[0];

        unsigned Previous = y - 1;
        for (typename ArrayRef<T>::size_type x = 1; x <= n; ++x) {
          int OldRow = Row[x];
          if (AllowReplacements) {
            Row[x] = std::min(
                Previous + (FromArray[y-1] == ToArray[x-1] ? 0u : 1u),
                std::min(Row[x-1], Row[x])+1);
          }
          else {
            if (FromArray[y-1] == ToArray[x-1]) Row[x] = Previous;
            else Row[x] = std::min(Row[x-1], Row[x]) + 1;
          }
          Previous = OldRow;
          BestThisRow = std::min(BestThisRow, Row[x]);
        }

        if (MaxEditDistance && BestThisRow > MaxEditDistance)
          return MaxEditDistance + 1;
      }

      unsigned Result = Row[n];
      return Result;
    }

The above algorithm was first introduced on 2015.07.13: git-svn-id:
<a href="https://llvm.org/svn/llvm-project/llvm/trunk@242069">https://llvm.org/svn/llvm-project/llvm/trunk@242069</a>
91177308-0d34-0410-b5e6-96231b3b80d8. And the last right version is git-svn-id:
<a href="https://llvm.org/svn/llvm-project/llvm/trunk@240390">https://llvm.org/svn/llvm-project/llvm/trunk@240390</a>
91177308-0d34-0410-b5e6-96231b3b80d8.

The above `ComputeEditDistance` function is used in
"llvm/lib/Support/StringRef.cpp:StringRef::edit_distance()". So you can
reproduce the bug via calling `StringRef::edit_distance()`. NOTE that the
returned random value may be zero.

How to fix it? The simple solution is to initialize the stack-based array
explictly: `unsigned SmallBuffer[SmallBufferSize]{};`. The safer solution is to
use RAII-style container, however, at the cost for allocating memory dynamicly
even if small number of elements.

Accordingly, unit test can be strengthened in
"llvm/unittests/ADT/StringRefTest.cpp":

    diff --git a/unittests/ADT/StringRefTest.cpp
b/unittests/ADT/StringRefTest.cpp
    index 66e5944..306dc23 100644
    --- a/unittests/ADT/StringRefTest.cpp
    +++ b/unittests/ADT/StringRefTest.cpp
    @@ -390,6 +390,11 @@ TEST(StringRefTest, Count) {
     }

     TEST(StringRefTest, EditDistance) {
    +  StringRef StrNull;
    +  StringRef StrEmpty("");
    +  EXPECT_EQ(0U, StrNull.edit_distance(""));
    +  EXPECT_EQ(0U, StrEmpty.edit_distance(""));
    +
       StringRef Str("hello");
       EXPECT_EQ(2U, Str.edit_distance("hill"));
     }

That's all.

Gang JIANG
<a href="mailto:jiangg@mail.ustc.edu.cn">jiangg@mail.ustc.edu.cn</a>
<a href="http://justme0.com/">http://justme0.com/</a>
University of Science and Technology of China</pre>
        </div>
      </p>
      <hr>
      <span>You are receiving this mail because:</span>
      
      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>