<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - directory_iterator assert crash on Unicode input on Windows"
   href="https://bugs.llvm.org/show_bug.cgi?id=46236">46236</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>directory_iterator assert crash on Unicode input on Windows
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows 2000
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Support Libraries
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>andrey@futoin.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>In short: the UTF-8 byte length of the path is used to index the UTF-16
buffer in the condition checks, which leads to an out-of-range assertion.

Why: a path can take more bytes in UTF-8 than it takes wchar_t code units in
UTF-16, so Path.size() can exceed PathUTF16.size().
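
A minimal sketch of the length mismatch, outside of LLVM. The helpers
utf16Length and oldIndexInRange are hypothetical names made up for this
illustration, and the sketch assumes a plain UTF-8 to UTF-16 conversion of
well-formed input (the real widenPath may adjust the path further):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts the UTF-16 code units needed for a well-formed UTF-8 string.
// Hypothetical helper for illustration; it is not part of LLVM.
std::size_t utf16Length(const std::string &Utf8) {
  std::size_t Units = 0;
  for (unsigned char C : Utf8) {
    if ((C & 0xC0) == 0x80)
      continue; // continuation byte: already counted with its lead byte
    ++Units;    // ASCII or lead byte: at least one code unit
    if (C >= 0xF0)
      ++Units;  // 4-byte sequence: becomes a surrogate pair
  }
  return Units;
}

// Returns true if the index used by the unpatched check, Utf8.size() - 1,
// would still be in range for the converted UTF-16 buffer.
bool oldIndexInRange(const std::string &Utf8) {
  assert(!Utf8.empty() && "path must not be empty");
  return Utf8.size() - 1 < utf16Length(Utf8);
}
```

For a path such as "C:\" followed by four Cyrillic letters, the UTF-8 form is
11 bytes but the UTF-16 form is only 7 code units, so index 10 is out of range.
For a pure ASCII path both lengths match, which is why the problem only shows
up on non-ASCII input.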

Severity: unlikely to be critical, but I have spotted the problem in
third-party software. It seems only a few LLVM/clang internals use the
functionality.

A short, obvious fix that should make the problem clear. It was written blind
and has not been tested:

diff --git a/llvm/lib/Support/Windows/Path.inc b/llvm/lib/Support/Windows/Path.inc
index ec62e656ddf..49fc8dbdfb0 100644
--- a/llvm/lib/Support/Windows/Path.inc
+++ b/llvm/lib/Support/Windows/Path.inc
@@ -941,32 +941,32 @@ static basic_file_status status_from_find_data(WIN32_FIND_DATAW *FindData) {
                            FindData->ftLastWriteTime.dwHighDateTime,
                            FindData->ftLastWriteTime.dwLowDateTime,
                            FindData->nFileSizeHigh, FindData->nFileSizeLow);
 }

 std::error_code detail::directory_iterator_construct(detail::DirIterState &IT,
                                                      StringRef Path,
                                                      bool FollowSymlinks) {
   SmallVector<wchar_t, 128> PathUTF16;

   if (std::error_code EC = widenPath(Path, PathUTF16))
     return EC;

   // Convert path to the format that Windows is happy with.
   if (PathUTF16.size() > 0 &&
-      !is_separator(PathUTF16[Path.size() - 1]) &&
-      PathUTF16[Path.size() - 1] != L':') {
+      !is_separator(PathUTF16[PathUTF16.size() - 1]) &&
+      PathUTF16[PathUTF16.size() - 1] != L':') {
     PathUTF16.push_back(L'\\');
     PathUTF16.push_back(L'*');
   } else {
     PathUTF16.push_back(L'*');
   }

   //  Get the first directory entry.
   WIN32_FIND_DATAW FirstFind;
   ScopedFindHandle FindHandle(::FindFirstFileExW(
       c_str(PathUTF16), FindExInfoBasic, &FirstFind, FindExSearchNameMatch,
       NULL, FIND_FIRST_EX_LARGE_FETCH));
   if (!FindHandle)
     return mapWindowsError(::GetLastError());

   size_t FilenameLen = ::wcslen(FirstFind.cFileName);</pre>
        </div>
      </p>


    </body>
</html>