<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - directory_iterator assert crash on Unicode input on Windows"
href="https://bugs.llvm.org/show_bug.cgi?id=46236">46236</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>directory_iterator assert crash on Unicode input on Windows
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows 2000
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Support Libraries
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrey@futoin.org
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>In short: the UTF-8 byte length of the path is used to index the UTF-16
buffer, which leads to an out-of-range assertion.
Why: a path can take more bytes in UTF-8 than it takes wchar_t code units in
UTF-16, so Path.size() - 1 can exceed the last valid index of PathUTF16.
Severity: unlikely to be critical, but I spotted the problem in third-party
software. It seems only a few LLVM/clang internals use this functionality.
A short, self-explanatory blind fix (not tested):
diff --git a/llvm/lib/Support/Windows/Path.inc b/llvm/lib/Support/Windows/Path.inc
index ec62e656ddf..49fc8dbdfb0 100644
--- a/llvm/lib/Support/Windows/Path.inc
+++ b/llvm/lib/Support/Windows/Path.inc
@@ -941,32 +941,32 @@ static basic_file_status status_from_find_data(WIN32_FIND_DATAW *FindData) {
FindData->ftLastWriteTime.dwHighDateTime,
FindData->ftLastWriteTime.dwLowDateTime,
FindData->nFileSizeHigh, FindData->nFileSizeLow);
}
std::error_code detail::directory_iterator_construct(detail::DirIterState &IT,
StringRef Path,
bool FollowSymlinks) {
SmallVector<wchar_t, 128> PathUTF16;
if (std::error_code EC = widenPath(Path, PathUTF16))
return EC;
// Convert path to the format that Windows is happy with.
if (PathUTF16.size() > 0 &&
- !is_separator(PathUTF16[Path.size() - 1]) &&
- PathUTF16[Path.size() - 1] != L':') {
+ !is_separator(PathUTF16[PathUTF16.size() - 1]) &&
+ PathUTF16[PathUTF16.size() - 1] != L':') {
PathUTF16.push_back(L'\\');
PathUTF16.push_back(L'*');
} else {
PathUTF16.push_back(L'*');
}
// Get the first directory entry.
WIN32_FIND_DATAW FirstFind;
ScopedFindHandle FindHandle(::FindFirstFileExW(
c_str(PathUTF16), FindExInfoBasic, &FirstFind, FindExSearchNameMatch,
NULL, FIND_FIRST_EX_LARGE_FETCH));
if (!FindHandle)
return mapWindowsError(::GetLastError());
size_t FilenameLen = ::wcslen(FirstFind.cFileName);</pre>
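<pre>To illustrate the length mismatch described above, here is a minimal,
platform-independent sketch. Both helper names are hypothetical and not part
of LLVM. It assumes the input is valid UTF-8 and ignores any extra characters
that widenPath may add, so it only models the plain UTF-8 to UTF-16 length
relationship, not the exact buffer LLVM builds.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LLVM): counts the UTF-16 code units needed
// to encode a string that is assumed to be valid UTF-8.
std::size_t utf16UnitsForUtf8(const std::string &Utf8) {
  std::size_t Units = 0;
  for (unsigned char C : Utf8) {
    if ((C & 0xC0) == 0x80)
      continue;               // continuation byte: part of the previous code point
    Units += (C >= 0xF0) ? 2  // 4-byte sequence becomes a surrogate pair
                         : 1; // 1- to 3-byte sequence becomes one code unit
  }
  return Units;
}

// Hypothetical helper: true when the index Utf8.size() - 1, which the
// pre-patch code uses, falls outside a UTF-16 buffer of the converted length.
bool utf8LengthIndexOverruns(const std::string &Utf8) {
  return !Utf8.empty() && Utf8.size() - 1 >= utf16UnitsForUtf8(Utf8);
}
```

For an ASCII-only path both lengths agree and the old index stays in range.
For a path containing any non-ASCII character the UTF-8 size is larger, which
is the case the patch above addresses by indexing with PathUTF16.size().
</pre>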
</div>
</p>
</body>
</html>