<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - regex_search on MacOS gives wrong results when \D found in a character class"
href="https://bugs.llvm.org/show_bug.cgi?id=40904">40904</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>regex_search on MacOS gives wrong results when \D found in a character class
</td>
</tr>
<tr>
<th>Product</th>
<td>libc++
</td>
</tr>
<tr>
<th>Version</th>
<td>unspecified
</td>
</tr>
<tr>
<th>Hardware</th>
<td>Macintosh
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>All Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>tom@kera.name
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, mclow.lists@gmail.com
</td>
</tr></table>
<p>
<div>
<pre>Pre-C++20, there's no way to turn on /s, so instead of a pattern like /ab.cd/
(where the third character could be a newline) we must write something like
/ab[\d\D]cd/ (using the union of "digits" and "non-digits" to match "any
character").
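The difference between the two spellings can be sketched as follows. This is only an illustration: the helper names found, dot_matches_newline and class_matches_newline are made up here, and the results described in the comments are what the ECMAScript grammar calls for, not what any particular standard library is guaranteed to return.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration: true if `pattern` (ECMAScript
// grammar) is found anywhere in `input`.
bool found(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern, std::regex_constants::ECMAScript);
    return std::regex_search(input, re);
}

// '.' excludes line terminators in the ECMAScript grammar, so this is
// expected to be false for an input with a newline in the middle.
bool dot_matches_newline()
{
    return found("ab.cd", "ab\ncd");
}

// Every character is either a digit (\d) or a non-digit (\D), so the
// class [\d\D] should accept any character, including '\n'. A conforming
// implementation is expected to return true here.
bool class_matches_newline()
{
    return found(R"(ab[\d\D]cd)", "ab\ncd");
}
```

Comparing the two return values on a given platform shows whether the character-class workaround behaves as intended there.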
Unfortunately, libc++ fails to match patterns of this form.
Example:
#include <regex>
#include <string>
#include <iostream>
#include <iomanip>

int main()
{
    const std::string input = "abZcd";
    char const* pattern = R"REGEX(^ab[\d\D]cd)REGEX";
    std::regex::flag_type flags = std::regex_constants::ECMAScript;
    std::regex re(pattern, flags);

    // Expected to print "true": 'Z' is a non-digit, so [\d\D] should match it.
    std::cout << std::boolalpha
              << std::regex_search(input.cbegin(), input.cend(), re) << '\n';
}
Output is "false" with:
$ clang --version
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
But "true" (as expected) with g++ (GCC) 8.2.0.
Looking into it a bit, here are the results with some variants:
Pattern          Input    Should match?   Matches?
--------------------------------------------------
/^ab[\d\D]cd/    abZcd    Yes             No    <--- !
/^ab[\d\D]cd/    ab5cd    Yes             No    <--- !
/^ab[\D]cd/      abZcd    Yes             No    <--- !
/^ab\Dcd/        abZcd    Yes             Yes
/^ab[\d]cd/      ab5cd    Yes             Yes
/^ab\dcd/        ab5cd    Yes             Yes
/^ab\dcd/        abZcd    No              No
/^ab\Dcd/        ab5cd    No              No
The common feature amongst the three failures is the \D inside a character
class.
The behaviour is the same when switching to std::regex_match.
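The table above could be regenerated with a small driver along these lines. It is a sketch only: the names Case, table_cases, search_matches and count_mismatches are invented for illustration and do not come from the original test program, and the "expected" values are simply the "Should match?" column.

```cpp
#include <ostream>
#include <regex>
#include <string>
#include <vector>

// One row of the table: pattern, input, and whether a match is expected.
struct Case {
    const char* pattern;
    const char* input;
    bool expected;
};

// Hypothetical helper: true if `pattern` (ECMAScript grammar) is found
// anywhere in `input`, mirroring the regex_search call in the example.
bool search_matches(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern, std::regex_constants::ECMAScript);
    return std::regex_search(input.cbegin(), input.cend(), re);
}

// The eight pattern/input pairs from the table.
std::vector<Case> table_cases()
{
    return {
        {R"(^ab[\d\D]cd)", "abZcd", true},
        {R"(^ab[\d\D]cd)", "ab5cd", true},
        {R"(^ab[\D]cd)",   "abZcd", true},
        {R"(^ab\Dcd)",     "abZcd", true},
        {R"(^ab[\d]cd)",   "ab5cd", true},
        {R"(^ab\dcd)",     "ab5cd", true},
        {R"(^ab\dcd)",     "abZcd", false},
        {R"(^ab\Dcd)",     "ab5cd", false},
    };
}

// Prints one line per row and returns how many rows disagree with the
// expected result (0 on an implementation that handles every case).
int count_mismatches(std::ostream& out)
{
    int mismatches = 0;
    out << std::boolalpha;
    for (const Case& c : table_cases()) {
        const bool actual = search_matches(c.pattern, c.input);
        if (actual != c.expected) ++mismatches;
        out << '/' << c.pattern << "/ on " << c.input
            << ": expected " << c.expected << ", got " << actual
            << (actual == c.expected ? "" : "  <--- !") << '\n';
    }
    return mismatches;
}
```

Calling count_mismatches(std::cout) from main prints the rows and flags any that differ from the expected column; swapping std::regex_search for std::regex_match in search_matches would cover the second observation.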
For added fun, I get the expected results on Linux:
$ clang++ --version
clang version 5.0.0-3~16.04.1 (tags/RELEASE_500/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Related to <a class="bz_bug_link
bz_status_NEW "
title="NEW - Investigate and fix failing regex tests on linux."
href="show_bug.cgi?id=21363">bug 21363</a> (locale fun)?</pre>
</div>
</p>
</body>
</html>