<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/131516>131516</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[libc++] `<regex>`: Character class `[\W\D]` fails to match alphabetic characters
</td>
</tr>
<tr>
<th>Labels</th>
<td>
libc++
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
muellerj2
</td>
</tr>
</table>
<pre>
The (ECMAScript) regular expression `[\W\D]` describes a character class that matches the union of (a) all non-word characters (anything other than alphanumerics and the underscore) and (b) all non-digits. Since every digit is a word character, set (a) is a subset of set (b), so the character class should be equivalent to `[\D]` and thus match all non-digits. However, libc++'s regex implementation only matches the characters in set (a).
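The claimed equivalence can be sanity-checked without any regex engine. The following is a minimal sketch (C++14 or later) that models `\w` and `\d` for ASCII only. The helper names `is_digit`, `is_word`, `in_union` and `union_equals_not_digit` are made up for illustration and are not part of any library:

```cpp
// Toy ASCII model of the ECMAScript classes; this is not libc++ code.
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool is_word(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// [\W\D]: union of "not a word character" and "not a digit".
constexpr bool in_union(char c) { return !is_word(c) || !is_digit(c); }

// True if in_union agrees with plain \D for every ASCII character.
constexpr bool union_equals_not_digit() {
    for (int i = 0; i < 128; ++i) {
        char c = static_cast<char>(i);
        if (in_union(c) != !is_digit(c)) return false;
    }
    return true;
}

// Compilation fails if the equivalence does not hold in this model.
static_assert(union_equals_not_digit(), "[\\W\\D] should equal [\\D]");
static_assert(in_union('a') && !in_union('0') && in_union('.'), "expected 1/0/1");
```

The checks run at compile time, so the snippet only needs to be compiled, not executed.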
Test case:
```
#include <iostream>
#include <regex>
using namespace std;
int main()
{
    regex re(R"([\W\D])");
    cout << "matches alphabetic: " << regex_match("a", re) << '\n'
         << "matches digit: " << regex_match("0", re) << '\n'
         << "matches non-alphanumeric: " << regex_match(".", re);
    return 0;
}
```
https://godbolt.org/z/YdvY4Pb6a
This prints:
```
matches alphabetic: 0
matches digit: 0
matches non-alphanumeric: 1
```
But it should print (as MSVC STL and libstdc++ do here):
```
matches alphabetic: 1
matches digit: 0
matches non-alphanumeric: 1
```
The problem lies here:
https://github.com/llvm/llvm-project/blob/215c0d2b651dc757378209a3edaff1a130338dd8/libcxx/include/regex#L2139-L2141
The negated character classes are bitwise or'ed, but De Morgan's law says that `(not w) or (not d) = not (w and d)`, so the bit masks should really be bitwise and'ed.
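To make the De Morgan argument concrete, here is a minimal sketch with a toy bitmask model. The names `word_bit`, `digit_bit`, `classify` and the `matches_*` helpers are hypothetical, and the way the accumulated mask is tested is my assumption about the general approach, not a copy of the libc++ code linked above:

```cpp
using mask = unsigned;
constexpr mask word_bit  = 1u << 0;  // set for characters matched by \w
constexpr mask digit_bit = 1u << 1;  // set for characters matched by \d

// Toy classifier (ASCII only): returns the class bits of c.
constexpr mask classify(char c) {
    return (c >= '0' && c <= '9') ? (word_bit | digit_bit)
         : ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') ? word_bit
         : 0u;
}

// Buggy: OR the negated classes into one mask, then negate the combined test.
// This computes not (w or d), i.e. the intersection of \W and \D.
constexpr bool matches_ored(char c) {
    return (classify(c) & (word_bit | digit_bit)) == 0;
}

// Intended: (not w) or (not d), i.e. the union of \W and \D.
constexpr bool matches_union(char c) {
    return (classify(c) & word_bit) == 0 || (classify(c) & digit_bit) == 0;
}

// The or'ed variant reproduces the observed 0/0/1, the union gives 1/0/1.
static_assert(!matches_ored('a') && !matches_ored('0') && matches_ored('.'), "0/0/1");
static_assert(matches_union('a') && !matches_union('0') && matches_union('.'), "1/0/1");
```

In this model the or'ed mask behaves exactly like `[\W]`, which matches the output shown above.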
But bitwise and'ing is problematic as well, because the standard only provides a guarantee that bitwise or'ing works, but doesn't state that bitwise and'ing corresponds to the intersection of the character classes (see [\[re.grammar/9\]](https://eel.is/c++draft/re.grammar#9)). Maybe and'ing will still work for libc++'s `std::regex_traits<char>` and `std::regex_traits<wchar_t>` traits classes (although I haven't checked that), but it might not do the right thing for some user-provided traits classes.
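As a hedged illustration of why and'ing is not guaranteed to work, consider a hypothetical traits-like encoding in which `\w` gets its own dedicated bit instead of being composed of other class bits. Everything below (`toy_traits`, its constants, its `isctype`, and `matches_anded`) is invented for this sketch and only mimics the shape of a regex traits class:

```cpp
struct toy_traits {
    using char_class_type = unsigned;
    static constexpr char_class_type d = 1u << 0;  // digits
    static constexpr char_class_type w = 1u << 1;  // dedicated bit for \w

    static constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }
    static constexpr bool is_word(char c) {
        return is_digit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }
    // True if c belongs to any class whose bit is set in m (union semantics).
    static constexpr bool isctype(char c, char_class_type m) {
        return ((m & d) != 0 && is_digit(c)) || ((m & w) != 0 && is_word(c));
    }
};

// Negated test using the and'ed mask: meant to be not (w and d).
constexpr bool matches_anded(char c) {
    return !toy_traits::isctype(c, toy_traits::w & toy_traits::d);
}

// Or'ing still behaves as a union of the classes...
static_assert(toy_traits::isctype('a', toy_traits::w | toy_traits::d), "or is a union");
// ...but and'ing two distinct bits yields an empty mask, not the intersection.
static_assert((toy_traits::w & toy_traits::d) == 0, "and yields an empty mask");
// With the empty mask, the negated test accepts everything, including digits.
static_assert(matches_anded('0'), "digit wrongly accepted");
```

With such an encoding, `[\W\D]` would match every character, so a fix based on and'ing would trade one wrong result for another.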
</pre>