<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/154408>154408</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[libc++] <regex>: Unmatched backrefs should always succeed in ECMAScript mode.
</td>
</tr>
<tr>
<th>Labels</th>
<td>
libc++
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
SainoNamkho
</td>
</tr>
</table>
<pre>
ECMA-262 (3rd edition, 1999), section 15.10.2.9 states:
> An escape sequence of the form `\` followed by a nonzero decimal number $n$
matches the result of the nth set of capturing parentheses (see 15.10.2.11). It is an error if the regular
expression has fewer than $n$ capturing parentheses. If the regular expression has $n$ or more capturing
parentheses but the nth one is **undefined** because it hasn't captured anything, then the backreference
always succeeds.
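
The quoted rule can be modelled as a single matching step. The following is a minimal sketch, not libc++ code: `match_backref` and its parameters are hypothetical names, and captures are represented as `std::optional<std::string>` purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of one backreference step under the ECMAScript rule.
// captures[i] holds the text captured by group i + 1, or std::nullopt if that
// group is undefined (it has not captured anything).
// Returns the input position after the backreference, or std::nullopt if the
// backreference fails to match.
std::optional<std::size_t> match_backref(
    std::string_view input, std::size_t pos,
    const std::vector<std::optional<std::string>>& captures, std::size_t n)
{
    assert(pos <= input.size());
    // Fewer than n capturing groups is an error in the pattern itself.
    if (n == 0 || n > captures.size())
        throw std::invalid_argument("backreference to a nonexistent group");

    const auto& cap = captures[n - 1];
    if (!cap)
        return pos;  // undefined capture: always succeeds, consumes nothing

    // Defined capture: the same text must appear at the current position.
    if (input.substr(pos, cap->size()) == *cap)
        return pos + cap->size();
    return std::nullopt;
}
```

With group 2 undefined, as in `(b()|)\2a` applied to `a`, this step returns the unchanged position, so the trailing `a` can still match.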
So this JavaScript code
``` js
for (let pattern of [/(\1)a/, /\1(a)/, /(a()|)\2a/, /(b()|)\2a/, /(b()|)\2/])
  if (pattern.test('a'))
    console.log(`${pattern} matches "a".`)
  else
    console.log(`${pattern} does not match "a".`)
```
prints
```
/(\1)a/ matches "a".
/\1(a)/ matches "a".
/(a()|)\2a/ matches "a".
/(b()|)\2a/ matches "a".
/(b()|)\2/ matches "a".
```
C++ implementations diverge (https://godbolt.org/z/r51f4Wsas):
``` c++
#include <regex>
#include <print>

int main()
{
    for (auto pattern : {
        "(\\1)a", "\\1(a)",
        "(a()|)\\2a", "(b()|)\\2",
    }) {
        try {
            if (std::regex_search("a", std::regex{pattern}))
                std::println("\x1b[32m/{}/ matches.\x1b[m", pattern);
            else
                std::println("\x1b[31m/{}/ does not match.\x1b[m", pattern);
        }
        catch (const std::regex_error& e) {
            if (e.code() == std::regex_constants::error_backref)
                std::println("\x1b[34m/{}/: {}\x1b[m", pattern, e.what());
            else
                throw;
        }
    }
}
```
libstdc++ and MSVC STL reject `(\1)a` and `\1(a)` as invalid regexes, while libc++ accepts `(\1)a`. I believe there are no invalid backrefs in the ECMAScript flavor, but this can be discussed in another issue.
I'm interested in the fact that `(a()|)\2a` and `(b()|)\2` should match `a`.
[A POC fix](https://godbolt.org/z/x6xnKjx8f)
</pre>