[all-commits] [llvm/llvm-project] e9adcc: [libc++][regex] Correctly adjust match prefix for ...
Konstantin Varlamov via All-commits
all-commits at lists.llvm.org
Fri Jun 7 08:15:25 PDT 2024
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e9adcc488f96a9f2b8c4344f5e3c7ca6639b9562
https://github.com/llvm/llvm-project/commit/e9adcc488f96a9f2b8c4344f5e3c7ca6639b9562
Author: Konstantin Varlamov <varconsteq at gmail.com>
Date: 2024-06-07 (Fri, 07 Jun 2024)
Changed paths:
M libcxx/include/regex
A libcxx/test/std/re/re.alg/re.alg.replace/zero_length_matches.pass.cpp
M libcxx/test/std/re/re.iter/re.regiter/re.regiter.incr/post.pass.cpp
Log Message:
-----------
[libc++][regex] Correctly adjust match prefix for zero-length matches. (#94550)
For regex patterns that produce zero-length matches, there is one
(imaginary) match in-between every character in the sequence being
searched (as well as before the first character and after the last
character). It's easiest to demonstrate using replacement:
`std::regex_replace("abc"s, "!", "")` should produce `!a!b!c!`, where
each exclamation mark makes a zero-length match visible.
Currently our implementation doesn't correctly set the prefix of each
zero-length match, "swallowing" the characters separating the imaginary
matches -- e.g. when going through zero-length matches within `abc`, the
corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this
patch they will all be empty (`{'', '', '', ''}`). This happens in the
implementation of `regex_iterator::operator++`. Note that the Standard
spells out quite explicitly that the prefix might need to be adjusted
when dealing with zero-length matches in
[`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):
> In all cases in which the call to `regex_search` returns `true`,
`match.prefix().first` shall be equal to the previous value of
`match[0].second`... It is unspecified how the implementation makes
these adjustments.
[Reproduction example](https://godbolt.org/z/8ve6G3dav)
```cpp
#include <iostream>
#include <regex>
#include <string>
int main() {
std::string str = "abc";
std::regex empty_matching_pattern("");
{ // The underlying problem is that `regex_iterator::operator++` doesn't update
// the prefix correctly.
std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
std::cout << "\"";
for (; i != e; ++i) {
const std::ssub_match& prefix = i->prefix();
std::cout << prefix.str();
}
std::cout << "\"\n";
// Before the patch: ""
// After the patch: "abc"
}
{ // `regex_replace` makes the problem very visible.
std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
std::cout << "\"" << replaced << "\"\n";
// Before the patch: "!!!!"
// After the patch: "!a!b!c!"
}
}
```
Fixes #64451
rdar://119912002
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list