[all-commits] [llvm/llvm-project] e9adcc: [libc++][regex] Correctly adjust match prefix for ...

Konstantin Varlamov via All-commits all-commits at lists.llvm.org
Fri Jun 7 08:15:25 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e9adcc488f96a9f2b8c4344f5e3c7ca6639b9562
      https://github.com/llvm/llvm-project/commit/e9adcc488f96a9f2b8c4344f5e3c7ca6639b9562
  Author: Konstantin Varlamov <varconsteq at gmail.com>
  Date:   2024-06-07 (Fri, 07 Jun 2024)

  Changed paths:
    M libcxx/include/regex
    A libcxx/test/std/re/re.alg/re.alg.replace/zero_length_matches.pass.cpp
    M libcxx/test/std/re/re.iter/re.regiter/re.regiter.incr/post.pass.cpp

  Log Message:
  -----------
  [libc++][regex] Correctly adjust match prefix for zero-length matches. (#94550)

For regex patterns that produce zero-length matches, there is one
(imaginary) match in-between every character in the sequence being
searched (as well as before the first character and after the last
character). It's easiest to demonstrate using replacement:
`std::regex_replace("abc"s, "!", "")` should produce `!a!b!c!`, where
each exclamation mark makes a zero-length match visible.

Currently our implementation doesn't correctly set the prefix of each
zero-length match, "swallowing" the characters separating the imaginary
matches -- e.g. when going through zero-length matches within `abc`, the
corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this
patch they will all be empty (`{'', '', '', ''}`). This happens in the
implementation of `regex_iterator::operator++`. Note that the Standard
spells out quite explicitly that the prefix might need to be adjusted
when dealing with zero-length matches in
[`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):
> In all cases in which the call to `regex_search` returns `true`,
`match.prefix().first` shall be equal to the previous value of
`match[0].second`... It is unspecified how the implementation makes
these adjustments.

[Reproduction example](https://godbolt.org/z/8ve6G3dav)
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  { // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  { // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```

Fixes #64451

rdar://119912002



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list