[libcxx-commits] [PATCH] D131668: [libc++] Implement P2438R2 (std::string::substr() &&)

Louis Dionne via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Sep 8 09:14:37 PDT 2022


ldionne accepted this revision as: ldionne.
ldionne added a comment.

The implementation LGTM, I have a few comments.



================
Comment at: libcxx/include/string:880
+    _LIBCPP_HIDE_FROM_ABI constexpr
+    basic_string(basic_string&& __str, size_type __pos, size_type __n, const _Allocator& __alloc = _Allocator())
+        : __r_(__default_init_tag(), __alloc) {
----------------
var-const wrote:
> philnik wrote:
> > var-const wrote:
> > > Other constructors also insert the newly-created string into the debug database, should this be done here as well?
> > Yes, I think so. I couldn't find any comprehensive tests for the debug mode though. Do you know if we have any/where they are?
> I suspect we don't have those tests, but let's confirm with @ldionne.
I don't think we have extremely comprehensive tests for the debug mode for string, but we do have e.g. `libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.index.pass.cpp` & friends in the same directory.

I think we do want a simple test to make sure that the added constructor works properly with the debug mode. Concretely, I would imagine something like:

```
std::string s = "hello world";
std::string s2(std::move(s), 0, 5); // "hello"
std::string::iterator i = s2.begin();
assert(i[0] == 'h');
TEST_LIBCPP_ASSERT_FAILURE(i[5], "Attempted to subscript an iterator outside its valid range");
```


================
Comment at: libcxx/include/string:896
+      } else {
+        __init(__str.data() + __pos, __len);
+      }
----------------
philnik wrote:
> var-const wrote:
> > philnik wrote:
> > > var-const wrote:
> > > > Consider adding a comment like `Perform a copy because allocators have different types`.
> > > Would it make more sense to rename `__init` to something like `__init_from_char_array` to make it obvious that this path copies the array?
> > I'd still prefer a comment. Even `__init_from_char_array` doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well.
> I'm not sure if you are just a bit sloppy with the wording or if you misunderstand something. The allocators always have the same type, but the allocators may not compare equal.
I think what @var-const meant was:

```
// Perform a copy because the allocators are not compatible.
```

The allocators have the same type, they just compare unequal, which in alllocator-land means that the memory allocated by one can't be freed by the other.


================
Comment at: libcxx/include/string:899
+
+      std::__debug_db_insert_c(this);
+    }
----------------
var-const wrote:
> The move constructor also performs
> ```
>     if (__is_long())
>         std::__debug_db_swap(this, &__str);
> ```
> I presume this is necessary here as well -- @ldionne would know more.
Test needed!


================
Comment at: libcxx/include/string:1297
+    basic_string substr(size_type __pos = 0, size_type __n = npos) const& {
+      return basic_string(*this, __pos, __n, __alloc());
+    }
----------------
philnik wrote:
> Mordante wrote:
> > Mordante wrote:
> > > var-const wrote:
> > > > philnik wrote:
> > > > > var-const wrote:
> > > > > > Mordante wrote:
> > > > > > > philnik wrote:
> > > > > > > > Mordante wrote:
> > > > > > > > > philnik wrote:
> > > > > > > > > > Mordante wrote:
> > > > > > > > > > > This seems to be https://wg21.link/lwg3752, can you mention in the status page we implement this not yet accepted LWG-issue?
> > > > > > > > > > I wouldn't consider LWG3752 to be implemented. It doesn't propose a solution and IMO the right thing would be to use `select_on_container_copy_construction` on the `const&` overload and passing the allocator on the `&&` overload.
> > > > > > > > > But this deviates from the wording in the paper, there it's defined as
> > > > > > > > > `return basic_string(*this, pos, n);` which uses `basic_string`'s default allocator.
> > > > > > > > > So there should be some comments as why we have this non-conforming behaviour.
> > > > > > > > I don't know why we have the non-conforming behaviour. I just didn't want to change the behaviour in this patch. I've filed https://llvm.org/PR57190.
> > > > > > > I still would like a comment in the code mentioning this is non-standard behaviour and a link to the bug report.
> > > > > > > That way when somebody sees this and wonders why it's clear this was done intentionally.
> > > > > > Is the proposed resolution of LWG3752 to pass the allocator? I couldn't find the discussion it refers to, but because it says "we will fix it", I take it to mean that the current behavior, i.e., passing `__alloc()` should be changed, and the paper merely asks for confirmation.
> > > > > > 
> > > > > There is no officially proposed solution (as you can see). If you're asking if that's what the standard says, yes.
> > > > In that case, I feel we shouldn't pass `__alloc` for the rvalue overload -- yes, it would be inconsistent, but I don't think we should introduce more divergence and have more stuff to fix later. This is definitely something to check with @ldionne.
> > > For me it's fine to keep this since we already have this behaviour in our original `substr` code. So I don't mind waiting for the final resolution of LWG3752 before changing the code.
> > Since @ldionne mentioned on Discord LWG-3752 probably will be closed as NAD I agree more with @var-const.(I had expected LWG to mandate the current MSVC STL and libc++ implementation.)
> I'd still rather do it in a separate patch, because it is a significant behaviour change that has nothing to do with P2438. The overloads should not have different behaviour IMO.
I agree with @philnik here: let's implement this consistently with our current `substr` behavior, but then fix both `substr` behaviors to be standards conforming per LWG3752's expected resolution ASAP.

Alternatively, we could implement LWG3752's expected resolution first, and then land this patch consistently with the fixed `substr()`. I don't care in what order this happens, as long as we don't ship another release without LWG3752's resolution. But I think we should really avoid introducing a difference between our two `substr()` implementations.


================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:28-50
+template <class S>
+constexpr void test_string_pos(S orig, typename S::size_type pos, S expected) {
+  if (pos <= orig.size()) {
+#ifdef _LIBCPP_VERSION
+    DisableAllocationGuard g;
+#endif
+    S substr(std::move(orig), pos);
----------------
Instead, I would suggest the following:

```
template <class S>
constexpr void test_string_pos(S orig, typename S::size_type pos, S expected) {
#ifdef _LIBCPP_VERSION
  DisableAllocationGuard g;
#endif
  S substr(std::move(orig), pos);
  LIBCPP_ASSERT(orig.__invariants());
  LIBCPP_ASSERT(orig.empty());
  LIBCPP_ASSERT(substr.__invariants());
  assert(substr == expected);
}

constexpr struct should_throw_exception_t { } should_throw_exception{};

template <class S>
constexpr void test_string_pos(S orig, typename S::size_type pos, should_throw_exception_t) {
#ifndef TEST_HAS_NO_EXCEPTIONS
  if (!std::is_constant_evaluated()) {
    try {
      [[maybe_unused]] S substr = S(std::move(orig), pos);
      assert(false);
    } catch (const std::out_of_range&) {

    }
  }
#endif
}
```

We could also use a differently-named function, but you seemed to like the naming consistency at the callsite IIUC. Basically, what I care about is that we remove this logic from the test since it's so easy to remove, and also since those two functions do entirely different things.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131668/new/

https://reviews.llvm.org/D131668



More information about the libcxx-commits mailing list