[libcxx-commits] [PATCH] D131668: [libc++] Implement P2438R2 (std::string::substr() &&)

Tue Aug 23 17:54:40 PDT 2022

var-const added inline comments.

================
Comment at: libcxx/include/string:1297
+    basic_string substr(size_type __pos = 0, size_type __n = npos) const& {
+      return basic_string(*this, __pos, __n, __alloc());
+    }
----------------
philnik wrote:
> var-const wrote:
> > Mordante wrote:
> > > philnik wrote:
> > > > Mordante wrote:
> > > > > philnik wrote:
> > > > > > Mordante wrote:
> > > > > > > This seems to be https://wg21.link/lwg3752, can you mention in the status page we implement this not yet accepted LWG-issue?
> > > > > > I wouldn't consider LWG3752 to be implemented. It doesn't propose a solution and IMO the right thing would be to use `select_on_container_copy_construction` on the `const&` overload and passing the allocator on the `&&` overload.
> > > > > But this deviates from the wording in the paper, there it's defined as
> > > > > `return basic_string(*this, pos, n);` which uses `basic_string`'s default allocator.
> > > > > So there should be some comments as why we have this non-conforming behaviour.
> > > > I don't know why we have the non-conforming behaviour. I just didn't want to change the behaviour in this patch. I've filed https://llvm.org/PR57190.
> > > I still would like a comment in the code mentioning this is non-standard behaviour and a link to the bug report.
> > > That way when somebody sees this and wonders why it's clear this was done intentionally.
> > Is the proposed resolution of LWG3752 to pass the allocator? I couldn't find the discussion it refers to, but because it says "we will fix it", I take it to mean that the current behavior, i.e., passing `__alloc()` should be changed, and the paper merely asks for confirmation.
> > 
> There is no officially proposed solution (as you can see). If you're asking if that's what the standard says, yes.
In that case, I feel we shouldn't pass `__alloc` for the rvalue overload -- yes, it would be inconsistent, but I don't think we should introduce more divergence and have more stuff to fix later. This is definitely something to check with @ldionne.

================
Comment at: libcxx/include/string:874
     _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX17
-    basic_string(const basic_string& __str, size_type __pos,
-                 const _Allocator& __a = _Allocator());
+    basic_string(const basic_string& __str, size_type __pos, const _Allocator& __a = _Allocator());
+
----------------
philnik wrote:
> var-const wrote:
> > I'm a little surprised this constructor is not conditionally `noexcept`. Do you know if this is deliberate or an omission in the paper?
> Why are you surprised this isn't conditionally `noexcept`? Only the default constructor is conditionally `noexcept`, and the allocator and move constructors are always `noexcept`. Nothing else is.
To be clear, I'm surprised that this isn't `noexcept` in some way (I just think it would have to be conditionally, but I didn't give much thought to it), since it's similar to the move constructor in many ways.

================
Comment at: libcxx/include/string:888
+      auto __len = std::min(__n, __str.size() - __pos);
+      if (__alloc_traits::is_always_equal::value || __alloc == __str.__alloc()) {
+        __r_.first() = __str.__r_.first();
----------------
philnik wrote:
> var-const wrote:
> > Can you confirm that the `is_always_equal` check optimizes away even without using `if constexpr`?
> Do you mean something like [this](https://godbolt.org/z/K31PnjMMx)?
The interesting part, IMO, is that it depends on the template type, but I presume it won't make a difference to the compiler.

================
Comment at: libcxx/include/string:892
+
+        _Traits::move(data(), data() + __pos, __len);
+        __set_size(__len);
----------------
philnik wrote:
> philnik wrote:
> > var-const wrote:
> > > var-const wrote:
> > > > Consider a comment like `Move the given string then shrink it in place`.
> > > Reevaluating `data()` calls `to_address()` and `is_long()` repeatedly. Would it make sense to store the result in an intermediate variable?
> > If the compiler can't optimize it IMO we should give the compiler hints on `data()` instead of decreasing the readability of the code.
> I think this comment is more misleading than helpful. `_Traits::move()` and `__set_size()` seem to me quite intuitive. `shrink in place` sounds a lot like we `shrink_to_fit()` here, which we definitely don't want to do.
Personally, I find using a variable more readable (for me, the repeated function calls are "noisy" and add extra cognitive overhead of thinking whether the function result might change upon subsequent calls). Also, performance in debug mode is a common concern among users.

================
Comment at: libcxx/include/string:896
+      } else {
+        __init(__str.data() + __pos, __len);
+      }
----------------
philnik wrote:
> var-const wrote:
> > Consider adding a comment like `Perform a copy because allocators have different types`.
> Would it make more sense to rename `__init` to something like `__init_from_char_array` to make it obvious that this path copies the array?
I'd still prefer a comment. Even `__init_from_char_array` doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well.

================
Comment at: libcxx/include/string:1303
+    _LIBCPP_HIDE_FROM_ABI constexpr
+    basic_string substr(size_type __pos = 0, size_type __n = npos) const& {
+      return basic_string(*this, __pos, __n, __alloc());
----------------
philnik wrote:
> var-const wrote:
> > +1 to @Mordante's comment re. ABI stability. This should also be documented in the commit message (either why the ABI break doesn't happen in our case, or what we are doing to mitigate it).
> It's simple actually. The mangling for the functions is different (https://godbolt.org/z/P8dfdYPs7) and we don't have it in our ABI.
Please add that explanation to the patch / commit description.

================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:35
+    S substr(std::move(orig), pos);
+    LIBCPP_ASSERT(orig.__invariants());
+    LIBCPP_ASSERT(orig.empty());
----------------
var-const wrote:
> Optional: consider moving the checks into a helper function to cut down on boilerplate.
Friendly ping (feel free to push back).

================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:61
+#ifndef TEST_HAS_NO_EXCEPTIONS
+  else if (!std::is_constant_evaluated()) {
+    try {
----------------
philnik wrote:
> var-const wrote:
> > I'd split the normal case and the exceptional case in two functions. These are very different concerns, and I find the current triple-check (if `pos` is not valid, and it's not constant-evaluated, and exceptions are enabled) a little overwhelming. Making the `expected` parameter double as essentially a boolean also doesn't feel very clean. This would also be clearer at the call site (I'd group all normal cases and all exception cases together, i.e. a bunch of `test_string_pos`, `test_string_pos_n`, etc., followed by a bunch of `test_string_pos_exception`, `test_string_pos_n_exception`, etc.
> > 
> > I know it will increase the number of `test_string_*` functions even more, but I think it's the lesser evil. The amount of code won't increase substantially, and the smaller functions will be easier to read.
> I really wouldn't group normal and exception cases together. Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else. IMO it's also cleaner to always call the same function for testing the same function in this case. Having the `"exception"` at the end makes it quite obvious which are throwing while making the rest of the arguments easily comparable.
> Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else.
I don't see how it would be less readable. The current state is
```
  test_string_pos<S>(STR(""), 0, STR(""));
  test_string_pos<S>(STR(""), 1, STR("exception"));
  test_string_pos<S>(STR("Banane"), 1, STR("anane"));
  test_string_pos<S>(STR("Banane"), 6, STR(""));
  test_string_pos<S>(STR("Banane"), 7, STR("exception"));
  test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
```
I would prefer
```
  test_string_pos<S>(STR(""), 0, STR(""));
  test_string_pos<S>(STR("Banane"), 1, STR("anane"));
  test_string_pos<S>(STR("Banane"), 6, STR(""));
  test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));

  test_string_pos_exception<S>(STR(""), 1);
  test_string_pos_exception<S>(STR("Banane"), 7);
```
Which I think makes it even more obvious that now we're testing corner cases. Alternatively, keeping the same order is fine too:
```
  test_string_pos<S>(STR(""), 0, STR(""));
  test_string_pos_exception<S>(STR(""), 1);
  test_string_pos<S>(STR("Banane"), 1, STR("anane"));
  test_string_pos<S>(STR("Banane"), 6, STR(""));
  test_string_pos_exception<S>(STR("Banane"), 7);
  test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
```

Personally, I think this would be more readable, or at the very least no less readable, while making the existing issues (overcomplicated condition in `test_string_pos`, magic value in `expected`) go away.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131668/new/

https://reviews.llvm.org/D131668