[libcxx-commits] [PATCH] D131668: [libc++] Implement P2438R2 (std::string::substr() &&)

Tue Sep 6 10:47:19 PDT 2022

philnik marked 4 inline comments as done.
philnik added inline comments.

================
Comment at: libcxx/include/string:1297
+    basic_string substr(size_type __pos = 0, size_type __n = npos) const& {
+      return basic_string(*this, __pos, __n, __alloc());
+    }
----------------
Mordante wrote:
> Mordante wrote:
> > var-const wrote:
> > > philnik wrote:
> > > > var-const wrote:
> > > > > Mordante wrote:
> > > > > > philnik wrote:
> > > > > > > Mordante wrote:
> > > > > > > > philnik wrote:
> > > > > > > > > Mordante wrote:
> > > > > > > > > > This seems to be https://wg21.link/lwg3752, can you mention in the status page we implement this not yet accepted LWG-issue?
> > > > > > > > > I wouldn't consider LWG3752 to be implemented. It doesn't propose a solution and IMO the right thing would be to use `select_on_container_copy_construction` on the `const&` overload and passing the allocator on the `&&` overload.
> > > > > > > > But this deviates from the wording in the paper, there it's defined as
> > > > > > > > `return basic_string(*this, pos, n);` which uses `basic_string`'s default allocator.
> > > > > > > > So there should be some comments as why we have this non-conforming behaviour.
> > > > > > > I don't know why we have the non-conforming behaviour. I just didn't want to change the behaviour in this patch. I've filed https://llvm.org/PR57190.
> > > > > > I still would like a comment in the code mentioning this is non-standard behaviour and a link to the bug report.
> > > > > > That way when somebody sees this and wonders why it's clear this was done intentionally.
> > > > > Is the proposed resolution of LWG3752 to pass the allocator? I couldn't find the discussion it refers to, but because it says "we will fix it", I take it to mean that the current behavior, i.e., passing `__alloc()` should be changed, and the paper merely asks for confirmation.
> > > > > 
> > > > There is no officially proposed solution (as you can see). If you're asking if that's what the standard says, yes.
> > > In that case, I feel we shouldn't pass `__alloc` for the rvalue overload -- yes, it would be inconsistent, but I don't think we should introduce more divergence and have more stuff to fix later. This is definitely something to check with @ldionne.
> > For me it's fine to keep this since we already have this behaviour in our original `substr` code. So I don't mind waiting for the final resolution of LWG3752 before changing the code.
> Since @ldionne mentioned on Discord LWG-3752 probably will be closed as NAD I agree more with @var-const.(I had expected LWG to mandate the current MSVC STL and libc++ implementation.)
I'd still rather do it in a separate patch, because it is a significant behaviour change that has nothing to do with P2438. The overloads should not have different behaviour IMO.

================
Comment at: libcxx/include/string:896
+      } else {
+        __init(__str.data() + __pos, __len);
+      }
----------------
var-const wrote:
> philnik wrote:
> > var-const wrote:
> > > Consider adding a comment like `Perform a copy because allocators have different types`.
> > Would it make more sense to rename `__init` to something like `__init_from_char_array` to make it obvious that this path copies the array?
> I'd still prefer a comment. Even `__init_from_char_array` doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well.
I'm not sure if you are just a bit sloppy with the wording or if you misunderstand something. The allocators always have the same type, but the allocators may not compare equal.

================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:35
+    S substr(std::move(orig), pos);
+    LIBCPP_ASSERT(orig.__invariants());
+    LIBCPP_ASSERT(orig.empty());
----------------
var-const wrote:
> var-const wrote:
> > Optional: consider moving the checks into a helper function to cut down on boilerplate.
> Friendly ping (feel free to push back).
In my experience having the checks in it's own function just makes the test harder to debug, so I'd rather keep them duplicated.

================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:61
+#ifndef TEST_HAS_NO_EXCEPTIONS
+  else if (!std::is_constant_evaluated()) {
+    try {
----------------
var-const wrote:
> philnik wrote:
> > var-const wrote:
> > > I'd split the normal case and the exceptional case in two functions. These are very different concerns, and I find the current triple-check (if `pos` is not valid, and it's not constant-evaluated, and exceptions are enabled) a little overwhelming. Making the `expected` parameter double as essentially a boolean also doesn't feel very clean. This would also be clearer at the call site (I'd group all normal cases and all exception cases together, i.e. a bunch of `test_string_pos`, `test_string_pos_n`, etc., followed by a bunch of `test_string_pos_exception`, `test_string_pos_n_exception`, etc.
> > > 
> > > I know it will increase the number of `test_string_*` functions even more, but I think it's the lesser evil. The amount of code won't increase substantially, and the smaller functions will be easier to read.
> > I really wouldn't group normal and exception cases together. Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else. IMO it's also cleaner to always call the same function for testing the same function in this case. Having the `"exception"` at the end makes it quite obvious which are throwing while making the rest of the arguments easily comparable.
> > Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else.
> I don't see how it would be less readable. The current state is
> ```
>   test_string_pos<S>(STR(""), 0, STR(""));
>   test_string_pos<S>(STR(""), 1, STR("exception"));
>   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
>   test_string_pos<S>(STR("Banane"), 6, STR(""));
>   test_string_pos<S>(STR("Banane"), 7, STR("exception"));
>   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> ```
> I would prefer
> ```
>   test_string_pos<S>(STR(""), 0, STR(""));
>   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
>   test_string_pos<S>(STR("Banane"), 6, STR(""));
>   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> 
>   test_string_pos_exception<S>(STR(""), 1);
>   test_string_pos_exception<S>(STR("Banane"), 7);
> ```
> Which I think makes it even more obvious that now we're testing corner cases. Alternatively, keeping the same order is fine too:
> ```
>   test_string_pos<S>(STR(""), 0, STR(""));
>   test_string_pos_exception<S>(STR(""), 1);
>   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
>   test_string_pos<S>(STR("Banane"), 6, STR(""));
>   test_string_pos_exception<S>(STR("Banane"), 7);
>   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> ```
> 
> Personally, I think this would be more readable, or at the very least no less readable, while making the existing issues (overcomplicated condition in `test_string_pos`, magic value in `expected`) go away.
I prefer the first one, so I guess we just disagree here. IMO the second one makes it a lot less obvious that we're checking corner cases, because I can't immediately see that `STR("Banane"), 7` is the exact case where it should throw, which I do immediately see if the line before was `STR("Banane"), 6`. The last one seems just messy to me, but that's nothing you can argue much about.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131668/new/

https://reviews.llvm.org/D131668