[libcxx-commits] [PATCH] D131668: [libc++] Implement P2438R2 (std::string::substr() &&)

Tue Sep 20 16:05:51 PDT 2022

var-const accepted this revision as: var-const.
var-const added inline comments.

================
Comment at: libcxx/include/string:892
+
+        _Traits::move(data(), data() + __pos, __len);
+        __set_size(__len);
----------------
var-const wrote:
> philnik wrote:
> > philnik wrote:
> > > var-const wrote:
> > > > var-const wrote:
> > > > > Consider a comment like `Move the given string then shrink it in place`.
> > > > Reevaluating `data()` calls `to_address()` and `is_long()` repeatedly. Would it make sense to store the result in an intermediate variable?
> > > If the compiler can't optimize it IMO we should give the compiler hints on `data()` instead of decreasing the readability of the code.
> > I think this comment is more misleading than helpful. `_Traits::move()` and `__set_size()` seem to me quite intuitive. `shrink in place` sounds a lot like we `shrink_to_fit()` here, which we definitely don't want to do.
> Personally, I find using a variable more readable (for me, the repeated function calls are "noisy" and add extra cognitive overhead of thinking whether the function result might change upon subsequent calls). Also, performance in debug mode is a common concern among users.
Ping.

================
Comment at: libcxx/include/string:896
+      } else {
+        __init(__str.data() + __pos, __len);
+      }
----------------
ldionne wrote:
> philnik wrote:
> > var-const wrote:
> > > philnik wrote:
> > > > var-const wrote:
> > > > > Consider adding a comment like `Perform a copy because allocators have different types`.
> > > > Would it make more sense to rename `__init` to something like `__init_from_char_array` to make it obvious that this path copies the array?
> > > I'd still prefer a comment. Even `__init_from_char_array` doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well.
> > I'm not sure if you are just a bit sloppy with the wording or if you misunderstand something. The allocators always have the same type, but the allocators may not compare equal.
> I think what @var-const meant was:
> 
> ```
> // Perform a copy because the allocators are not compatible.
> ```
> 
> The allocators have the same type, they just compare unequal, which in alllocator-land means that the memory allocated by one can't be freed by the other.
Yeah, I'm not sure why I was thinking about types when I was typing this. Current wording LGTM.

================
Comment at: libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp:61
+#ifndef TEST_HAS_NO_EXCEPTIONS
+  else if (!std::is_constant_evaluated()) {
+    try {
----------------
philnik wrote:
> var-const wrote:
> > philnik wrote:
> > > var-const wrote:
> > > > I'd split the normal case and the exceptional case in two functions. These are very different concerns, and I find the current triple-check (if `pos` is not valid, and it's not constant-evaluated, and exceptions are enabled) a little overwhelming. Making the `expected` parameter double as essentially a boolean also doesn't feel very clean. This would also be clearer at the call site (I'd group all normal cases and all exception cases together, i.e. a bunch of `test_string_pos`, `test_string_pos_n`, etc., followed by a bunch of `test_string_pos_exception`, `test_string_pos_n_exception`, etc.
> > > > 
> > > > I know it will increase the number of `test_string_*` functions even more, but I think it's the lesser evil. The amount of code won't increase substantially, and the smaller functions will be easier to read.
> > > I really wouldn't group normal and exception cases together. Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else. IMO it's also cleaner to always call the same function for testing the same function in this case. Having the `"exception"` at the end makes it quite obvious which are throwing while making the rest of the arguments easily comparable.
> > > Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else.
> > I don't see how it would be less readable. The current state is
> > ```
> >   test_string_pos<S>(STR(""), 0, STR(""));
> >   test_string_pos<S>(STR(""), 1, STR("exception"));
> >   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
> >   test_string_pos<S>(STR("Banane"), 6, STR(""));
> >   test_string_pos<S>(STR("Banane"), 7, STR("exception"));
> >   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> > ```
> > I would prefer
> > ```
> >   test_string_pos<S>(STR(""), 0, STR(""));
> >   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
> >   test_string_pos<S>(STR("Banane"), 6, STR(""));
> >   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> > 
> >   test_string_pos_exception<S>(STR(""), 1);
> >   test_string_pos_exception<S>(STR("Banane"), 7);
> > ```
> > Which I think makes it even more obvious that now we're testing corner cases. Alternatively, keeping the same order is fine too:
> > ```
> >   test_string_pos<S>(STR(""), 0, STR(""));
> >   test_string_pos_exception<S>(STR(""), 1);
> >   test_string_pos<S>(STR("Banane"), 1, STR("anane"));
> >   test_string_pos<S>(STR("Banane"), 6, STR(""));
> >   test_string_pos_exception<S>(STR("Banane"), 7);
> >   test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));
> > ```
> > 
> > Personally, I think this would be more readable, or at the very least no less readable, while making the existing issues (overcomplicated condition in `test_string_pos`, magic value in `expected`) go away.
> I prefer the first one, so I guess we just disagree here. IMO the second one makes it a lot less obvious that we're checking corner cases, because I can't immediately see that `STR("Banane"), 7` is the exact case where it should throw, which I do immediately see if the line before was `STR("Banane"), 6`. The last one seems just messy to me, but that's nothing you can argue much about.
I think the function name having `_exception` conveys that this tests corner cases a lot clearer than anything else could. On the other hand, there isn't anything that semantically connects the `6` and `7` test cases together -- this is at best very brittle (in the future, it's easy for someone to rearrange these test cases, insert a new test case between them, etc.) and requires reading the code closely and comparing test cases across the lines.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131668/new/

https://reviews.llvm.org/D131668