[libcxx-commits] [PATCH] D116023: [libc++][ranges] Implement `uninitialized_copy{, _n}` and `uninitialized_move{, _n}`.

Sun Jan 2 22:32:22 PST 2022

var-const added inline comments.

================
Comment at: libcxx/include/__memory/uninitialized_algorithms.h:281
+          class _Move>
+inline _LIBCPP_HIDE_FROM_ABI pair<_InputIterator, _ForwardIterator>
+__uninitialized_move(_InputIterator __ifirst, _Sentinel1 __ilast, _ForwardIterator __ofirst, _Sentinel2 __olast,
----------------
Quuxplusone wrote:
> ldionne wrote:
> > var-const wrote:
> > > Note: I could use `tuple` here to get EBO (suggested by @ldionne). I'm a little concerned about consistency with the very similar `uninitialized_copy` algorithm (though it can be argues that `uninitialized_move` already diverges quite a bit). Happy to change to `tuple` if other people feel it's appropriate here.
> > I am not utterly concerned by using `pair` and not doing EBO. I believe these algorithms will be used with stateful iterators where EBO wouldn't help most of the time anyway.
> EBO is only a concern when ABI is a concern. Here, if you're thinking about ABI, you've already lost. If this return value is ever materialized into memory, your perf is already shot to heck. (This is a general complaint of mine about C++20 Ranges //and// refactorings-to-use-more-helper-functions: the optimizer can paper over a lot of sins, but the flip side of that is that the library gradually becomes even less usable at `-O0`.)
> but the flip side of that is that the library gradually becomes even less usable at `-O0`.
It's a very valid concern, but I don't see a great way around it without a lot of code duplication (or code generation, which comes with its own set of problems).

================
Comment at: libcxx/include/__memory/uninitialized_algorithms.h:44
 #endif
-        for (; __f != __l; ++__f, (void) ++__r)
-            ::new ((void*)_VSTD::addressof(*__r)) value_type(*__f);
+    for (; __ifirst != __ilast && __idx != __olast; ++__ifirst, (void)++__idx)
+      ::new (_VSTD::__voidify(*__idx)) _ValueType(*__ifirst);
----------------
ldionne wrote:
> Note from our discussion: you noticed you are missing a test case in the ranges version when `__idx` reaches `__olast` before `__ifirst` reaches `__ilast`.
Added.

================
Comment at: libcxx/include/module.modulemap:223
       module adjacent_find            { private header "__algorithm/adjacent_find.h" }
+      module algorithm_results        { private header "__algorithm/algorithms_results.h" }
       module all_of                   { private header "__algorithm/all_of.h" }
----------------
Quuxplusone wrote:
> var-const wrote:
> > Quuxplusone wrote:
> > > (1, moot) Inconsistent pluralization here.
> > > (2) Please call this `__algorithm/in_out_result.h` for now. If there are more types that need to be rolled into this same header, we can always rename it later //if needed//.
> > > If there are more types that need to be rolled into this same header, we can always rename it later if needed.
> > 
> > There are a plenty of very similar closely-related types:
> > ```
> > in_fun_result
> > in_in_result
> > in_out_result
> > in_in_out_result
> > in_out_out_result
> > min_max_result
> > in_found_result
> > ```
> > 
> > Given how small and similar they are, it seemed to me like consolidating them in a single file makes sense. Do you feel these should be separated?
> > Given how small and similar they are, it seemed to me like consolidating them in a single file makes sense. Do you feel these should be separated?
> 
> I feel that there are two things to consider, and //right now// both of them militate in favor of `in_out_result.h`:
> (1) .h files should be named for what's in them. What's in this file right now is `in_out_result`. Maybe that'll change in a few weeks; if it does, then we can change the file's name to match its new contents.
> (2) Minimize dependencies: things should be combined if they're often used together, and split up if they're basically never used together. My ill-informed impression is that these `*_result` types are (2a) literally never used together, and (2b) in some cases, single-purpose types which really belong in their appropriate algorithm .h file; e.g. `min_max_result` probably belongs in `__algorithm/ranges_min_max.h` (and nowhere else).
Renamed to `in_out_result.h`.

Regarding whether they're used together -- I don't think they're ever used within the same function, but I'm also pretty sure that including `<algorithm>` will drag in all of them. That was part of my thinking when I decided to define them in a single file. However, having them in separate files would mean we "import" less stuff internally.

================
Comment at: libcxx/test/std/algorithms/algorithms.results/algorithms.results.compile.pass.cpp:23
+// Successful conversion.
+static_assert(std::is_convertible_v<std::ranges::in_out_result<int, long>, std::ranges::in_out_result<bool, double>>);
+
----------------
ldionne wrote:
> You seem to be missing a test case for when the members of `in_out_result` are move constructible vs copy constructible, i.e. the `const&` vs `&&` variant.
> 
> Also, there should be runtime tests actually running those conversion operators to make sure they are implemented correctly, even though I know it's an almost trivial class. (This means you'll need to move from `.compile.pass.cpp` to `.pass.cpp`)
Done.

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.copy/ranges_uninitialized_copy.pass.cpp:215-216
+
+  // An exception is thrown while objects are being created -- the existing objects should stay
+  // valid. (iterator, sentinel) overload.
+#ifndef TEST_HAS_NO_EXCEPTIONS
----------------
Quuxplusone wrote:
> ldionne wrote:
> > This comment should clarify what "existing objects" means. Are we talking about the objects in the source range?
> I assume it means "any of the existing objects in the //destination// range (which have not yet been overwritten)."
> 
> Line 224 says "the fourth object," but N is actually 3 and there's also a 5 involved, so I don't know which direction the off-by-one math is happening. IIUC, that comment could just say `Throw when constructing out[3]`.
> 
> Line 226 should be followed by an `assert(false)` (inside the try, but never hit because we expect to throw an exception).
> 
> And then circa line 229, we should do what the comment says we're doing: we should check that the value of `out[4].value` remains `5`. Without that check, I don't think this test is doing anything.
Thanks for calling it out. Should be in a better shape now, PTAL.

> I assume it means "any of the existing objects in the destination range (which have not yet been overwritten)."

Yes (because it uses copying, the fact that objects in the source range are unchanged seems like a given, but please let me know if you disagree and I can add checks). However, now that I think of it, I'm not sure this is a valid use case -- this algorithm is supposed to be used on uninitialized memory. I'm also not 100% sure that calling placement new on an object without calling its destructor first isn't undefined behavior for trivially-destructible types (though I would think so).

> Line 224 says "the fourth object," but N is actually 3 and there's also a 5 involved, so I don't know which direction the off-by-one math is happening. IIUC, that comment could just say `Throw when constructing out[3]`.

Your rephrasing is much better, thanks. The original comment meant "fourth" if using "natural" counting order (counting from one).

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.copy/ranges_uninitialized_copy.pass.cpp:60
+
+    forward_iterator<Counted*> it(in);
+    auto range = std::ranges::subrange(it, sentinel_wrapper<forward_iterator<Counted*>>(it));
----------------
Quuxplusone wrote:
> var-const wrote:
> > Quuxplusone wrote:
> > > Style nit: I'd prefer
> > > ```
> > > auto it = forward_iterator<Counted*>(in);
> > > ```
> > > for readability. ( https://quuxplusone.github.io/blog/2019/02/18/knightmare-of-initialization/#simple-guidelines-for-variable-i )
> > Thank you for the link. While I can see the benefit described there, this form also makes me wonder whether `forward_iterator` is a type or a function, and if it's the latter, then what kind of a return type it might have. This makes me hesitant to make this change.
> > While I can see the benefit described there, this form also makes me wonder whether `forward_iterator` is a type or a function
> 
> True. Of course this isn't unique to `forward_iterator<int*>`; we see that kind of ambiguity with `std::ranges::subrange` (type) on line 68 versus `std::ranges::uninitialized_copy` (function) on line 69.
> Anyway, //I// think in this case it reads better with `=` instead of `()`; but I definitely can't claim that we use `=` 100% consistently.
> 
> Btw, if you're implying "Someone working on this file might not instantly understand that `forward_iterator<It>` is one of our test iterators from `test_iterators.h`," well, I //would// disagree with that implication. :) The test iterators are used in enough places that we should expect people to be at least as familiar with that vocabulary as with the `std::ranges` vocabulary (and hopefully more so!). I grant that `auto it = mary_poppins<int*>(in)` would be legitimately mysterious as to what kind of entity `mary_poppins` is; but `auto it = forward_iterator<int*>(in)` in this codebase shouldn't have quite that same level of mystery.
> 
> No action //needed// here, but if I've reduced your hesitancy, consider un-hesitating. ;)
The problem with initialization styles is that their main benefits happen when they are used consistently. Personally, I always use the old `T foo(x, y)` unless there's a reason not to -- this is why when I see `auto foo = T(x, y)`, my thought process goes something like "//I _think_ `T` is a class, but if it were, why wouldn't they just write `T foo(x, y)`? Am I missing something? Is it actually some `T2` class with a very similar name? Or perhaps it comes from some different namespace? Oh, I see, it is in fact `T`, they just prefer to use `auto` initialization for some reason//".

This would be less of an issue if `auto foo = ...` form was used consistently, but to a certain extent the issue is unavoidable -- it's inherent in the fact that this form of initialization invokes regular functions just as happily as constructors. The fact that current code sometimes uses the same naming convention for functions as for types doesn't help, either. I generally prefer to use the most restricted form that achieves the necessary result, and in this case the `T foo(x, y)` form seems more restricted (`T` can only be a type, and thus this can only be a constructor call).

It's true that the reader can be expected to know the commonly-used types, but I think it poses new questions. At which point can a type be considered a commonly-used type within a file? It's an additional judgment call for the writer, and I don't think using two different forms of initialization makes things easier for the reader either.

The fact that I'm using the `auto = ` form with `subrange` is actually a mistake on my part -- I did presume `subrange` was a function. Yes, it is my fault for not checking, but I wouldn't _need_ to check if a different form of initialization was used.

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.copy/ranges_uninitialized_copy_n.pass.cpp:38-39
+struct NotConvertibleFromInt {};
+static_assert(!std::is_invocable_v<decltype(std::ranges::uninitialized_copy_n), int*, size_t, NotConvertibleFromInt*,
+                                   NotConvertibleFromInt*>);
+
----------------
Quuxplusone wrote:
> clang-format strikes again. I suggest
> ```
> static_assert(!std::is_invocable_v<decltype(std::ranges::uninitialized_copy_n),
>     int*, size_t, NotConvertibleFromInt*, NotConvertibleFromInt*>);
> ```
> or even better, just
> ```
> static_assert(!std::is_invocable_v<decltype(std::ranges::uninitialized_copy_n), int**, size_t, int*, int*>);
> ```
> since IIUC the point of this assertion is to test that it SFINAEs away when the source element type isn't convertible to the dest element type (e.g. `int` isn't convertible to `int*`).
I don't feel too strongly about it, but my current preference is to keep it as-is. The class name, verbose as it is, makes the purpose of the test obvious, whereas the distinction between one asterisk vs. two asterisks requires a closer look.

In my experience, manually changing the output of a formatting tool largely defeats the purpose of using one. While I agree that the proposed formatting is better, I wouldn't say that what clang-format produced is broken to justify manual editing.

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.move/ranges_uninitialized_move.pass.cpp:71-75
+template <class T>
+T&& iter_move(Iterator<T>& iter) {
+  ++iter_move_invocations;
+  return std::move(*iter);
+}
----------------
Quuxplusone wrote:
> I'd feel better if this were a non-template hidden friend:
> ```
>     friend T&& iter_move(Iterator it) {
>         ++iter_move_invocations;
>         return std::move(*it);
>     }
> ```
> Also, in writing that, I noticed that you were passing `it` by mutable reference on line 72. That's because https://eel.is/c++draft/specialized.algorithms#uninitialized.move-4 is overspecified, right? Personally I think we shouldn't bother to test that, and should file yet another LWG issue about the overspecification. This is the same overspecification problem as in `resize_and_overwrite`; and I hadn't noticed it before.
> 
> Intuition tells me that for the programmer to provide an ADL `iter_move` that works //only// with lvalues, might actually be IFNDR for some reason related to https://github.com/cplusplus/nbballot/issues/183 "implicit expression variations." But I certainly don't know for sure.
Done.

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.move/ranges_uninitialized_move.pass.cpp:86-91
+    auto result1 = std::ranges::uninitialized_move(in, in, out.begin(), out.end());
+    assert(Counted::current_objects == 0);
+    assert(Counted::total_objects == 0);
+    assert(Counted::total_copies == 0);
+    assert(result1.in == in);
+    assert(result1.out == out.begin());
----------------
Quuxplusone wrote:
> ldionne wrote:
> > Suggestion. It helps clarify that the test cases are 100% unrelated, and also allows skipping the numbering. All that for just two spaces of indentation!
> +1... and note that in many existing tests, we don't even bother to pay the 2 spaces of indentation. ;)
Done. I'd prefer to use consistent indentation everywhere.

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.move/ranges_uninitialized_move.pass.cpp:249-250
+
+  // An exception is thrown while objects are being created -- the existing objects should stay
+  // valid. (iterator, sentinel) overload.
+#ifndef TEST_HAS_NO_EXCEPTIONS
----------------
ldionne wrote:
> What http://eel.is/c++draft/specialized.algorithms#uninitialized.move-5 says is basically that if one of the move constructor throws, the objects in the source range (that have been moved-from) will now be left in a moved-from state (aka the valid-but-unspecified-state wording used in the spec).
> 
> Instead, IMO we should be asserting that the moved-from objects (i.e. `in[0]`, `in[1]`, `in[2]`) are indeed moved-from after the exception has been thrown.
Done, thanks!

================
Comment at: libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.move/ranges_uninitialized_move_n.pass.cpp:99
+
+    auto result = std::ranges::uninitialized_move_n(in, N, out.begin(), out.end());
+    assert(Counted::current_objects == N);
----------------
Quuxplusone wrote:
> ldionne wrote:
> > I don't think I've seen tests that the various `std::ranges::` algorithms are returning exactly the type specified in the standard. If that's right, we should add some.
> > 
> > In other words, it's not sufficient to be returning some type that happens to have `.in` and `.out`, they need to be exactly what the spec says.
> Pro tip: in tests that already require Concepts, this can be as simple as
> ```
> std::same_as<std::ranges::in_out_result> auto result =
>     std::ranges::uninitialized_move_n(in, N, out.begin(), out.end());
> ```
> although I guess that syntax works better for shorter types/expressions. :) Here it's not saving you much (if anything) over the old-school
> ```
> auto result = std::ranges::uninitialized_move_n(in, N, out.begin(), out.end());
> ASSERT_SAME_TYPE(decltype(result), std::ranges::in_out_result);
> ```
Added, will add similar tests to the already-committed algorithms in a follow-up.

@Quuxplusone Thanks for providing two versions! I agree that Concepts don't seem to have a huge advantage, so I went with `ASSERT_SAME_TYPE`. Happy to change, though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116023/new/

https://reviews.llvm.org/D116023