[libcxx-commits] [PATCH] D128611: [libc++] Implement `std::ranges::merge`

Sun Jun 26 17:09:33 PDT 2022

philnik added a comment.

I just responded to comments. I'll do another round of review later.

================
Comment at: libcxx/include/__algorithm/ranges_merge.h:42
+    template <
+        std::input_iterator _InIt1,
+        std::sentinel_for<_InIt1> _Sent1,
----------------
huixie90 wrote:
> philnik wrote:
> > philnik wrote:
> > > Please don't annotate these unnecessarily.
> > Please use `_InIter` and `_OutIter`. We discussed this on discord a while back and decided on these.
> Could you please clarify what do you mean by "annotate"? If it means the `std::`, I am happy to remove it. Or if I miss anything?
Yes, I meant the `std::`. I probably should have said "qualify".

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:103-104
+
+template <class It, class St = sentinel_wrapper<It>>
+using Range = UncheckedRange<It, St>;
+
----------------
Is there a reason you add an alias here?

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:348
+
+    // iteartor overload
+    {
----------------
You misspelled `iterator`.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:37-38
+
+template <class... Args>
+concept canMerge = requires(Args&&... args) { std::ranges::merge(std::forward<Args>(args)...); };
+
----------------
huixie90 wrote:
> philnik wrote:
> > Could you list the arguments and default them to something sensible? The current test is very hard to read. Also, please rename this to `HasMerge` to go with the naming style in the other algorhtm tests.
> regarding "default them to something sensible", I disagree. I actually had a read on the "range_move" test and I saw this 
> ```
> static_assert(HasMoveIt<int*>);
> ```
> I was very confused about what it means. I thought it means it is ok to do `std::ranges::move(int*)`, which doesn't make sense to me.
> 
> I think for all the arguments that can have sensible defaults, the `std::ranges::merge` already have the defaults, e.g. comparator, projection. for all other arguments, two input iterators, two sentinels, one output iterator, they don't have sensible defaults otherwise the stl would give them. creating some defaults which changes the order of the arguments than the std::ranges::merge's arguments would make the tests even harder to read. 
> 
> Part of the problem is that this algorithm takes 5 iterator arguments, which makes reading the code really hard. But it is similar to read the actual caller's code (with the same arguments order). I would rather keep it as it is than creating new confusion.
> 
> regarding `HasMerge`, I am happy to change if it is the libcxx's convention. Personally I found `HasMerge` not very suitable for algorithms as it sounds like " type T has a member function `merge` " rather than `the algorithm merge can be called with these types"
For actually calling the `ranges::merge` I agree that it doesn't make sense to add any other default arguments, but we are talking about the types here, so it makes sense that `class Iter, class Sent = sentinel_wraper<Iter>`. My main problem is that there are 5 arguments, which makes it almost impossible so look at it and see what you are testing. I'd be happy if you add `HasMergeIt_R1`, `HasMergeIt_R2` and `HasMergeIt_Out` or something like that if you think the default arguments are a bit obscure (they might be for two ranges plus an iterator). There you could just have `template <class Iter, class Sent = sentinel_wrapper<Iter>>` and then instantiate the other arguments with default values. IMO the current state is really hard to read.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:100
+
+// Negative tests: !std::sentinel_for<S2, I2>
+static_assert(!canMerge<int*, int*, int*, long*, int*>);
----------------
huixie90 wrote:
> philnik wrote:
> > We have `NotSentinelForX` for this.
> Could you please point me where I can find it? I couldn't find it through grep
Sorry, I meant `SentinelForNot`.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:108-111
+// Negative tests: !std::indirectly_copyable<I1, Out>
+static_assert(!canMerge<int*, int*, int*, int*, const int*>);
+static_assert(!canMerge<Bar*, Bar*, Foo*, Foo*, Foo*, CompareOverloadSet>);
+static_assert(!canMerge<MoveOnly*, MoveOnly*, MoveOnly*, MoveOnly*, MoveOnly*>);
----------------
huixie90 wrote:
> philnik wrote:
> > I don't think we need three tests for the same diagnostic. This is only to ensure that we have constrained the algorithm, not to test the concepts.
> but we could have constrained the algorithm in different wrong ways. 
> 
> for example, if we constrained the algorithm `same_as<In1Iter, OutIter> && same_as<In2Iter, OutIter>`, the first two tests would still pass as the constraint is not met and only the 3rd test would catch it.
> 
> If we constrained the algorithm to just `std::indirectly_copyable<In2Iter, OutIter>` and forget about `In1Iter`, the 1st and 3rd tests would still pass as the constraint is not met and only the 2nd test would catch it.
> 
> If we constrain the algorithm as `same_as<iter_value_t<In1Iter>, iter_value_t<OutIter>>`, only the 1st test would catch it.
> 
> All these hypothetical constraints look reasonable at the first glance without really thinking about a bit more. so i think it would be good to have a bit more negative tests
Yes, technically it's possible, but I wouldn't assume malicious intent when writing the tests. If we assumed malicious intent every single test would probably fail. We don't have to check every single element of the concept. We should just make sure that `mergeable` hasn't been forgotten and for that one of them is enough.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:263-307
+struct TracedCopyMove {
+  int copy_ctor   = 0;
+  int move_ctor   = 0;
+  int copy_assign = 0;
+  int move_assign = 0;
+
+  int data = 0;
----------------
huixie90 wrote:
> philnik wrote:
> > This class seems to be a little over-engineered. 
> > ```
> >     struct CopyOnce {
> >       bool copied = false;
> >       constexpr CopyOnce() = default;
> >       constexpr CopyOnce(const CopyOnce& other) = delete;
> >       constexpr CopyOnce& operator=(const CopyOnce& other) {
> >         assert(!other.copied);
> >         copied = true;
> >         return *this;
> >       }
> >     };
> > ```
> > should do the trick.
> I think this is insufficient.  If the algorithm is mistakenly moving the elements instead of copying the elements, with this type it would still pass because the move fallback to copy.
> 
> also if I write the test with `back_inserter` (which I think it is worth adding), I think it won't call the assignment operator and it should call the copy constructor instead
You can also add `CopyOnce(CopyOnce&&) = delete` and `CopyOnce& operator=(CopyOnce&&) = delete` if you think it's worth it. (Or `assert(false)` if there is some constraint)

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:324-347
+    using In1s = type_list<
+        cpp20_input_iterator<int*>,
+        forward_iterator<int*>,
+        bidirectional_iterator<int*>,
+        random_access_iterator<int*>,
+        contiguous_iterator<int*> >;
+
----------------
huixie90 wrote:
> philnik wrote:
> > This seems overly complicated. It would be a lot easier to just have a few functions to make the cartesian product.
> I don't think writing few functions can lead less code (as a caller) as you have to list these types anyway.
> 
> I wish I could use boost::hana but I think if we can lift this cartesian_product as a general test utilities it would be really useful. writing several concrete test functions to simulate cartesian_product wouldn't be reusable 
You can't reuse functions, but It really doesn't lead to a lot of boiler code and is easily comprehensible as it doesn't require heaps of meta-programming. 

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:352-360
+    std::array r1{3, 6, 7, 9};
+    std::array<int, 7> out;
+    std::same_as<
+        std::ranges::
+            merge_result<std::ranges::iterator_t<std::array<int, 4>&>, std::ranges::dangling, int*>> decltype(auto)
+        result = std::ranges::merge(r1, std::array{2, 3, 4}, out.data());
+    assert(result.in1 == r1.end());
----------------
huixie90 wrote:
> philnik wrote:
> > Could you make the cartesian product of this? (Just copy-paste it and change the types.)
> again, "just copy-paste" doesn't sound right to me even it is in the test. I really think we should have tested cartesian_product (for types) utility 
As I said a few times already, having met-programming makes tests harder to understand. I also don't like code duplication, but having hard-to-read tests is a lot worse.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:356
+        std::ranges::
+            merge_result<std::ranges::iterator_t<std::array<int, 4>&>, std::ranges::dangling, int*>> decltype(auto)
+        result = std::ranges::merge(r1, std::array{2, 3, 4}, out.data());
----------------
huixie90 wrote:
> philnik wrote:
> > Please try to avoid using meta-programming to check meta-programming. This should simply be `std::array<int, 4>::iterator`, or (IMO the better option) use `a.data() + a.size()` or a C-style array and make it `int*`.
> hmm. I don't think I would call this meta-programming. I always use `std::ranges::iterator_t` because this is the "interface" function that stl gives us. We don't have to remember how to spell the exact type of the iterators of all the ranges.
Just because it's a standard interface doesn't make it not meta-programming. It's not just an alias for something. Spelling the exact type makes it correct on a glance. (In particular with `int*`).

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:385
+          std::ranges::merge(r1.begin(), r1.end(), r2.begin(), r2.end(), out.data());
+      verify(result, out);
+    }
----------------
huixie90 wrote:
> philnik wrote:
> > Could you just copy-paste the asserts? That makes it a lot easier to verify at a glance.
> I am happy to change it if this is the libcxx convention. I don't feel very comfortable to copy-paste a block of code where I can write a simple local lambda
Yes, that's normally what's done. (correct-at-a-glance, you know)

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:399
+  {
+    std::array r1{IntAndID{0, 0}, IntAndID{0, 1}, IntAndID{0, 2}};
+    std::array r2{IntAndID{1, 0}, IntAndID{1, 1}, IntAndID{1, 2}};
----------------
huixie90 wrote:
> philnik wrote:
> > Could you put the structs locally? `IntAndID` isn't the most helpful name. (When you do that you might as well rename it to `S`, but I don't care much)
> making it local struct would make defining `operator<=>` really inconvenient (I don't think you can have hidden friend for local class). 
> 
> regarding name. it is just an int for the `==` purposes and another identifier for me to track their original order is respected. what about `IntAndOrder`?
You can make `operator<=>` a member function. We don't have to use hidden friends inside tests. Regarding the name, I honestly don't think there is any sensible name we could give it. Feel free to you what you want.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:400
+    std::array r1{IntAndID{0, 0}, IntAndID{0, 1}, IntAndID{0, 2}};
+    std::array r2{IntAndID{1, 0}, IntAndID{1, 1}, IntAndID{1, 2}};
+
----------------
huixie90 wrote:
> philnik wrote:
> > I also like CTAD, but here I think it would be easier to use `std::array<IntAndID, 3>`.
> how can you avoid writing the `IntAndID` when you init these elements?
You have to add another layer of `{}` when you initialized `array`. (It's stupid, I know, but I don't make the rules)

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:424
+
+  // Equal elements in R1 should be merged before equal elements in R2
+  {
----------------
huixie90 wrote:
> philnik wrote:
> > This seems to be a copy-paste of the last test.
> Could you please clarify what exactly means?
> 
> The previous test point is to test the algorithm is stable. equal elements within the same range are kept in the same order.
> 
> This test is to test that for equal elements across two different input ranges , the element in first range should be inserted first.
> 
> The key difference is that the previous test the equal elements are within same range, but for this test, the equal elements are across two ranges.
> 
> Anyway, I think part of the confusion was caused by the bad name of the class `IntAndID`. I think I am going to create two classes for the purpose of this test
Ah, thanks for the clarification! I looked at the code and not the input data. That's why I missed the difference.

================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp:517
+  }
+
+  return true;
----------------
huixie90 wrote:
> philnik wrote:
> > You are missing test cases for
> > - using `std::invoke`
> > - complexity
> > - that the return type of the comparator is convertible to bool
> > - value edge-cases (like copying zero-size ranges)
> will add these tests. thanks.
> 
> for `std::invoke`, do you mean by using `std::invoke(std::ranges::merge, x, y, z)` 
No, I mean that we should check that you can use `&S::compare` and `&S::projection` for comparators and projections. That ensures that we don't just call `__projection(*__iter)`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128611/new/

https://reviews.llvm.org/D128611