[libcxx-commits] [PATCH] D129823: [libc++][ranges] Make range algorithms support proxy iterators

Konstantin Varlamov via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sat Jul 16 03:13:25 PDT 2022


var-const added inline comments.


================
Comment at: libcxx/include/__algorithm/copy_backward.h:31
+  auto __last_iter = _IterOps<_AlgPolicy>::next(__first, __last);
+  auto __ret = std::__copy(reverse_iterator<_Iter1>(__last_iter),
+                           reverse_iterator<_Iter1>(__first),
----------------
philnik wrote:
> var-const wrote:
> > @philnik I think this was a bug; it wasn't caught because `copy_backward` tests never try the case where the sentinel isn't the same type as the iterator. This fix is enough to make `robust_against_proxy_iterators` pass; however, if I change `test_iterators` in `copy_backward` test to define `class Sent` as `= sentinel_wrapper<In>`, I get more breakages. Could you please look into this?
> Since D128864 fixes this properly I would prefer if you revert the changes here and I enable the additional tests over there instead.
Sure, reverted this (also commented out `copy_backward` in the `robust` test file and added a TODO).


================
Comment at: libcxx/include/__algorithm/inplace_merge.h:62
 {
+    using _Ops = _IterOps<_AlgPolicy>;
+
----------------
philnik wrote:
> Why are you adding an alias here? It doesn't look like much of a benefit for two calls. Ditto for a lot of other places.
Removed from here and elsewhere. If there are other places you'd like to change, please let me know.


================
Comment at: libcxx/include/__algorithm/inplace_merge.h:68
         {
             _VSTD::move(__first1, __last1, __result);
             return;
----------------
huixie90 wrote:
> should this call `std::ranges::move` for ranges policy? or `Algs<_AlgPolicy>::move` if that exists?
Very good point, I thought about this. Added a TODO for now.


================
Comment at: libcxx/include/__algorithm/inplace_merge.h:202
+                __middle, __m2, __last, __comp, __len12, __len22, __buff, __buff_size);
 //          _VSTD::__inplace_merge<_Compare>(__first, __m1, __middle, __comp, __len11, __len21, __buff, __buff_size);
             __last = __middle;
----------------
huixie90 wrote:
> while you are here, perhaps consider remove these stale comments?
I'd rather not add unrelated changes to this patch -- it's already quite large.


================
Comment at: libcxx/include/__algorithm/iterator_operations.h:50
+template <>
+struct _Algs<_RangeAlgPolicy> {
+  static constexpr auto min_element = ranges::min_element;
----------------
philnik wrote:
> var-const wrote:
> > Should this be moved to a different file? Or perhaps this file should be renamed from `iterator_operations.h` to something more generic?
> I'm already not very happy that we pull all the iterator operations in. We should avoid pulling whole algorithms in. Instead, I think the better option would be to rewrite `min_element` to support ranges like that:
> ```
> template <class _Iter, class _Sent, class _Proj, class _Comp>
> _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11
> _Iter __min_element(_Iter __first, _Sent __last, _Comp& __comp, _Proj& __proj) {
>   if (__first == __last)
>     return __first;
> 
>   _Iter __i = __first;
>   while (++__i != __last)
>     if (std::__invoke(__comp, std::__invoke(__proj, *__i), std::__invoke(__proj, *__first)))
>       __first = __i;
>   return __first;
> }
> 
> template <class _ForwardIterator, class _Compare>
> _LIBCPP_NODISCARD_EXT inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11 _ForwardIterator
> min_element(_ForwardIterator __first, _ForwardIterator __last, _Compare __comp)
> {
>   static_assert(__is_callable<_Compare, decltype(*__first), decltype(*__first)>::value,
>               "The comparator has to be callable");
>   auto __proj = __identity();
>   return std::__min_element(__first, __last, __comp, __proj);
> }
> ```
> and update `libcxx/algorithms/callable.verify.cpp`.
Discussed offline. I slightly prefer the current approach, but not enough to spend too much time discussing this. Went with the approach you suggested.



================
Comment at: libcxx/include/__algorithm/make_heap.h:28
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11
 void __make_heap(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare& __comp) {
   using _CompRef = typename __comp_ref_type<_Compare>::type;
----------------
huixie90 wrote:
> we have two different patterns for these `__` functions. 
> 1. the `__last` has a different template parameter type. and the `__` calls `next` to get the last iterator. the ranges counter part directly call it.
> 2. the `__last` has the same type and the ranges counter-part calls `next` then call into `__` function.
> 
> The problem is that when these two patterns co-exists, let's say `__a` takes two different types, and `__b` takes the same type. If `__a` needs to call `__b`, it could fail because `__b` expecting the same type.
> 
> I think we should make all algorithms consistent.
This should be caught by testing, so I don't think it should cause any runtime issues, but it can definitely be confusing when reading the code. However, sometimes it reflects the nature of the algorithm -- for some algorithms it is essential that `__last` is a random-access or bidirectional iterator, and for those it makes sense to reflect that in the signature. For others, `__last` is only ever used to serve as a guard, so they can trivially support sentinels. Perhaps using different names rather than `__last` having two meanings would make this a little more readable?


================
Comment at: libcxx/include/__algorithm/sort.h:142
+                                is_arithmetic<_Tp>::value && __is_simple_comparator<_Compare>::value &&
+                                is_same<decltype(*declval<_Iter>()), _Tp>::value>;
 
----------------
huixie90 wrote:
> philnik wrote:
> > Shouldn't this be `is_same<decltype(*declval<_Iter>()), _Tp&>::value`? Otherwise this would disable the optimization for (almost) all iterators, especially raw pointers. Also, could you qualify `declval`? While it's not strictly necessary it makes error messages a lot better.
> I think you don't need to guard the `is_same<decltype(*declval<_Iter>()), _Tp>::value`
> 
> The algorithm is correct for proxy iterator, because of the correct use of `value_type`. 
> https://godbolt.org/z/c5Mh7aGxT
> 
> It'd only be would be wrong if it is using `auto`
> https://godbolt.org/z/8xqzhPhve
> 
> Proxy Iterator needs to define `value_type` in the correct way to make these swap patterns work.
> 
> Normally these swap patterns need to be combined with `iter_move` (the triple-move swap), but here we are dealing with `is_arithmetic` types so moves are just copies
Thanks a lot for catching this, it's a very unfortunate typo. I removed this condition altogether thanks to @huixie90's analysis below.


================
Comment at: libcxx/include/__algorithm/sort.h:142-166
+                                is_same<decltype(*declval<_Iter>()), _Tp>::value>;
 
 // Ensures that __c(*__x, *__y) is true by swapping *__x and *__y if necessary.
 template <class _Compare, class _RandomAccessIterator>
 inline _LIBCPP_HIDE_FROM_ABI void __cond_swap(_RandomAccessIterator __x, _RandomAccessIterator __y, _Compare __c) {
   using value_type = typename iterator_traits<_RandomAccessIterator>::value_type;
   bool __r = __c(*__x, *__y);
----------------
var-const wrote:
> huixie90 wrote:
> > philnik wrote:
> > > Shouldn't this be `is_same<decltype(*declval<_Iter>()), _Tp&>::value`? Otherwise this would disable the optimization for (almost) all iterators, especially raw pointers. Also, could you qualify `declval`? While it's not strictly necessary it makes error messages a lot better.
> > I think you don't need to guard the `is_same<decltype(*declval<_Iter>()), _Tp>::value`
> > 
> > The algorithm is correct for proxy iterator, because of the correct use of `value_type`. 
> > https://godbolt.org/z/c5Mh7aGxT
> > 
> > It'd only be would be wrong if it is using `auto`
> > https://godbolt.org/z/8xqzhPhve
> > 
> > Proxy Iterator needs to define `value_type` in the correct way to make these swap patterns work.
> > 
> > Normally these swap patterns need to be combined with `iter_move` (the triple-move swap), but here we are dealing with `is_arithmetic` types so moves are just copies
> Thanks a lot for catching this, it's a very unfortunate typo. I removed this condition altogether thanks to @huixie90's analysis below.
Thanks a lot for doing this analysis! This really makes things much simpler. I added a brief comment to the function as well.


================
Comment at: libcxx/include/__algorithm/sort.h:337-340
+        ::new ((void*)__j2) value_type(_Ops::__iter_move(__i2));
         __d.template __incr<value_type>();
         for (--__j2; __i2 != __first2 && __comp(*__first1, *--__i2); --__j2)
+          *__j2 = _Ops::__iter_move(__i2);
----------------
huixie90 wrote:
> nit: here we know that `__i2` is a pointer so `std::move(*__it2)` is actually ok? (to be confirmed)
This is a very good observation, thanks!


================
Comment at: libcxx/include/__algorithm/sort.h:560-596
+extern template _LIBCPP_FUNC_VIS void __sort<_ClassicAlgPolicy, __less<char>&, char*>(char*, char*, __less<char>&);
 #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
-extern template _LIBCPP_FUNC_VIS void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
+extern template _LIBCPP_FUNC_VIS void __sort<_ClassicAlgPolicy, __less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
 #endif
-extern template _LIBCPP_FUNC_VIS void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<short>&, short*>(short*, short*, __less<short>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<int>&, int*>(int*, int*, __less<int>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<long>&, long*>(long*, long*, __less<long>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<float>&, float*>(float*, float*, __less<float>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<double>&, double*>(double*, double*, __less<double>&);
-extern template _LIBCPP_FUNC_VIS void __sort<__less<long double>&, long double*>(long double*, long double*, __less<long double>&);
-
-extern template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<char>&, char*>(char*, char*, __less<char>&);
+extern template _LIBCPP_FUNC_VIS void __sort<_ClassicAlgPolicy, __less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
+extern template _LIBCPP_FUNC_VIS void __sort<_ClassicAlgPolicy, __less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
+extern template _LIBCPP_FUNC_VIS void __sort<_ClassicAlgPolicy, __less<short>&, short*>(short*, short*, __less<short>&);
----------------
huixie90 wrote:
> hmm
This is reverted now due to @philnik's comment (it breaks the ABI).


================
Comment at: libcxx/src/algorithm.cpp:16
 
-template void __sort<__less<char>&, char*>(char*, char*, __less<char>&);
+template void __sort<_ClassicAlgPolicy, __less<char>&, char*>(char*, char*, __less<char>&);
 #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
----------------
philnik wrote:
> var-const wrote:
> > @ldionne This gives me some concern. Do these changes look reasonable to you? If yes, should we also add specializations for `_RangeAlgPolicy`?
> I'm quite certain that this is an ABI break, since template parameters are mangled. I think you have to revert these changes and add a `__sort_policy` or something like that as a dispatch function. Regarding `_RangeAlgPolicy` I think we should wait a few releases and try things out in the unstable ABI before effectively killing the possibility of changing the ABI in any way (assuming we want to do that at all).
Thanks a lot for calling out the ABI break, I was concerned about that and it makes sense. There's no choice but to revert this change to the template instantiation and come up with a different approach.

For the different approach, I would strongly prefer to avoid:
- bloating the binary;
- duplicating internal sorting functions for ranges and non-ranges (maintenance hazard).

We discussed this offline, and you suggested (IIUC -- feel free to correct my summary) essentially to rename the functions that are being explicitly instantiated, then add one-liner functions with the old instantiated names that just forward arguments to the renamed functions. IIUC, inlining would make sure both the forwarding function _and_ the actual implementation function end up in the dylib, even though only the forwarding function is being instantiated. This sounds like a good approach, but I'm _really_ concerned about fiddling with the ABI so close to the branch cut. It would also take me a while to verify that this works correctly (i.e., no or negligible binary bloat, implementation functions are actually in the dylib).

With that in mind, I decided to go with a different approach. It might be suboptimal long-term, but it's easy to implement and shouldn't affect the ABI at all. I want to land this patch ASAP because the intrusive changes to sort and heap algorithms affect a few of the unimplemented ranges patches. As long as this patch makes things better and doesn't prevent us from using a different approach later, I think it should be good to land.

What I did was to keep the same number of template arguments in `__sort`, `__sort5`, and `__insertion_sort_incomplete` (the ones being explicitly instantiated) and attach the policy tag to the comparator instead. The way it works is:
- when invoked via the range algorithm, wrap the comparator inside a simple struct that holds a reference to the comparator and defines two nested typedefs, one for the policy and one for the comparator type;
- when invoked via the classic algorithm, just pass the pristine comparator like before. When unwrapping the comparator, the lack of a nested typedef for the policy is taken to mean `_ClassicAlgPolicy`.

It's certainly complicated and not pretty, but I think it solves the proxy iterator problem, shouldn't cause any performance regressions and leaves the door open for changing this in the future. What do you think?


================
Comment at: libcxx/src/algorithm.cpp:18-52
+template void __sort<_ClassicAlgPolicy, __less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
 #endif
-template void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
-template void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
-template void __sort<__less<short>&, short*>(short*, short*, __less<short>&);
-template void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&);
-template void __sort<__less<int>&, int*>(int*, int*, __less<int>&);
-template void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&);
-template void __sort<__less<long>&, long*>(long*, long*, __less<long>&);
-template void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&);
-template void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&);
-template void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&);
-template void __sort<__less<float>&, float*>(float*, float*, __less<float>&);
-template void __sort<__less<double>&, double*>(double*, double*, __less<double>&);
-template void __sort<__less<long double>&, long double*>(long double*, long double*, __less<long double>&);
+template void __sort<_ClassicAlgPolicy, __less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
+template void __sort<_ClassicAlgPolicy, __less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
+template void __sort<_ClassicAlgPolicy, __less<short>&, short*>(short*, short*, __less<short>&);
+template void __sort<_ClassicAlgPolicy, __less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&);
+template void __sort<_ClassicAlgPolicy, __less<int>&, int*>(int*, int*, __less<int>&);
----------------
huixie90 wrote:
> Can we introduce some new names like `__sort_with_al_policy`, which is the common sharable code between classic and ranges, and we keep `__sort` only for classic algorithm which calls the `__sort_with_al_policy` with classic policy so that these symbols are stable and have the same behaviour
So far I went with wrapping the policy tag, which seems to be the least verbose and least intrusive change (at the cost of some complexity). See one of my comments above in a thread started by @philnik for a bit more context. Please let me know what you think.


================
Comment at: libcxx/test/std/algorithms/ranges_robust_against_proxy_iterators.compile.pass.cpp:1
+//===----------------------------------------------------------------------===//
+//
----------------
philnik wrote:
> IIUC some types of problems are only obvious when looking at the output of the algorithms, so I think it's not enough to ensure that the algorithms can be instantiated.
I have somewhat mixed feelings about this. In short, I feel that the Pareto principle applies and this test would catch, say, 80% of the issues for 20% of the effort -- I expect that compilation issues are a lot more likely than subtle incorrect behavior.

I would be against adding specific test cases for proxy iterators across all algorithms. However, I think we have algorithms where `ProxyIterator` is simply added as one more iterator type to instantiate a `test_iterators` function, which I think is great and should offer sufficient testing. But even with that, I still feel this test file has value because it's compact. It's easy to maintain and extend (for example, adding another proxy iterator type to this file is way easier than going across all the ~100 algorithm test files). It's also way easier to verify that every relevant algorithm is tested in this file rather than going through all of our algorithm tests. I think those upsides outweigh the incomplete coverage.

I changed this to be a runtime test, but I don't want to block this patch on adding checks for the output -- I think it improves our test coverage even without those checks which we can add in a follow-up.


================
Comment at: libcxx/test/std/algorithms/ranges_robust_against_proxy_iterators.compile.pass.cpp:14
+//
+// Range algorithms should work with proxy iterators. For example, the implementations should use `iter_swap` (which is
+// a customization point) rather than plain `swap` (which might not work with certain valid iterators).
----------------
huixie90 wrote:
> It is great to have a test for `iter_swap`. But this test won't cover the `iter_move`. If an algorithm doesn't use `iter_move` but use `std::move(*it)`, it would still compile (for copyable types) because `value_type x = std::move(*it)` would probably ends with copies.
> 
> For algorithms that do move stuff, we should probably test them with proxy iterator of move only types so that it will catch incorrect use of `std::move`
This is a great point and actually something I thought about -- we should probably test _all_ customization points. I'd prefer to do this in a follow-up, though -- I think this patch is already quite big and it makes the coverage strictly better even if it doesn't achieve one hundred percent coverage. Does that sound reasonable to you? To be clear, I plan to do the follow-up immediately after landing this. I want to land this patch ASAP because the changes to `sort.h` and friends are going to cause a lot of merge conflicts for the not-yet-implemented algorithms (e.g. `inplace_merge`).

By the way, do you think we should try to add all checks to the same `ProxyIterator` class or create a separate test iterator (and sentinel and range) for each customization point?


================
Comment at: libcxx/test/std/algorithms/ranges_robust_against_proxy_iterators.compile.pass.cpp:28-45
+
+template <class Func, std::ranges::range Input, class ...Args>
+void test(Func&& func, Input& in, Args&& ...args) {
+  func(in.begin(), in.end(), std::forward<Args>(args)...);
+  func(in, std::forward<Args>(args)...);
+}
+
----------------
huixie90 wrote:
> these tests are great and they can catch some problems immediately. and thanks for adding them.
> 
> However, since these tests don't actually verify the results of the algorithms, I think we should add additional tests in individual tests in the case-by-case manner.
> 
> For example, for sorting, we can test given a vector<int>, we can create a ProxyRange, sort the ProxyRange and verify the original vector is sorted
> 
Please see my other comment above -- in short, I think it would be great to add `ProxyIterator` as one of the iterator types that instantiate `test_iterators` across our algorithms. I wouldn't add specific test cases, though, because I don't think those can be maintained well across all the ~100 algorithms. In either case, this is something that I feel should be done in a follow-up.


================
Comment at: libcxx/test/std/algorithms/ranges_robust_against_proxy_iterators.compile.pass.cpp:152
+  //test(std::ranges::prev_permutation, in, binary_pred);
+  //test(std::ranges::next_permutation, in, binary_pred);
+}
----------------
huixie90 wrote:
> var-const wrote:
> > Uninitialized memory algorithms don't work with `ProxyIterator` because it doesn't satisfy `nothrow-input-iterator` that requires the result of dereferencing the iterator to be the same as `iter_reference_t`. I'm not sure if it's possible to write a different valid proxy iterator that would satisfy that constraint but still require special handling in the implementation. @huixie90 What do you think?
> The `nothrow-xxx-iterator` explicitly requires `std::same_as<std::remove_cvref_t<std::iter_reference_t<I>>, std::iter_value_t<I>>`, which basically means `value_type& == reference`, which rules out all the proxy iterators. It makes sense for writing to uninitialized memory because we do need the address of `*it` to be something that we can write to (not the proxies). So for most uninitialized algorithms, they are designed not to work with proxies. So we can't test them.
> 
> However, there are two exceptions. for `uninitialized_copy` and `uninitialized_move`, the inputs and outputs are separate. So in theory the inputs can be `ProxyRange` and the outputs can be a simple array which does model  `nothrow-xxx-iterator` 
Thanks a lot for looking into this! It's very helpful. It seems that the following constraint:
```
    requires constructible_from<iter_value_t<_OutputIterator>, iter_reference_t<_InputIterator>>
```
also prevents proxy iterators from being used with `uninitialized_{copy,move}`?


================
Comment at: libcxx/test/support/test_iterators.h:881-882
 
+  // If `T` is a reference type, the default assignment operator will be deleted.
+  constexpr Proxy& operator=(const Proxy& rhs) {
+    data = rhs.data;
----------------
huixie90 wrote:
> hmm.  the two overloads above should already cover this. see the test
> https://github.com/llvm/llvm-project/blob/8922adf646eead207fb367dace80bba2abfab2ad/libcxx/test/support/test.support/test_proxy.pass.cpp#L63
> 
> or did I missing something?
I don't want to spend too much time investigating this just because of the deadline coming soon (although it's a pretty interesting issue). The test fails when the assigned type is `const` (which currently isn't tested).

My (unconfirmed) suspicion is that the implicitly-generated default `operator=` is a better match when the type on the right side is `const Proxy&`. Choosing between a template and a non-template function, overload resolution prefers a non-template function if both are an equally good match; `const Proxy&` is an exact match for the signature of the default `operator=`.

Confusingly, however, this works for the move assignment operator, even though the same logic about it being a better match seems to apply. cppreference says that "A deleted implicitly-declared move assignment operator is ignored by overload resolution". Presumably this doesn't apply to a deleted implicitly-declared _copy_ assignment operator, which would explain the difference in behavior.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129823/new/

https://reviews.llvm.org/D129823



More information about the libcxx-commits mailing list