[libcxx-commits] [libcxx] [libc++] Introduce one-sided binary search for set_intersection so that it can be completed in sub-linear time for a large class of inputs. (PR #75230)
Louis Dionne via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Apr 22 09:07:29 PDT 2024
================
@@ -46,6 +48,52 @@ __lower_bound(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp,
return __first;
}
+// One-sided binary search, aka meta binary search, has been in the public domain for decades, and has the general
+// advantage of being \Omega(1) rather than the classic algorithm's \Omega(log(n)), with the downside of executing at
+// most 2*log(n) comparisons vs the classic algorithm's exact log(n). There are two scenarios in which it really shines:
+// the first one is when operating over non-random iterators, because the classic algorithm requires knowing the
+// container's size upfront, which adds \Omega(n) iterator increments to the complexity. The second one is when you're
+// traversing the container in order, trying to fast-forward to the next value: in that case, the classic algorithm
+// would yield \Omega(n*log(n)) comparisons and, for non-random iterators, \Omega(n^2) iterator increments, whereas the
+// one-sided version will yield O(n) operations on both counts, with a \Omega(log(n)) bound on the number of
+// comparisons.
+template <class _AlgPolicy, class _Iter, class _Sent, class _Type, class _Proj, class _Comp>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Iter
+__lower_bound_onesided(_Iter __first, _Sent __last, const _Type& __value, _Comp& __comp, _Proj& __proj) {
+ // static_assert(std::is_base_of<std::forward_iterator_tag, typename _IterOps<_AlgPolicy>::template
+ // __iterator_category<_Iter>>::value,
+ // "lower_bound() is a multipass algorithm and requires forward iterator or better");
+
+ // step = 0, ensuring we can always short-circuit when distance is 1 later on
+ if (__first == __last || !std::__invoke(__comp, std::__invoke(__proj, *__first), __value))
+ return __first;
+
+ using _Distance = typename iterator_traits<_Iter>::difference_type;
+ for (_Distance __step = 1; __first != __last; __step <<= 1) {
+ auto __it = __first;
+ auto __dist = __step - _IterOps<_AlgPolicy>::advance(__it, __step, __last);
+ // once we reach the last range where needle can be we must start
+ // looking inwards, bisecting that range
+ if (__it == __last || !std::__invoke(__comp, std::__invoke(__proj, *__it), __value)) {
+ // we've already checked the previous value and it was less, we can save
+ // one comparison by skipping bisection
+ if (__dist == 1)
+ return __it;
+ return std::__lower_bound_bisecting<_AlgPolicy>(__first, __value, __dist, __comp, __proj);
+ }
+ // range not found, move forward!
+ __first = __it;
+ }
+ return __first;
+}
+
+template <class _AlgPolicy, class _RandIter, class _Sent, class _Type, class _Proj, class _Comp>
----------------
ldionne wrote:
```suggestion
template <class _AlgPolicy, class _RandomAccessIterator, class _Sent, class _Type, class _Proj, class _Comp>
```
https://github.com/llvm/llvm-project/pull/75230
More information about the libcxx-commits
mailing list