[libcxx-commits] [PATCH] D122780: Modify std::sort: add BlockQuickSort partitioning algorithm for arithmetic types

Nikolas Klauser via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Apr 22 11:51:50 PDT 2022


philnik added a comment.

I haven't checked the logic yet, because I'm still not convinced that it's a good idea. While the numbers for random data look great, every other pattern is pessimized a bit to a lot with trivial arithmetic types. The numbers for the non-arithmetic types look good across the board.



================
Comment at: libcxx/include/__algorithm/sort.h:234
 template <class _Compare, class _RandomAccessIterator>
 void __insertion_sort_3(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
   typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;
----------------
Why is this named `_3`? That's quite confusing.


================
Comment at: libcxx/include/__algorithm/sort.h:268
+    if (__comp(*__i, *__j)) {
+      value_type __t(_VSTD::move(*__i));
+      _RandomAccessIterator __k = __j;
----------------
You can use `std::` in new code. Applies throughout the patch.


================
Comment at: libcxx/include/__algorithm/sort.h:358-359
+  static __storage_t __blsr(__storage_t x) {
+    // _blsr_u64 can be used here but it did not make any performance
+    // difference in practice.
+    return x ^ (x & -x);
----------------
I don't think the comment is necessary here. Could you add a `__libcpp_blsr` in the same style as the `__libcpp_popcount` functions?


================
Comment at: libcxx/include/__algorithm/sort.h:362-364
+  static int __clz(__storage_t x) { return __builtin_clzll(x); }
+  static int __ctz(__storage_t x) { return __builtin_ctzll(x); }
+  static int __popcount(__storage_t x) { return __builtin_popcountll(x); }
----------------
That would allow you to make this a template instead of having the same thing twice.

Do you even use `__32bit_set` anywhere? I couldn't find it.

Could you just use `uint64_t` and `uint32_t` directly and just call the `__libcpp` functions instead of having the extra indirection?


================
Comment at: libcxx/include/__algorithm/sort.h:399-400
+// pivot.  Returns the iterator for the pivot and a bool value which is true if
+// the provided range is already sorted, false otherwise.  We assume that the
+// length of the range is at least three elements.
+//
----------------
Could you add a `_LIBCPP_ASSERT` for the precondition?


================
Comment at: libcxx/include/__algorithm/sort.h:402-404
+// We term the partitioning algorithm as bitset partitioning since the
+// outcomes of the comparisons between the pivot and other elements are stored
+// in the fixed size bitsets.
----------------
Could you shorten this to something like `// __bitset_partition uses bitsets for storing the outcomes of the comparisons`? I think that makes it clear.


================
Comment at: libcxx/include/__algorithm/sort.h:407
+_LIBCPP_HIDE_FROM_ABI _VSTD::pair<_RandomAccessIterator, bool>
+__bitset_partition(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
+  typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
----------------
Could you split this up into a few more functions? There are very clear sections to divide this functions.


================
Comment at: libcxx/include/__algorithm/sort.h:559-560
+// pivot.  Returns the iterator for the pivot and a bool value which is true if
+// the provided range is already sorted, false otherwise.  We assume that the
+// length of the range is at least three elements.
+template <class _RandomAccessIterator, class _Compare>
----------------
Again, please add a `_LIBCPP_ASSERT`.


================
Comment at: libcxx/include/__algorithm/sort.h:660-662
+  _LIBCPP_CONSTEXPR_AFTER_CXX11 difference_type __limit = 24;
+  // Lower bound for using Tuckey's ninther technique for median computation.
+  _LIBCPP_CONSTEXPR_AFTER_CXX11 difference_type __ninther_threshold = 128;
----------------
Why can't these be `_LIBCPP_CONSTEXPR`?


================
Comment at: libcxx/include/__algorithm/sort.h:685-686
     }
-    if (__len <= __limit) {
-      _VSTD::__insertion_sort_3<_Compare>(__first, __last, __comp);
+    // Use insertion sort if the length of the range is below the specified
+    // limit.
+    if (__len < __limit) {
----------------



================
Comment at: libcxx/include/__algorithm/sort.h:750-751
     }
-    // sort smaller range with recursive call and larger with tail recursion elimination
-    if (__i - __first < __last - __i) {
-      _VSTD::__introsort<_Compare>(__first, __i, __comp, __depth);
-      __first = ++__i;
-    } else {
-      _VSTD::__introsort<_Compare>(__i + difference_type(1), __last, __comp, __depth);
-      __last = __i;
-    }
+    // Sort the left partiton recursively and the right partition with tail
+    // recursion elimination.
+    _VSTD::__introsort<_Compare, _RandomAccessIterator, _UseBitSetPartition>(__first, __i, __comp, __depth, __leftmost);
----------------



================
Comment at: libcxx/include/__algorithm/sort.h:772-774
+  // Only use bitset partitioning for arithmetic types.  We should also check
+  // that the default comparator is in use so that we are sure that there are no
+  // branches in the comparator.
----------------
Where do you check that the default comparator is used?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122780/new/

https://reviews.llvm.org/D122780



More information about the libcxx-commits mailing list