[libcxx-commits] [PATCH] D118029: Introduce branchless sorting functions for sort3, sort4 and sort5.

Wed Feb 9 03:49:15 PST 2022

marcogelmi added a comment.

thanks for the review!

================
Comment at: libcxx/include/__algorithm/sort.h:128
+template <typename _Compare, typename _RandomAccessIterator>
+inline _LIBCPP_HIDDEN void __cond_swap(_RandomAccessIterator __x, _RandomAccessIterator __y, _Compare __c) {
+  using value_type = typename iterator_traits<_RandomAccessIterator>::value_type;
----------------
ldionne wrote:
> Also, does the `inline` make a difference in your benchmarks? If not, I'd drop it, because they are semantically equivalent (templates have inline semantics by default).
> 
> This comment applies below in several places too.
Removing the inline seems to negatively affect the performance for some combinations of tests and architectures. Would it be ok to keep the inline for extra safety?

================
Comment at: libcxx/include/__algorithm/sort.h:201
+    __enable_if_t<!is_arithmetic<typename iterator_traits<_RandomAccessIterator>::value_type>::value, void>
+    __sort5_internal(_RandomAccessIterator __x1, _RandomAccessIterator __x2, _RandomAccessIterator __x3,
+                     _RandomAccessIterator __x4, _RandomAccessIterator __x5, _Compare __c) {
----------------
ldionne wrote:
> Instead, would it be possible to put the `enable_if_t` on `__sort5` directly, and then only add one overload of `__sort5` for when `is_arithmetic` is true? This would avoid introducing a new layer of complexity like `__sort5_internal`. This applies elsewhere too.
unfortunately the original version of __sortN also returns the number of swaps, while the new one doesn't, so we need both versions

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118029/new/

https://reviews.llvm.org/D118029