[libcxx-commits] [PATCH] D118029: Introduce branchless sorting functions for sort3, sort4 and sort5.

Marco Gelmi via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Feb 2 09:33:12 PST 2022


marcogelmi added inline comments.


================
Comment at: libcxx/include/__algorithm/sort.h:160
+inline _LIBCPP_HIDDEN
+    __enable_if_t<!_VSTD::is_integral< typename iterator_traits<_RandomAccessIterator>::value_type>::value, void>
+    __sort3_internal(_RandomAccessIterator __x1, _RandomAccessIterator __x2, _RandomAccessIterator __x3, _Compare __c) {
----------------
philnik wrote:
> marcogelmi wrote:
> > marcogelmi wrote:
> > > Quuxplusone wrote:
> > > > philnik wrote:
> > > > > Is there a reason you restricted this to integrals? Does it pessimize some code? I guessing at least all primitive types should be allowed. I haven't benchmarked, but looking at code generation it seems floating points should also benefit from this change.
> > > > `is_scalar` might be relevant.
> > > this relies on cmovs, on larger data types it could pessimize the code if it compiles to extra copies instead, or not compile at all if it's not trivially copyable.
> > switched to is_arithmetic and added benchmarks here: https://bit.ly/3rgv2r9
> Could we generalize it to something like `sizeof(typename iterator_traits<_RandomAccessIterator>::value_type) <= sizeof(size_t) && is_trivially_copyable_v<typename iterator_traits<_RandomAccessIterator>::value_type>`? I'm assuming here that `sizeof(size_t)` is less than or equal to the word size. This would allow this optimization to be performed on something like `std::optional<int>`. Could you check if that still yields a performance improvement? `std::optional` doesn't have a trivial comparison operator, so it might be interesting.
so I tried testing it on `optional<int>` and the results are underwhelming (pretty much equivalent on intel skylake and actually slower on ARM - see below). 

I think this is because the compiler doesn't generate conditional moves, see https://godbolt.org/z/o6648s6f1

I wonder if there is a way to ensure it's enabled only for types for which conditional moves will be generated...

Results on ARM:

```
name                            old cpu/op  new cpu/op  delta
BM_Sort_optional_Random_1       4.02ns ± 0%  4.01ns ± 1%   -0.36%   (p=0.000 n=99+98)
BM_Sort_optional_Random_3       5.33ns ± 0%  5.46ns ± 0%   +2.42%   (p=0.000 n=94+93)
BM_Sort_optional_Random_4       5.39ns ± 0%  6.17ns ± 0%  +14.45%   (p=0.000 n=99+99)
BM_Sort_optional_Random_5       6.47ns ± 0%  7.35ns ± 0%  +13.54%   (p=0.000 n=99+98)
BM_Sort_optional_Random_16      10.2ns ± 1%  10.8ns ± 2%   +5.74%  (p=0.000 n=95+100)
```



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118029/new/

https://reviews.llvm.org/D118029



More information about the libcxx-commits mailing list