[libcxx-commits] [libcxx] [libc++] Simplify the implementation of std::sort a bit (PR #104902)

Tue Oct 22 03:54:54 PDT 2024

danlark1 wrote:

> In addition to @joanahalili's comment: the regression varies from 5% to 25% depending on the platform. And there's also a much more significant improvement for larger vector sizes (5000+) - around 1.3-2x speedup, but the code is used with smaller vector sizes in prod, which makes the improvement irrelevant to us.

To provide more context. The situation is more nuanced. I am also from Google and was one of the people who developed new std::sort implementation back in 2022/2023.

The patch is correct. It fixed a problem which was introduced (oh my) on Feb 1, 2023 [here](https://github.com/llvm/llvm-project/commit/dc017e03ca55ed0b2054e4a7d5e5ca049a054fcc#diff-603fba8f48383db764fcaeefb820f553a318cb391bc75c9cb769bb0d20c10d42R24).

Note that `__use_branchless_sort` started to be used with a template arg `_Comp`, not `_Comp&` but `__use_branchless_sort` uses `__is_simple_comparator` which accepts only references

```cpp
template <class _Tp>
struct __is_simple_comparator : false_type {};
template <>
struct __is_simple_comparator<__less<>&> : true_type {};
template <class _Tp>
struct __is_simple_comparator<less<_Tp>&> : true_type {};
template <class _Tp>
struct __is_simple_comparator<greater<_Tp>&> : true_type {};
#if _LIBCPP_STD_VER >= 20
template <>
struct __is_simple_comparator<ranges::less&> : true_type {};
template <>
struct __is_simple_comparator<ranges::greater&> : true_type {};
#endif
```
See godbolt for `__use_branchless_sort` dispatch https://gcc.godbolt.org/z/8e9TffYb6

Disabling branch free version of bitset partitioning is not the desired behavior (which was from Feb 1, 2023). Branch free version was developed when we changed std::sort implementation and that was the whole point of doing so. This patch fixes it but regressions of 200-300 elements are real as floats and doubles are slightly harder to move branch free. I checked the same benchmarks on integers and they are fine for <100 elements.

Let's move forward with this patch. We checked and actually biggest clients of double and float sorting are kind of big ones. If everybody here begins to be concerned with <200 elements sorts for floats, we can disable is_arithmetic

https://github.com/llvm/llvm-project/pull/104902