[libcxx-commits] [PATCH] D93233: [libc++] Replaces std::sort by Bitset sorting algorithm.
Morwenn via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Sun Dec 20 05:26:10 PST 2020
Morwenn added a comment.
Hi, @ldionne asked me whether I could review the algorithm, which I did over the last three days. I eventually decided to create an account here and share what I had already sent him by mail, along with additional information.
First, it looks like the main new trick is an implementation of the idea described in *BlockQuicksort: How Branch Mispredictions don't affect Quicksort* <https://arxiv.org/abs/1604.06697> by Edelkamp and Weiß: basically, the partitioning algorithm performs comparisons over blocks of 32~64 elements, stores their results as booleans (here in a bitset), then performs the swaps to put the misplaced elements into place. It nicely avoids issues linked to branch misprediction, and thus is kind of a silver bullet for quicksort as long as the comparisons and projections passed to the algorithm are themselves branchless. As we can see for the strings benchmark, this partitioning scheme can be slower than a more traditional one when there are branches in the comparisons, which is likely due to the overhead from the additional logic involved.
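To make the partitioning idea concrete, here is a minimal sketch of a block partition over a std::vector<int>. It is not the code from the patch nor from the paper: the name block_partition is hypothetical, the block size is fixed at 64 so that one std::uint64_t acts as the bitset, __builtin_ctzll is assumed to be available (GCC/Clang), and the leftover elements are simply handed to std::partition instead of the more careful handling a real implementation would use.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: rearranges v so that every element < pivot comes
// before every element >= pivot, and returns the index of the first
// element >= pivot. The pivot is taken by value so swaps cannot alter it.
std::size_t block_partition(std::vector<int>& v, int pivot) {
    constexpr std::size_t block = 64;   // one bit per element of a block
    std::size_t lo = 0;                 // everything before lo is < pivot
    std::size_t hi = v.size();          // everything from hi on is >= pivot
    std::uint64_t left_bits = 0;        // bit i set: v[lo + i] is misplaced
    std::uint64_t right_bits = 0;       // bit j set: v[hi - 1 - j] is misplaced

    while (hi - lo >= 2 * block) {
        // Comparison phase: results are stored as bits, with no branch
        // depending on the outcome of a comparison.
        if (left_bits == 0) {
            for (std::size_t i = 0; i < block; ++i)
                left_bits |= static_cast<std::uint64_t>(!(v[lo + i] < pivot)) << i;
        }
        if (right_bits == 0) {
            for (std::size_t j = 0; j < block; ++j)
                right_bits |= static_cast<std::uint64_t>(v[hi - 1 - j] < pivot) << j;
        }
        // Swap phase: pair up misplaced elements from both blocks.
        while (left_bits != 0 && right_bits != 0) {
            const std::size_t i = static_cast<std::size_t>(__builtin_ctzll(left_bits));
            const std::size_t j = static_cast<std::size_t>(__builtin_ctzll(right_bits));
            std::swap(v[lo + i], v[hi - 1 - j]);
            left_bits &= left_bits - 1;     // clear lowest set bit
            right_bits &= right_bits - 1;
        }
        // A block whose bitset is empty is fully in place; move past it.
        if (left_bits == 0) lo += block;
        if (right_bits == 0) hi -= block;
    }

    // Simplification: finish the remaining [lo, hi) with a classic partition.
    auto mid = std::partition(v.begin() + static_cast<std::ptrdiff_t>(lo),
                              v.begin() + static_cast<std::ptrdiff_t>(hi),
                              [pivot](int x) { return x < pivot; });
    return static_cast<std::size_t>(mid - v.begin());
}
```

The comparison loops have a fixed trip count and only accumulate bits, which is what lets the compiler avoid data-dependent branches when the comparison itself is branchless.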
On the good side, I tweaked bitset_sort so that I could inject it into my cpp-sort <https://github.com/Morwenn/cpp-sort> library, and ran the full test suite: the algorithm doesn't seem to have obvious bugs, and ubsan and asan didn't find any issues either. The benchmarks also show good results (here for a std::vector<double>: https://i.imgur.com/Nu6l8lI.png).
On the not-so-good side, bitset_sort seems to be O(n²) in the worst case, just like the current implementation of std::sort in libc++. I reran the quicksort killer implementation of Orson Peters from this issue <https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20140901/113986.html>, and got the same results. The quicksort killer is based on *A Killer Adversary for Quicksort* <https://www.cs.dartmouth.edu/~doug/mdmspe.pdf> by M. D. McIlroy. Here is a graph showing the O(n log n) vs. O(n²) behaviour of the different algorithms: https://i.imgur.com/IWa2teO.png
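For reference, the adversary from McIlroy's paper can be sketched in a few lines of C++. This is my own adaptation of the paper's C code, not the implementation used for the graph above; make_killer_input, AdversaryResult and the Sorter parameter are hypothetical names. The comparator decides element values lazily: every element starts as "gas" (an undetermined large value), and whenever two gas elements are compared, the one most likely to be the pivot is frozen to the next smallest value.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct AdversaryResult {
    std::vector<int> input;     // values that are adversarial for the sorter
    std::size_t comparisons;    // comparisons performed while building them
};

// Runs `sorter(first, last, less)` on n item indices (n > 0 is assumed)
// while the comparator decides the value of each item on the fly.
template <typename Sorter>
AdversaryResult make_killer_input(std::size_t n, Sorter sorter) {
    const int gas = static_cast<int>(n) - 1;   // "undetermined" marker
    std::vector<int> val(n, gas);              // val[i]: value of item i
    std::vector<int> items(n);
    std::iota(items.begin(), items.end(), 0);  // the sorter sorts indices

    int nsolid = 0;          // next value handed out when freezing an item
    int candidate = 0;       // gas item most recently seen, likely the pivot
    std::size_t ncmp = 0;

    auto less = [&](int x, int y) {
        ++ncmp;
        if (val[x] == gas && val[y] == gas) {
            // Freeze the pivot candidate so the pivot ends up small.
            if (x == candidate) val[x] = nsolid++;
            else                val[y] = nsolid++;
        }
        if (val[x] == gas)      candidate = x;
        else if (val[y] == gas) candidate = y;
        return val[x] < val[y];
    };

    sorter(items.begin(), items.end(), less);
    return {std::move(val), ncmp};
}
```

A possible use is `make_killer_input(n, [](auto f, auto l, auto cmp) { std::sort(f, l, cmp); })`, then timing the same sorter on the returned `input` for growing n. The comparison count should grow quadratically for a plain quicksort and stay near n log n for an algorithm with a worst-case fallback.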
So far my overall conclusion is that the idea of reimplementing the BlockQuicksort logic over bitsets is nice, but pdqsort otherwise seems better in every regard: it is truly O(n log n) (and even O(n log k) for k unique elements when k is small), is efficient at breaking common quicksort-adverse patterns such as the pipe organ pattern in my first benchmark, and switches over to a simpler non-branchless partitioning algorithm when it can't prove at compile time that the comparison will be branchless, avoiding the regression on types such as std::string.
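To illustrate the last point, a compile-time check of that kind could look roughly like the sketch below. The names is_default_compare, likely_branchless, PartitionKind and choose_partition are mine and only meant as an illustration. The heuristic assumed here is that a comparison is considered branchless when the comparator is std::less or std::greater and the value type is arithmetic; anything else falls back to the classic partition.

```cpp
#include <functional>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Comparators whose behaviour is known at compile time.
template <typename Compare>
struct is_default_compare : std::false_type {};
template <typename T>
struct is_default_compare<std::less<T>> : std::true_type {};
template <typename T>
struct is_default_compare<std::greater<T>> : std::true_type {};

// Assumed heuristic: a default comparator on an arithmetic type compiles
// to a branchless comparison.
template <typename Iter, typename Compare>
struct likely_branchless : std::integral_constant<bool,
    is_default_compare<typename std::decay<Compare>::type>::value &&
    std::is_arithmetic<typename std::iterator_traits<Iter>::value_type>::value> {};

enum class PartitionKind { block_based, classic };

// Picks the partitioning scheme purely from the types involved.
template <typename Iter, typename Compare>
constexpr PartitionKind choose_partition() {
    return likely_branchless<Iter, Compare>::value ? PartitionKind::block_based
                                                   : PartitionKind::classic;
}
```

With this, a std::vector<double> sorted with std::less<double> would select the block-based scheme, while a std::vector<std::string> or a user-provided lambda would select the classic one.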
If std::sort has to be changed, then I'd definitely reconsider using pdqsort <https://github.com/orlp/pdqsort> instead. It should be possible to reimplement it over bitsets à la bitset_sort if you care about the reduced impact on stack memory.