[libcxx-commits] [PATCH] D93233: [libc++] Replaces std::sort by Bitset sorting algorithm.
MinJae Hwang via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Dec 21 13:27:20 PST 2020
minjaehwang added a comment.
Thank you very much, @Morwenn, for looking into this code.
@Morwenn, you are right that the idea is largely based on Block Quicksort, except that Bitset sort uses a 32- or 64-bit integer instead of an array of offsets. The bitset is kept in CPU registers and does not require the store instructions that Block Quicksort needs.
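To make the register-resident bitset idea concrete, here is a minimal sketch of a block partition that records misplaced elements in a 64-bit mask instead of an offset array. It is not the code in this patch: `bitset_partition`, `lowest_set_bit`, the fixed block size of 64, and the fallback to `std::partition` for the tail are all assumptions made for illustration. A real implementation would use a count-trailing-zeros instruction instead of the portable loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: index of the lowest set bit. Portable stand-in for a
// count-trailing-zeros instruction. Precondition: x != 0.
inline unsigned lowest_set_bit(std::uint64_t x) {
    assert(x != 0);
    unsigned n = 0;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

// Hypothetical sketch: rearranges v so that every element < pivot precedes
// every element >= pivot, and returns the index of the split point.
std::size_t bitset_partition(std::vector<int>& v, int pivot) {
    constexpr std::size_t kBlock = 64;
    std::size_t first = 0, last = v.size();
    std::uint64_t left_bits = 0, right_bits = 0;

    while (last - first >= 2 * kBlock) {
        // Bit j set: v[first + j] sits on the left but belongs on the right.
        // The comparison result is OR-ed in directly, with no branch on it.
        if (left_bits == 0)
            for (std::size_t j = 0; j < kBlock; ++j)
                left_bits |= static_cast<std::uint64_t>(!(v[first + j] < pivot)) << j;
        // Bit j set: v[last - 1 - j] sits on the right but belongs on the left.
        if (right_bits == 0)
            for (std::size_t j = 0; j < kBlock; ++j)
                right_bits |= static_cast<std::uint64_t>(v[last - 1 - j] < pivot) << j;

        // Swap misplaced pairs, clearing the lowest set bit of each mask.
        while (left_bits != 0 && right_bits != 0) {
            std::size_t l = lowest_set_bit(left_bits);
            std::size_t r = lowest_set_bit(right_bits);
            std::swap(v[first + l], v[last - 1 - r]);
            left_bits &= left_bits - 1;
            right_bits &= right_bits - 1;
        }
        // A block whose mask is empty is fully placed; move past it.
        if (left_bits == 0) first += kBlock;
        if (right_bits == 0) last -= kBlock;
    }

    // Simplification for this sketch: finish the remaining range, including
    // any partially processed block, with std::partition.
    auto mid = std::partition(v.begin() + static_cast<std::ptrdiff_t>(first),
                              v.begin() + static_cast<std::ptrdiff_t>(last),
                              [pivot](int x) { return x < pivot; });
    return static_cast<std::size_t>(mid - v.begin());
}
```

The two masks are plain local integers, so a compiler can keep them in registers across the block loop, whereas an offset array has to be written to memory.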
As you know, Pdqsort is a variation of Block Quicksort with pattern recognition and a different pivot selection. It recognizes ascending ranges, descending ranges, and inputs that trigger the O(n ^ 2) case. Pdqsort shows O(n) performance on these known patterns and guarantees O(n lg n) in the worst case.
The bitset trick can be applied to Pdqsort's partition function just as this code applies it to the existing std::sort's partition function. I have implemented the bitset trick on top of pdqsort as well, but chose the current implementation over the pdqsort-based Bitset sort. The current implementation makes the minimal change to std::sort: it replaces only the partition function and keeps most of the outer layer of std::sort.
The reason for keeping the outer layer of std::sort is that it leaves most of the performance characteristics unchanged and adds only a little overhead when the comparison function contains a branch.
@Morwenn mentioned the O(n ^ 2) case for quicksort, which also occurs in the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when the quicksort recursion depth goes above O(lg n). There have been efforts to introduce introsort into libc++ (Kumar's introsort <http://llvm.org/devmtg/2017-10/slides/Kumar-libc++-performance.pdf>). For reasons unknown to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.
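As an illustration of the introsort depth limit described above, here is a minimal sketch. It is neither libc++'s code nor Kumar's patch: the names `introsort_sketch` and `introsort_loop_sketch`, the small-range cutoff of 16, the depth budget of 2 * floor(lg n), and the middle-element pivot are assumptions chosen for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical sketch: quicksort that gives up and heap-sorts the current
// range once the depth budget is exhausted, bounding the worst case at
// O(n lg n).
void introsort_loop_sketch(Iter first, Iter last, int depth_budget) {
    while (last - first > 16) {
        if (depth_budget == 0) {
            // Too many partition levels: fall back to heap sort.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;

        // Three-way partition around the middle element's value:
        // [first, lo) < pivot, [lo, hi) == pivot, [hi, last) > pivot.
        int pivot = *(first + (last - first) / 2);
        Iter lo = std::partition(first, last,
                                 [pivot](int x) { return x < pivot; });
        Iter hi = std::partition(lo, last,
                                 [pivot](int x) { return !(pivot < x); });

        // Recurse on the right part, loop on the left part.
        introsort_loop_sketch(hi, last, depth_budget);
        last = lo;
    }
    // Insertion sort for the remaining small range.
    for (Iter i = first; i != last; ++i)
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

void introsort_sketch(std::vector<int>& v) {
    // Depth budget of 2 * floor(lg n), an assumed but common choice.
    int depth_budget = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) depth_budget += 2;
    introsort_loop_sketch(v.begin(), v.end(), depth_budget);
}
```

Pdqsort's trigger differs: as noted above, it switches to heap sort when partitions are highly unbalanced rather than when a fixed depth is exceeded.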
I understand that Pdqsort is faster than std::sort on many known patterns. If we are not afraid of changing the entire sort implementation, I can certainly submit Bitset-on-pdqsort as a separate code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?
@Morwenn, Bitset sort is faster for std::string than pdqsort, which turns off the block sort technique for non-arithmetic types. See the following comparison. The block sort technique can be faster even with comparison functions that contain branches.
Benchmark                              Time        CPU   Iterations
BM_PdqSort_string_Random_1          4.65 ns    4.64 ns    150470656
BM_PdqSort_string_Random_4          19.0 ns    19.0 ns     36175872
BM_PdqSort_string_Random_16         46.4 ns    46.4 ns     14942208
BM_PdqSort_string_Random_64         64.7 ns    64.7 ns     10485760
BM_PdqSort_string_Random_256        84.1 ns    84.1 ns      8126464
BM_PdqSort_string_Random_1024        103 ns     103 ns      6553600
BM_PdqSort_string_Random_16384       160 ns     160 ns      4456448
BM_PdqSort_string_Random_262144      242 ns     242 ns      2359296

BM_Sort_string_Random_1             3.85 ns    3.85 ns    181665792
BM_Sort_string_Random_4             18.1 ns    18.1 ns     38535168
BM_Sort_string_Random_16            45.5 ns    45.5 ns     14942208
BM_Sort_string_Random_64            62.6 ns    62.6 ns     10747904
BM_Sort_string_Random_256           74.4 ns    74.4 ns      9175040
BM_Sort_string_Random_1024          85.9 ns    85.9 ns      7864320
BM_Sort_string_Random_16384          121 ns     121 ns      5767168
BM_Sort_string_Random_262144         179 ns     179 ns      3407872

BM_Sort_string_Random_1             4.05 ns    4.05 ns    173277184
BM_Sort_string_Random_4             17.9 ns    17.9 ns     38010880
BM_Sort_string_Random_16            50.4 ns    50.3 ns     13369344
BM_Sort_string_Random_64            73.3 ns    73.3 ns      9437184
BM_Sort_string_Random_256           94.6 ns    94.4 ns      7077888
BM_Sort_string_Random_1024           115 ns     115 ns      6029312
BM_Sort_string_Random_16384          177 ns     177 ns      3932160
BM_Sort_string_Random_262144         279 ns     279 ns      2097152
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93233/new/
https://reviews.llvm.org/D93233