[libcxx-commits] [PATCH] D93233: [libc++] Replaces std::sort by Bitset sorting algorithm.

MinJae Hwang via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Mon Dec 21 13:27:20 PST 2020

minjaehwang added a comment.

Thank you very much, @Morwenn, for looking into this code.

@Morwenn, you are right that the idea is largely based on Block Quicksort, except that Bitset sort uses a 32- or 64-bit integer instead of an array of offsets. The bitset can be kept in a CPU register, so it does not require the store instructions that Block Quicksort needs.
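To make the bitset idea concrete, here is a minimal sketch of a block partition that records misplaced elements in a 64-bit mask instead of an offset array. It is not the code from this patch: the names `sketch::bitset_partition` and `sketch::lowest_set_bit` are hypothetical, it works on `std::vector<int>` only, and it falls back to `std::partition` for the tail to stay short. Real code would likely use a count-trailing-zeros instruction instead of the portable loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Portable index of the lowest set bit (illustration only; hypothetical helper).
inline unsigned lowest_set_bit(std::uint64_t bits) {
    assert(bits != 0);
    unsigned index = 0;
    while ((bits & 1u) == 0) {
        bits >>= 1;
        ++index;
    }
    return index;
}

// Moves elements < pivot to the front of v and returns the index of the
// first element that is >= pivot.
inline std::size_t bitset_partition(std::vector<int>& v, int pivot) {
    constexpr std::size_t kBlock = 64;
    std::size_t lo = 0;
    std::size_t hi = v.size();
    std::uint64_t left_bits = 0;   // bit i set: v[lo + i] belongs on the right
    std::uint64_t right_bits = 0;  // bit i set: v[hi - 1 - i] belongs on the left

    while (hi - lo >= 2 * kBlock) {
        // Fill an exhausted mask by comparing a whole block against the pivot.
        // The comparison result is stored as a bit, not branched on.
        if (left_bits == 0) {
            for (std::size_t i = 0; i < kBlock; ++i)
                left_bits |= static_cast<std::uint64_t>(!(v[lo + i] < pivot)) << i;
        }
        if (right_bits == 0) {
            for (std::size_t i = 0; i < kBlock; ++i)
                right_bits |= static_cast<std::uint64_t>(v[hi - 1 - i] < pivot) << i;
        }
        // Swap misplaced pairs, consuming one bit from each mask per swap.
        while (left_bits != 0 && right_bits != 0) {
            const std::size_t l = lowest_set_bit(left_bits);
            const std::size_t r = lowest_set_bit(right_bits);
            std::swap(v[lo + l], v[hi - 1 - r]);
            left_bits &= left_bits - 1;    // clear lowest set bit
            right_bits &= right_bits - 1;
        }
        // A block whose mask is empty is fully on the correct side.
        if (left_bits == 0) lo += kBlock;
        if (right_bits == 0) hi -= kBlock;
    }

    // Everything before lo is < pivot and everything from hi on is >= pivot,
    // so an ordinary partition of the remaining middle finishes the job.
    auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
    auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
    auto mid = std::partition(first, last, [pivot](int x) { return x < pivot; });
    return static_cast<std::size_t>(mid - v.begin());
}

}  // namespace sketch
```

The two masks are plain local integers, so a compiler is free to keep them in registers, which is the difference from Block Quicksort's offset buffers described above.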

As you know, Pdqsort is a variation of Block Quicksort with pattern recognition and a different pivot selection. It recognizes ascending ranges, descending ranges, and O(n ^ 2) cases. Pdqsort shows O(n) performance on these known patterns and guarantees O(n lg n) in the worst case.

The bitset trick can be applied to Pdqsort's partition function, just as this code applies it to the existing std::sort's partition function. I have implemented the Bitset trick on top of pdqsort as well, but chose the current implementation over the pdqsort-based Bitset sort. The current implementation introduces a minimal change to std::sort: it replaces only the partition function and keeps most of the outer layer of std::sort.

The reason for keeping the outer layer of std::sort is that it keeps most of the performance characteristics unchanged and adds only a little overhead when there is a branch in the comparison function.

@Morwenn mentioned the O(n ^ 2) case for quicksort, which also happens with the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when the quicksort recursion depth goes above O(lg n). There have been efforts to introduce introsort into libc++ (Kumar's introsort <http://llvm.org/devmtg/2017-10/slides/Kumar-libc++-performance.pdf>). For reasons unknown to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.
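For readers who have not seen the depth-limit trick, the following is a small sketch of the introsort idea: quicksort with a recursion budget, switching to heap sort once the budget runs out. It is neither libc++'s code nor Kumar's patch. The names `sketch::introsort` and `sketch::introsort_impl`, the budget of roughly 2 * lg n, the median-of-three pivot, and the cutoff of 16 elements for insertion sort are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace sketch {

// Sorts v[lo, hi). depth_budget is the number of quicksort levels still allowed.
inline void introsort_impl(std::vector<int>& v, std::size_t lo, std::size_t hi,
                           int depth_budget) {
    while (hi - lo > 16) {
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        if (depth_budget == 0) {
            // Too many levels: finish this range with heap sort, which is
            // O(n lg n) regardless of the input pattern.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_budget;

        // Median of three, copied by value so partitioning cannot move it.
        const int a = v[lo];
        const int b = v[lo + (hi - lo) / 2];
        const int c = v[hi - 1];
        const int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

        // Three-way split: [< pivot][== pivot][> pivot]. The middle part is
        // never empty, so both remaining subranges strictly shrink.
        auto mid1 = std::partition(first, last,
                                   [pivot](int x) { return x < pivot; });
        auto mid2 = std::partition(mid1, last,
                                   [pivot](int x) { return !(pivot < x); });

        // Recurse on the right part, loop on the left part.
        introsort_impl(v, static_cast<std::size_t>(mid2 - v.begin()), hi,
                       depth_budget);
        hi = static_cast<std::size_t>(mid1 - v.begin());
    }
    // Small ranges: plain insertion sort.
    for (std::size_t i = lo + 1; i < hi; ++i) {
        const int x = v[i];
        std::size_t j = i;
        while (j > lo && x < v[j - 1]) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = x;
    }
}

inline void introsort(std::vector<int>& v) {
    int depth_budget = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) depth_budget += 2;
    introsort_impl(v, 0, v.size(), depth_budget);
}

}  // namespace sketch
```

The only change relative to a plain quicksort is the `depth_budget` check, which is why this counts as a simple change to an existing implementation.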

I understand that Pdqsort is faster than std::sort on many known patterns. If we are not afraid of changing the entire sort implementation, I can certainly submit Bitset-on-pdqsort as a separate code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?

@Morwenn, Bitset sort is faster for std::string than pdqsort, which turns off the block sort technique for non-arithmetic types. See the following comparison. The block sort technique can be faster even when the comparison function contains branches.

  BM_PdqSort_string_Random_1                                                   4.65 ns         4.64 ns    150470656
  BM_PdqSort_string_Random_4                                                   19.0 ns         19.0 ns     36175872
  BM_PdqSort_string_Random_16                                                  46.4 ns         46.4 ns     14942208
  BM_PdqSort_string_Random_64                                                  64.7 ns         64.7 ns     10485760
  BM_PdqSort_string_Random_256                                                 84.1 ns         84.1 ns      8126464
  BM_PdqSort_string_Random_1024                                                 103 ns          103 ns      6553600
  BM_PdqSort_string_Random_16384                                                160 ns          160 ns      4456448
  BM_PdqSort_string_Random_262144                                               242 ns          242 ns      2359296
  BM_Sort_string_Random_1                                                      3.85 ns         3.85 ns    181665792
  BM_Sort_string_Random_4                                                      18.1 ns         18.1 ns     38535168
  BM_Sort_string_Random_16                                                     45.5 ns         45.5 ns     14942208
  BM_Sort_string_Random_64                                                     62.6 ns         62.6 ns     10747904
  BM_Sort_string_Random_256                                                    74.4 ns         74.4 ns      9175040
  BM_Sort_string_Random_1024                                                   85.9 ns         85.9 ns      7864320
  BM_Sort_string_Random_16384                                                   121 ns          121 ns      5767168
  BM_Sort_string_Random_262144                                                  179 ns          179 ns      3407872
  BM_Sort_string_Random_1                                                      4.05 ns         4.05 ns    173277184
  BM_Sort_string_Random_4                                                      17.9 ns         17.9 ns     38010880
  BM_Sort_string_Random_16                                                     50.4 ns         50.3 ns     13369344
  BM_Sort_string_Random_64                                                     73.3 ns         73.3 ns      9437184
  BM_Sort_string_Random_256                                                    94.6 ns         94.4 ns      7077888
  BM_Sort_string_Random_1024                                                    115 ns          115 ns      6029312
  BM_Sort_string_Random_16384                                                   177 ns          177 ns      3932160
  BM_Sort_string_Random_262144                                                  279 ns          279 ns      2097152

  rG LLVM Github Monorepo
