[libcxx-commits] [PATCH] D155136: [libc++][PSTL] Add a __parallel_sort implementation to libdispatch
Nikolas Klauser via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Jul 12 16:35:22 PDT 2023
philnik added a comment.
Benchmark numbers:
-------------------------------------------------------------------------------------------------------------
Benchmark serial parallel
-------------------------------------------------------------------------------------------------------------
BM_pstl_stable_sort_uint32_Random/1 1.93 ns 534 ns
BM_pstl_stable_sort_uint32_Random/4 17.5 ns 800 ns
BM_pstl_stable_sort_uint32_Random/16 118 ns 1198 ns
BM_pstl_stable_sort_uint32_Random/64 776 ns 2142 ns
BM_pstl_stable_sort_uint32_Random/256 4592 ns 6945 ns
BM_pstl_stable_sort_uint32_Random/1024 23893 ns 31171 ns
BM_pstl_stable_sort_uint32_Random/16384 563914 ns 433386 ns
BM_pstl_stable_sort_uint32_Random/262144 11894672 ns 3885956 ns
BM_pstl_stable_sort_uint32_Ascending/1 1.93 ns 568 ns
BM_pstl_stable_sort_uint32_Ascending/4 4.25 ns 789 ns
BM_pstl_stable_sort_uint32_Ascending/16 8.52 ns 1068 ns
BM_pstl_stable_sort_uint32_Ascending/64 25.0 ns 1483 ns
BM_pstl_stable_sort_uint32_Ascending/256 832 ns 3632 ns
BM_pstl_stable_sort_uint32_Ascending/1024 6140 ns 13805 ns
BM_pstl_stable_sort_uint32_Ascending/16384 194077 ns 199223 ns
BM_pstl_stable_sort_uint32_Ascending/262144 4527549 ns 2033799 ns
BM_pstl_stable_sort_uint32_Descending/1 2.05 ns 533 ns
BM_pstl_stable_sort_uint32_Descending/4 7.75 ns 775 ns
BM_pstl_stable_sort_uint32_Descending/16 56.6 ns 1061 ns
BM_pstl_stable_sort_uint32_Descending/64 864 ns 1547 ns
BM_pstl_stable_sort_uint32_Descending/256 4199 ns 3638 ns
BM_pstl_stable_sort_uint32_Descending/1024 19518 ns 15025 ns
BM_pstl_stable_sort_uint32_Descending/16384 409599 ns 249699 ns
BM_pstl_stable_sort_uint32_Descending/262144 8099483 ns 2720430 ns
BM_pstl_stable_sort_uint32_SingleElement/1 1.98 ns 536 ns
BM_pstl_stable_sort_uint32_SingleElement/4 4.24 ns 778 ns
BM_pstl_stable_sort_uint32_SingleElement/16 8.57 ns 1081 ns
BM_pstl_stable_sort_uint32_SingleElement/64 25.0 ns 1464 ns
BM_pstl_stable_sort_uint32_SingleElement/256 815 ns 3753 ns
BM_pstl_stable_sort_uint32_SingleElement/1024 6148 ns 13789 ns
BM_pstl_stable_sort_uint32_SingleElement/16384 193470 ns 200398 ns
BM_pstl_stable_sort_uint32_SingleElement/262144 4593444 ns 2050435 ns
BM_pstl_stable_sort_uint32_PipeOrgan/1 1.99 ns 545 ns
BM_pstl_stable_sort_uint32_PipeOrgan/4 4.80 ns 765 ns
BM_pstl_stable_sort_uint32_PipeOrgan/16 22.9 ns 1060 ns
BM_pstl_stable_sort_uint32_PipeOrgan/64 206 ns 1526 ns
BM_pstl_stable_sort_uint32_PipeOrgan/256 2528 ns 4080 ns
BM_pstl_stable_sort_uint32_PipeOrgan/1024 12875 ns 15882 ns
BM_pstl_stable_sort_uint32_PipeOrgan/16384 296908 ns 241196 ns
BM_pstl_stable_sort_uint32_PipeOrgan/262144 6237556 ns 2797044 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/1 1.96 ns 534 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/4 4.24 ns 816 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/16 8.62 ns 1098 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/64 415 ns 1601 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/256 2744 ns 4634 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/1024 20339 ns 17328 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/16384 428328 ns 289705 ns
BM_pstl_stable_sort_uint32_QuickSortAdversary/262144 8292500 ns 2970696 ns
BM_pstl_stable_sort_uint64_Random/1 1.99 ns 585 ns
BM_pstl_stable_sort_uint64_Random/4 17.7 ns 807 ns
BM_pstl_stable_sort_uint64_Random/16 119 ns 1176 ns
BM_pstl_stable_sort_uint64_Random/64 787 ns 2253 ns
BM_pstl_stable_sort_uint64_Random/256 4568 ns 7001 ns
BM_pstl_stable_sort_uint64_Random/1024 24210 ns 32133 ns
BM_pstl_stable_sort_uint64_Random/16384 578554 ns 493008 ns
BM_pstl_stable_sort_uint64_Random/262144 12190776 ns 4313737 ns
BM_pstl_stable_sort_uint64_Ascending/1 1.95 ns 550 ns
BM_pstl_stable_sort_uint64_Ascending/4 4.29 ns 789 ns
BM_pstl_stable_sort_uint64_Ascending/16 8.70 ns 1044 ns
BM_pstl_stable_sort_uint64_Ascending/64 25.3 ns 1506 ns
BM_pstl_stable_sort_uint64_Ascending/256 850 ns 3718 ns
BM_pstl_stable_sort_uint64_Ascending/1024 6313 ns 13998 ns
BM_pstl_stable_sort_uint64_Ascending/16384 198880 ns 262187 ns
BM_pstl_stable_sort_uint64_Ascending/262144 4857250 ns 2386621 ns
BM_pstl_stable_sort_uint64_Descending/1 1.98 ns 534 ns
BM_pstl_stable_sort_uint64_Descending/4 7.88 ns 780 ns
BM_pstl_stable_sort_uint64_Descending/16 57.4 ns 1048 ns
BM_pstl_stable_sort_uint64_Descending/64 863 ns 1634 ns
BM_pstl_stable_sort_uint64_Descending/256 4166 ns 3778 ns
BM_pstl_stable_sort_uint64_Descending/1024 19769 ns 15531 ns
BM_pstl_stable_sort_uint64_Descending/16384 408186 ns 327655 ns
BM_pstl_stable_sort_uint64_Descending/262144 8160153 ns 3131787 ns
BM_pstl_stable_sort_uint64_SingleElement/1 1.97 ns 541 ns
BM_pstl_stable_sort_uint64_SingleElement/4 4.28 ns 797 ns
BM_pstl_stable_sort_uint64_SingleElement/16 8.69 ns 1038 ns
BM_pstl_stable_sort_uint64_SingleElement/64 25.3 ns 1514 ns
BM_pstl_stable_sort_uint64_SingleElement/256 851 ns 3719 ns
BM_pstl_stable_sort_uint64_SingleElement/1024 6320 ns 13969 ns
BM_pstl_stable_sort_uint64_SingleElement/16384 197247 ns 270237 ns
BM_pstl_stable_sort_uint64_SingleElement/262144 4851639 ns 2421014 ns
BM_pstl_stable_sort_uint64_PipeOrgan/1 1.96 ns 536 ns
BM_pstl_stable_sort_uint64_PipeOrgan/4 5.06 ns 774 ns
BM_pstl_stable_sort_uint64_PipeOrgan/16 22.8 ns 1046 ns
BM_pstl_stable_sort_uint64_PipeOrgan/64 204 ns 1557 ns
BM_pstl_stable_sort_uint64_PipeOrgan/256 2517 ns 4156 ns
BM_pstl_stable_sort_uint64_PipeOrgan/1024 12857 ns 16116 ns
BM_pstl_stable_sort_uint64_PipeOrgan/16384 305902 ns 307511 ns
BM_pstl_stable_sort_uint64_PipeOrgan/262144 6482625 ns 3157936 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/1 1.96 ns 540 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/4 4.27 ns 895 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/16 8.77 ns 1040 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/64 415 ns 1658 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/256 2762 ns 4587 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/1024 20056 ns 17438 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/16384 432698 ns 387232 ns
BM_pstl_stable_sort_uint64_QuickSortAdversary/262144 8477346 ns 3282792 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/1 17.4 ns 565 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/4 35.5 ns 876 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/16 202 ns 1383 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/64 1214 ns 2819 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/256 6523 ns 10135 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/1024 33622 ns 46725 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/16384 763475 ns 643777 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Random/262144 15756837 ns 5658928 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/1 17.1 ns 561 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/4 26.2 ns 851 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/16 57.1 ns 1237 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/64 396 ns 2035 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/256 2293 ns 6025 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/1024 12738 ns 24016 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/16384 319562 ns 393215 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Ascending/262144 7093596 ns 3672095 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/1 17.0 ns 562 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/4 26.3 ns 880 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/16 97.0 ns 1218 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/64 548 ns 2064 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/256 2968 ns 5648 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/1024 15541 ns 23177 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/16384 360516 ns 412686 ns
BM_pstl_stable_sort_pair<uint32, uint32>_Descending/262144 7714000 ns 3346048 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/1 17.0 ns 582 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/4 27.2 ns 844 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/16 64.5 ns 1245 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/64 445 ns 2115 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/256 2625 ns 6361 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/1024 14423 ns 26096 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/16384 355161 ns 473513 ns
BM_pstl_stable_sort_pair<uint32, uint32>_SingleElement/262144 7851652 ns 4256463 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/1 17.0 ns 575 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/4 25.9 ns 860 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/16 73.0 ns 1247 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/64 460 ns 2068 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/256 2595 ns 6562 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/1024 13828 ns 26194 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/16384 336549 ns 436980 ns
BM_pstl_stable_sort_pair<uint32, uint32>_PipeOrgan/262144 7377147 ns 4021613 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/1 17.0 ns 569 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/4 27.9 ns 861 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/16 77.9 ns 1251 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/64 545 ns 2542 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/256 3276 ns 7132 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/1024 18322 ns 30304 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/16384 462334 ns 525712 ns
BM_pstl_stable_sort_pair<uint32, uint32>_QuickSortAdversary/262144 10032232 ns 4340623 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/1 18.7 ns 563 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/4 43.6 ns 906 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/16 249 ns 1479 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/64 1409 ns 3153 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/256 7587 ns 11542 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/1024 38372 ns 51703 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/16384 873886 ns 722067 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Random/262144 18441289 ns 7160888 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/1 18.4 ns 587 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/4 27.5 ns 947 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/16 98.0 ns 1315 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/64 417 ns 2395 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/256 2332 ns 7074 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/1024 12245 ns 28658 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/16384 296193 ns 584751 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Ascending/262144 6857146 ns 5269052 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/1 18.4 ns 584 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/4 27.5 ns 905 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/16 151 ns 1333 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/64 601 ns 2242 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/256 3113 ns 6811 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/1024 15251 ns 28626 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/16384 345008 ns 565317 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_Descending/262144 7446533 ns 4839819 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/1 18.8 ns 580 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/4 29.5 ns 867 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/16 106 ns 1359 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/64 480 ns 2351 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/256 2672 ns 7546 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/1024 14060 ns 32064 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/16384 335052 ns 620179 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_SingleElement/262144 7655247 ns 6110748 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/1 18.7 ns 569 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/4 27.8 ns 860 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/16 120 ns 1304 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/64 509 ns 2206 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/256 2656 ns 7267 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/1024 13435 ns 30041 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/16384 318072 ns 568759 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_PipeOrgan/262144 7258031 ns 5375570 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/1 18.8 ns 568 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/4 29.4 ns 887 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/16 120 ns 1337 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/64 614 ns 2358 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/256 3339 ns 8096 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/1024 17214 ns 37114 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/16384 407828 ns 683645 ns
BM_pstl_stable_sort_tuple<uint32, uint64, uint32>_QuickSortAdversary/262144 9226697 ns 6520879 ns
BM_pstl_stable_sort_string_Random/1 18.2 ns 573 ns
BM_pstl_stable_sort_string_Random/4 71.8 ns 963 ns
BM_pstl_stable_sort_string_Random/16 451 ns 1650 ns
BM_pstl_stable_sort_string_Random/64 2815 ns 4853 ns
BM_pstl_stable_sort_string_Random/256 16983 ns 22339 ns
BM_pstl_stable_sort_string_Random/1024 88698 ns 98363 ns
BM_pstl_stable_sort_string_Random/16384 2105551 ns 1211421 ns
BM_pstl_stable_sort_string_Random/262144 54640692 ns 28420500 ns
BM_pstl_stable_sort_string_Ascending/1 18.3 ns 567 ns
BM_pstl_stable_sort_string_Ascending/4 34.9 ns 909 ns
BM_pstl_stable_sort_string_Ascending/16 120 ns 1376 ns
BM_pstl_stable_sort_string_Ascending/64 587 ns 2445 ns
BM_pstl_stable_sort_string_Ascending/256 4323 ns 8777 ns
BM_pstl_stable_sort_string_Ascending/1024 21786 ns 37000 ns
BM_pstl_stable_sort_string_Ascending/16384 509321 ns 661146 ns
BM_pstl_stable_sort_string_Ascending/262144 16427214 ns 14593143 ns
BM_pstl_stable_sort_string_Descending/1 18.6 ns 589 ns
BM_pstl_stable_sort_string_Descending/4 39.8 ns 935 ns
BM_pstl_stable_sort_string_Descending/16 252 ns 1380 ns
BM_pstl_stable_sort_string_Descending/64 1129 ns 2542 ns
BM_pstl_stable_sort_string_Descending/256 6668 ns 9051 ns
BM_pstl_stable_sort_string_Descending/1024 32704 ns 39067 ns
BM_pstl_stable_sort_string_Descending/16384 708264 ns 666882 ns
BM_pstl_stable_sort_string_Descending/262144 21184333 ns 12753673 ns
BM_pstl_stable_sort_string_SingleElement/1 18.3 ns 563 ns
BM_pstl_stable_sort_string_SingleElement/4 43.4 ns 917 ns
BM_pstl_stable_sort_string_SingleElement/16 159 ns 1453 ns
BM_pstl_stable_sort_string_SingleElement/64 824 ns 2737 ns
BM_pstl_stable_sort_string_SingleElement/256 4574 ns 9319 ns
BM_pstl_stable_sort_string_SingleElement/1024 23883 ns 39474 ns
BM_pstl_stable_sort_string_SingleElement/16384 564291 ns 734450 ns
BM_pstl_stable_sort_string_SingleElement/262144 12667527 ns 8225023 ns
BM_pstl_stable_sort_string_PipeOrgan/1 18.5 ns 573 ns
BM_pstl_stable_sort_string_PipeOrgan/4 35.4 ns 977 ns
BM_pstl_stable_sort_string_PipeOrgan/16 185 ns 1375 ns
BM_pstl_stable_sort_string_PipeOrgan/64 860 ns 2527 ns
BM_pstl_stable_sort_string_PipeOrgan/256 5302 ns 9320 ns
BM_pstl_stable_sort_string_PipeOrgan/1024 26029 ns 40149 ns
BM_pstl_stable_sort_string_PipeOrgan/16384 575983 ns 670767 ns
BM_pstl_stable_sort_string_PipeOrgan/262144 18165333 ns 15620318 ns
BM_pstl_stable_sort_string_QuickSortAdversary/1 18.2 ns 563 ns
BM_pstl_stable_sort_string_QuickSortAdversary/4 71.7 ns 936 ns
BM_pstl_stable_sort_string_QuickSortAdversary/16 449 ns 1658 ns
BM_pstl_stable_sort_string_QuickSortAdversary/64 2809 ns 4777 ns
BM_pstl_stable_sort_string_QuickSortAdversary/256 17149 ns 22100 ns
BM_pstl_stable_sort_string_QuickSortAdversary/1024 88657 ns 97570 ns
BM_pstl_stable_sort_string_QuickSortAdversary/16384 2091408 ns 1217799 ns
BM_pstl_stable_sort_string_QuickSortAdversary/262144 53567200 ns 29541200 ns
BM_pstl_stable_sort_float_Random/1 1.95 ns 539 ns
BM_pstl_stable_sort_float_Random/4 22.5 ns 804 ns
BM_pstl_stable_sort_float_Random/16 147 ns 1294 ns
BM_pstl_stable_sort_float_Random/64 934 ns 2539 ns
BM_pstl_stable_sort_float_Random/256 6600 ns 9269 ns
BM_pstl_stable_sort_float_Random/1024 35831 ns 43842 ns
BM_pstl_stable_sort_float_Random/16384 844270 ns 585408 ns
BM_pstl_stable_sort_float_Random/262144 18131421 ns 5370938 ns
BM_pstl_stable_sort_float_Ascending/1 1.96 ns 539 ns
BM_pstl_stable_sort_float_Ascending/4 4.25 ns 783 ns
BM_pstl_stable_sort_float_Ascending/16 8.59 ns 1107 ns
BM_pstl_stable_sort_float_Ascending/64 25.2 ns 1555 ns
BM_pstl_stable_sort_float_Ascending/256 1200 ns 4571 ns
BM_pstl_stable_sort_float_Ascending/1024 9188 ns 19242 ns
BM_pstl_stable_sort_float_Ascending/16384 287505 ns 286224 ns
BM_pstl_stable_sort_float_Ascending/262144 6861088 ns 2885462 ns
BM_pstl_stable_sort_float_Descending/1 1.96 ns 540 ns
BM_pstl_stable_sort_float_Descending/4 7.69 ns 794 ns
BM_pstl_stable_sort_float_Descending/16 56.4 ns 1109 ns
BM_pstl_stable_sort_float_Descending/64 800 ns 1672 ns
BM_pstl_stable_sort_float_Descending/256 4309 ns 4345 ns
BM_pstl_stable_sort_float_Descending/1024 21579 ns 19584 ns
BM_pstl_stable_sort_float_Descending/16384 490209 ns 328602 ns
BM_pstl_stable_sort_float_Descending/262144 10140493 ns 3444498 ns
BM_pstl_stable_sort_float_SingleElement/1 2.27 ns 549 ns
BM_pstl_stable_sort_float_SingleElement/4 4.51 ns 777 ns
BM_pstl_stable_sort_float_SingleElement/16 8.50 ns 1112 ns
BM_pstl_stable_sort_float_SingleElement/64 25.2 ns 1567 ns
BM_pstl_stable_sort_float_SingleElement/256 1206 ns 4558 ns
BM_pstl_stable_sort_float_SingleElement/1024 9163 ns 19022 ns
BM_pstl_stable_sort_float_SingleElement/16384 287924 ns 281133 ns
BM_pstl_stable_sort_float_SingleElement/262144 6857598 ns 3049492 ns
BM_pstl_stable_sort_float_PipeOrgan/1 1.96 ns 562 ns
BM_pstl_stable_sort_float_PipeOrgan/4 4.86 ns 832 ns
BM_pstl_stable_sort_float_PipeOrgan/16 23.4 ns 1151 ns
BM_pstl_stable_sort_float_PipeOrgan/64 204 ns 1621 ns
BM_pstl_stable_sort_float_PipeOrgan/256 2771 ns 5180 ns
BM_pstl_stable_sort_float_PipeOrgan/1024 15168 ns 21905 ns
BM_pstl_stable_sort_float_PipeOrgan/16384 384810 ns 336797 ns
BM_pstl_stable_sort_float_PipeOrgan/262144 8402687 ns 3817686 ns
BM_pstl_stable_sort_float_QuickSortAdversary/1 1.96 ns 589 ns
BM_pstl_stable_sort_float_QuickSortAdversary/4 4.26 ns 801 ns
BM_pstl_stable_sort_float_QuickSortAdversary/16 8.53 ns 1123 ns
BM_pstl_stable_sort_float_QuickSortAdversary/64 417 ns 1735 ns
BM_pstl_stable_sort_float_QuickSortAdversary/256 3250 ns 5725 ns
BM_pstl_stable_sort_float_QuickSortAdversary/1024 23423 ns 23649 ns
BM_pstl_stable_sort_float_QuickSortAdversary/16384 525503 ns 416050 ns
BM_pstl_stable_sort_float_QuickSortAdversary/262144 10546348 ns 3761952 ns
These don't look that great currently. I think we can improve this a lot by
1. improving the chunk size. Currently, for smaller ranges we don't do anything in the first step, since all chunks are 1 element in size
2. Don't wait every time until all the chunks are ready. If the first two chunks are merged, we can start merging them. This requires a different API though.
3. Don't `inplace_merge` serially. This would be trivial if we don't sync all the steps.
4. Avoid initializing the temporary buffer before we need to
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155136/new/
https://reviews.llvm.org/D155136
More information about the libcxx-commits
mailing list