[libcxx-commits] [PATCH] D132505: [libc++] Refactor deque::iterator algorithm optimizations

Nikolas Klauser via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Sep 15 07:46:54 PDT 2022


philnik marked 4 inline comments as done.
philnik added a comment.

Here are the benchmarks for the current patch:

  -----------------------------------------------------------------------
  Benchmark                                             old           new
  -----------------------------------------------------------------------
  BM_deque_vector_copy/0                           0.272 ns       1.58 ns
  BM_deque_vector_copy/1                            3.40 ns       1.85 ns
  BM_deque_vector_copy/2                            3.30 ns       1.72 ns
  BM_deque_vector_copy/64                           4.24 ns       2.90 ns
  BM_deque_vector_copy/512                          18.6 ns       17.2 ns
  BM_deque_vector_copy/1024                         41.3 ns       40.1 ns
  BM_deque_vector_copy/4000                          145 ns        127 ns
  BM_deque_vector_copy/4096                          148 ns        132 ns
  BM_deque_vector_copy/5500                          196 ns        169 ns
  BM_deque_vector_copy/64000                        5518 ns       5554 ns
  BM_deque_vector_copy/65536                        5665 ns       5680 ns
  BM_deque_vector_copy/70000                        5852 ns       5772 ns
  BM_deque_vector_ranges_copy/0                    0.265 ns       1.58 ns
  BM_deque_vector_ranges_copy/1                     1.30 ns       1.87 ns
  BM_deque_vector_ranges_copy/2                     1.60 ns       1.72 ns
  BM_deque_vector_ranges_copy/64                    23.5 ns       2.90 ns
  BM_deque_vector_ranges_copy/512                    188 ns       16.9 ns
  BM_deque_vector_ranges_copy/1024                   370 ns       39.5 ns
  BM_deque_vector_ranges_copy/4000                  1437 ns        128 ns
  BM_deque_vector_ranges_copy/4096                  1474 ns        128 ns
  BM_deque_vector_ranges_copy/5500                  1990 ns        170 ns
  BM_deque_vector_ranges_copy/64000                23213 ns       5503 ns
  BM_deque_vector_ranges_copy/65536                23703 ns       5436 ns
  BM_deque_vector_ranges_copy/70000                25287 ns       5897 ns
  BM_deque_deque_copy/0                             1.06 ns       7.18 ns
  BM_deque_deque_copy/1                             5.77 ns       12.8 ns
  BM_deque_deque_copy/2                             5.63 ns       13.1 ns
  BM_deque_deque_copy/64                            6.42 ns       15.2 ns
  BM_deque_deque_copy/512                           21.4 ns       29.1 ns
  BM_deque_deque_copy/1024                          43.0 ns       56.9 ns
  BM_deque_deque_copy/4000                           114 ns        116 ns
  BM_deque_deque_copy/4096                           171 ns        177 ns
  BM_deque_deque_copy/5500                           236 ns        225 ns
  BM_deque_deque_copy/64000                         5387 ns       5431 ns
  BM_deque_deque_copy/65536                         5552 ns       5560 ns
  BM_deque_deque_copy/70000                         5882 ns       5941 ns
  BM_deque_deque_ranges_copy/0                     0.793 ns       7.12 ns
  BM_deque_deque_ranges_copy/1                      1.85 ns       12.8 ns
  BM_deque_deque_ranges_copy/2                      2.38 ns       13.0 ns
  BM_deque_deque_ranges_copy/64                     44.4 ns       15.2 ns
  BM_deque_deque_ranges_copy/512                     281 ns       29.1 ns
  BM_deque_deque_ranges_copy/1024                    555 ns       56.9 ns
  BM_deque_deque_ranges_copy/4000                   2155 ns        115 ns
  BM_deque_deque_ranges_copy/4096                   2217 ns        177 ns
  BM_deque_deque_ranges_copy/5500                   2977 ns        226 ns
  BM_deque_deque_ranges_copy/64000                 34584 ns       5432 ns
  BM_deque_deque_ranges_copy/65536                 35419 ns       5572 ns
  BM_deque_deque_ranges_copy/70000                 37847 ns       5972 ns
  BM_vector_deque_copy/0                           0.585 ns      0.529 ns
  BM_vector_deque_copy/1                            2.98 ns       2.73 ns
  BM_vector_deque_copy/2                            2.79 ns       2.62 ns
  BM_vector_deque_copy/64                           3.70 ns       3.44 ns
  BM_vector_deque_copy/512                          13.5 ns       13.6 ns
  BM_vector_deque_copy/1024                         40.0 ns       40.1 ns
  BM_vector_deque_copy/4000                          145 ns        144 ns
  BM_vector_deque_copy/4096                          147 ns        147 ns
  BM_vector_deque_copy/5500                          199 ns        201 ns
  BM_vector_deque_copy/64000                        5452 ns       5462 ns
  BM_vector_deque_copy/65536                        5718 ns       5691 ns
  BM_vector_deque_copy/70000                        5985 ns       5988 ns
  BM_vector_deque_ranges_copy/0                    0.529 ns      0.528 ns
  BM_vector_deque_ranges_copy/1                     1.06 ns       2.69 ns
  BM_vector_deque_ranges_copy/2                     1.46 ns       2.65 ns
  BM_vector_deque_ranges_copy/64                    23.2 ns       3.51 ns
  BM_vector_deque_ranges_copy/512                    187 ns       14.0 ns
  BM_vector_deque_ranges_copy/1024                   369 ns       40.2 ns
  BM_vector_deque_ranges_copy/4000                  1440 ns        145 ns
  BM_vector_deque_ranges_copy/4096                  1474 ns        146 ns
  BM_vector_deque_ranges_copy/5500                  1979 ns        202 ns
  BM_vector_deque_ranges_copy/64000                22995 ns       5512 ns
  BM_vector_deque_ranges_copy/65536                23549 ns       5695 ns
  BM_vector_deque_ranges_copy/70000                25182 ns       6064 ns
  BM_deque_vector_move/0                           0.266 ns       1.61 ns
  BM_deque_vector_move/1                            3.26 ns       1.91 ns
  BM_deque_vector_move/2                            3.23 ns       1.73 ns
  BM_deque_vector_move/64                           4.24 ns       2.90 ns
  BM_deque_vector_move/512                          14.1 ns       13.5 ns
  BM_deque_vector_move/1024                         40.4 ns       39.5 ns
  BM_deque_vector_move/4000                          144 ns        128 ns
  BM_deque_vector_move/4096                          144 ns        128 ns
  BM_deque_vector_move/5500                          200 ns        170 ns
  BM_deque_vector_move/64000                        5482 ns       5481 ns
  BM_deque_vector_move/65536                        5436 ns       5397 ns
  BM_deque_vector_move/70000                        5846 ns       5795 ns
  BM_deque_vector_ranges_move/0                    0.264 ns       1.58 ns
  BM_deque_vector_ranges_move/1                     1.32 ns       1.85 ns
  BM_deque_vector_ranges_move/2                     1.59 ns       1.72 ns
  BM_deque_vector_ranges_move/64                    35.3 ns       2.90 ns
  BM_deque_vector_ranges_move/512                    202 ns       13.5 ns
  BM_deque_vector_ranges_move/1024                   395 ns       39.5 ns
  BM_deque_vector_ranges_move/4000                  1548 ns        128 ns
  BM_deque_vector_ranges_move/4096                  1559 ns        128 ns
  BM_deque_vector_ranges_move/5500                  2133 ns        170 ns
  BM_deque_vector_ranges_move/64000                24609 ns       5478 ns
  BM_deque_vector_ranges_move/65536                25182 ns       5396 ns
  BM_deque_vector_ranges_move/70000                27414 ns       5783 ns
  BM_deque_deque_move/0                             1.06 ns       7.12 ns
  BM_deque_deque_move/1                             5.67 ns       12.7 ns
  BM_deque_deque_move/2                             5.54 ns       12.9 ns
  BM_deque_deque_move/64                            6.37 ns       15.1 ns
  BM_deque_deque_move/512                           21.3 ns       28.8 ns
  BM_deque_deque_move/1024                          43.0 ns       56.8 ns
  BM_deque_deque_move/4000                           112 ns        116 ns
  BM_deque_deque_move/4096                           171 ns        177 ns
  BM_deque_deque_move/5500                           233 ns        228 ns
  BM_deque_deque_move/64000                         5378 ns       5442 ns
  BM_deque_deque_move/65536                         5546 ns       5578 ns
  BM_deque_deque_move/70000                         5877 ns       5932 ns
  BM_deque_deque_ranges_move/0                     0.792 ns       7.12 ns
  BM_deque_deque_ranges_move/1                      1.85 ns       12.7 ns
  BM_deque_deque_ranges_move/2                      2.38 ns       13.0 ns
  BM_deque_deque_ranges_move/64                     43.9 ns       15.1 ns
  BM_deque_deque_ranges_move/512                     281 ns       28.8 ns
  BM_deque_deque_ranges_move/1024                    560 ns       56.8 ns
  BM_deque_deque_ranges_move/4000                   2171 ns        115 ns
  BM_deque_deque_ranges_move/4096                   2245 ns        177 ns
  BM_deque_deque_ranges_move/5500                   3013 ns        225 ns
  BM_deque_deque_ranges_move/64000                 35085 ns       5429 ns
  BM_deque_deque_ranges_move/65536                 35939 ns       5560 ns
  BM_deque_deque_ranges_move/70000                 38388 ns       5935 ns
  BM_vector_deque_move/0                           0.597 ns      0.534 ns
  BM_vector_deque_move/1                            2.83 ns       2.71 ns
  BM_vector_deque_move/2                            2.70 ns       2.60 ns
  BM_vector_deque_move/64                           3.68 ns       3.44 ns
  BM_vector_deque_move/512                          13.5 ns       13.6 ns
  BM_vector_deque_move/1024                         39.9 ns       40.1 ns
  BM_vector_deque_move/4000                          145 ns        144 ns
  BM_vector_deque_move/4096                          146 ns        146 ns
  BM_vector_deque_move/5500                          200 ns        200 ns
  BM_vector_deque_move/64000                        5454 ns       5460 ns
  BM_vector_deque_move/65536                        5722 ns       5686 ns
  BM_vector_deque_move/70000                        5986 ns       5984 ns
  BM_vector_deque_ranges_move/0                    0.539 ns      0.528 ns
  BM_vector_deque_ranges_move/1                     1.06 ns       2.71 ns
  BM_vector_deque_ranges_move/2                     1.47 ns       2.58 ns
  BM_vector_deque_ranges_move/64                    24.0 ns       3.44 ns
  BM_vector_deque_ranges_move/512                    189 ns       13.6 ns
  BM_vector_deque_ranges_move/1024                   375 ns       40.1 ns
  BM_vector_deque_ranges_move/4000                  1436 ns        144 ns
  BM_vector_deque_ranges_move/4096                  1472 ns        146 ns
  BM_vector_deque_ranges_move/5500                  1977 ns        200 ns
  BM_vector_deque_ranges_move/64000                22981 ns       5466 ns
  BM_vector_deque_ranges_move/65536                23577 ns       5688 ns
  BM_vector_deque_ranges_move/70000                25131 ns       5985 ns
  BM_deque_vector_copy_backward/0                  0.264 ns       1.58 ns
  BM_deque_vector_copy_backward/1                   2.96 ns       1.86 ns
  BM_deque_vector_copy_backward/2                   3.55 ns       1.72 ns
  BM_deque_vector_copy_backward/64                  4.49 ns       2.93 ns
  BM_deque_vector_copy_backward/512                 16.1 ns       13.2 ns
  BM_deque_vector_copy_backward/1024                41.1 ns       40.0 ns
  BM_deque_vector_copy_backward/4000                 151 ns        126 ns
  BM_deque_vector_copy_backward/4096                 145 ns        127 ns
  BM_deque_vector_copy_backward/5500                 211 ns        170 ns
  BM_deque_vector_copy_backward/64000               5471 ns       5506 ns
  BM_deque_vector_copy_backward/65536               5439 ns       5415 ns
  BM_deque_vector_copy_backward/70000               5838 ns       5786 ns
  BM_deque_vector_ranges_copy_backward/0           0.264 ns       1.58 ns
  BM_deque_vector_ranges_copy_backward/1            1.17 ns       1.85 ns
  BM_deque_vector_ranges_copy_backward/2            1.45 ns       1.73 ns
  BM_deque_vector_ranges_copy_backward/64           26.0 ns       3.11 ns
  BM_deque_vector_ranges_copy_backward/512           147 ns       13.3 ns
  BM_deque_vector_ranges_copy_backward/1024          282 ns       40.1 ns
  BM_deque_vector_ranges_copy_backward/4000         1103 ns        127 ns
  BM_deque_vector_ranges_copy_backward/4096         1131 ns        127 ns
  BM_deque_vector_ranges_copy_backward/5500         1514 ns        170 ns
  BM_deque_vector_ranges_copy_backward/6400        17553 ns       5515 ns
  BM_deque_vector_ranges_copy_backward/6553        17944 ns       5415 ns
  BM_deque_vector_ranges_copy_backward/7000        19183 ns       5784 ns
  BM_deque_deque_copy_backward/0                    1.16 ns       1.32 ns
  BM_deque_deque_copy_backward/1                    6.58 ns       3.17 ns
  BM_deque_deque_copy_backward/2                    6.87 ns       3.17 ns
  BM_deque_deque_copy_backward/64                   7.77 ns       4.19 ns
  BM_deque_deque_copy_backward/512                  24.3 ns       19.2 ns
  BM_deque_deque_copy_backward/1024                 46.2 ns       43.5 ns
  BM_deque_deque_copy_backward/4000                  121 ns        101 ns
  BM_deque_deque_copy_backward/4096                  179 ns        163 ns
  BM_deque_deque_copy_backward/5500                  247 ns        216 ns
  BM_deque_deque_copy_backward/64000                5362 ns       5408 ns
  BM_deque_deque_copy_backward/65536                5474 ns       5647 ns
  BM_deque_deque_copy_backward/70000                5856 ns       5948 ns
  BM_deque_deque_ranges_copy_backward/0            0.792 ns       1.33 ns
  BM_deque_deque_ranges_copy_backward/1             2.04 ns       3.17 ns
  BM_deque_deque_ranges_copy_backward/2             2.93 ns       3.17 ns
  BM_deque_deque_ranges_copy_backward/64            56.0 ns       4.19 ns
  BM_deque_deque_ranges_copy_backward/512            372 ns       19.2 ns
  BM_deque_deque_ranges_copy_backward/1024           715 ns       43.5 ns
  BM_deque_deque_ranges_copy_backward/4000          2839 ns        101 ns
  BM_deque_deque_ranges_copy_backward/4096          2861 ns        163 ns
  BM_deque_deque_ranges_copy_backward/5500          3850 ns        217 ns
  BM_deque_deque_ranges_copy_backward/64000        42909 ns       5404 ns
  BM_deque_deque_ranges_copy_backward/65536        44236 ns       5572 ns
  BM_deque_deque_ranges_copy_backward/70000        47484 ns       5997 ns
  BM_vector_deque_copy_backward/0                  0.597 ns      0.532 ns
  BM_vector_deque_copy_backward/1                   4.17 ns       2.15 ns
  BM_vector_deque_copy_backward/2                   3.83 ns       2.03 ns
  BM_vector_deque_copy_backward/64                  4.99 ns       3.49 ns
  BM_vector_deque_copy_backward/512                 18.1 ns       14.0 ns
  BM_vector_deque_copy_backward/1024                43.6 ns       41.1 ns
  BM_vector_deque_copy_backward/4000                 160 ns        135 ns
  BM_vector_deque_copy_backward/4096                 160 ns        138 ns
  BM_vector_deque_copy_backward/5500                 225 ns        180 ns
  BM_vector_deque_copy_backward/64000               5458 ns       5435 ns
  BM_vector_deque_copy_backward/65536               5648 ns       5652 ns
  BM_vector_deque_copy_backward/70000               6021 ns       6026 ns
  BM_vector_deque_ranges_copy_backward/0           0.529 ns      0.529 ns
  BM_vector_deque_ranges_copy_backward/1            1.06 ns       2.11 ns
  BM_vector_deque_ranges_copy_backward/2            1.35 ns       1.98 ns
  BM_vector_deque_ranges_copy_backward/64           25.4 ns       3.43 ns
  BM_vector_deque_ranges_copy_backward/512           166 ns       13.8 ns
  BM_vector_deque_ranges_copy_backward/1024          286 ns       41.0 ns
  BM_vector_deque_ranges_copy_backward/4000         1149 ns        134 ns
  BM_vector_deque_ranges_copy_backward/4096         1138 ns        138 ns
  BM_vector_deque_ranges_copy_backward/5500         1536 ns        180 ns
  BM_vector_deque_ranges_copy_backward/6400        17771 ns       5435 ns
  BM_vector_deque_ranges_copy_backward/6553        18343 ns       5653 ns
  BM_vector_deque_ranges_copy_backward/7000        19422 ns       6024 ns
  BM_deque_vector_move_backward/0                  0.271 ns       1.58 ns
  BM_deque_vector_move_backward/1                   2.91 ns       1.85 ns
  BM_deque_vector_move_backward/2                   3.51 ns       1.72 ns
  BM_deque_vector_move_backward/64                  4.49 ns       2.94 ns
  BM_deque_vector_move_backward/512                 15.8 ns       13.3 ns
  BM_deque_vector_move_backward/1024                41.2 ns       40.0 ns
  BM_deque_vector_move_backward/4000                 147 ns        126 ns
  BM_deque_vector_move_backward/4096                 145 ns        127 ns
  BM_deque_vector_move_backward/5500                 207 ns        170 ns
  BM_deque_vector_move_backward/64000               5465 ns       5512 ns
  BM_deque_vector_move_backward/65536               5435 ns       5414 ns
  BM_deque_vector_move_backward/70000               5835 ns       5788 ns
  BM_deque_vector_ranges_move_backward/0           0.264 ns       1.58 ns
  BM_deque_vector_ranges_move_backward/1            1.17 ns       1.85 ns
  BM_deque_vector_ranges_move_backward/2            1.45 ns       1.72 ns
  BM_deque_vector_ranges_move_backward/64           23.2 ns       3.11 ns
  BM_deque_vector_ranges_move_backward/512           147 ns       13.3 ns
  BM_deque_vector_ranges_move_backward/1024          281 ns       39.9 ns
  BM_deque_vector_ranges_move_backward/4000         1097 ns        126 ns
  BM_deque_vector_ranges_move_backward/4096         1122 ns        127 ns
  BM_deque_vector_ranges_move_backward/5500         1514 ns        170 ns
  BM_deque_vector_ranges_move_backward/6400        17551 ns       5502 ns
  BM_deque_vector_ranges_move_backward/6553        17944 ns       5415 ns
  BM_deque_vector_ranges_move_backward/7000        19183 ns       5787 ns
  BM_deque_deque_move_backward/0                    1.17 ns       1.32 ns
  BM_deque_deque_move_backward/1                    6.60 ns       3.17 ns
  BM_deque_deque_move_backward/2                    6.87 ns       3.17 ns
  BM_deque_deque_move_backward/64                   7.78 ns       4.19 ns
  BM_deque_deque_move_backward/512                  24.2 ns       19.2 ns
  BM_deque_deque_move_backward/1024                 46.2 ns       43.5 ns
  BM_deque_deque_move_backward/4000                  121 ns        101 ns
  BM_deque_deque_move_backward/4096                  179 ns        163 ns
  BM_deque_deque_move_backward/5500                  247 ns        216 ns
  BM_deque_deque_move_backward/64000                5361 ns       5401 ns
  BM_deque_deque_move_backward/65536                5465 ns       5531 ns
  BM_deque_deque_move_backward/70000                5845 ns       5942 ns
  BM_deque_deque_ranges_move_backward/0            0.791 ns       1.32 ns
  BM_deque_deque_ranges_move_backward/1             2.04 ns       3.17 ns
  BM_deque_deque_ranges_move_backward/2             2.93 ns       3.17 ns
  BM_deque_deque_ranges_move_backward/64            55.7 ns       4.19 ns
  BM_deque_deque_ranges_move_backward/512            351 ns       19.2 ns
  BM_deque_deque_ranges_move_backward/1024           689 ns       43.5 ns
  BM_deque_deque_ranges_move_backward/4000          2685 ns        102 ns
  BM_deque_deque_ranges_move_backward/4096          2743 ns        163 ns
  BM_deque_deque_ranges_move_backward/5500          3698 ns        215 ns
  BM_deque_deque_ranges_move_backward/64000        42808 ns       5394 ns
  BM_deque_deque_ranges_move_backward/65536        43858 ns       5531 ns
  BM_deque_deque_ranges_move_backward/70000        46853 ns       5941 ns
  BM_vector_deque_move_backward/0                  0.621 ns      0.528 ns
  BM_vector_deque_move_backward/1                   4.17 ns       2.11 ns
  BM_vector_deque_move_backward/2                   3.84 ns       1.98 ns
  BM_vector_deque_move_backward/64                  4.99 ns       3.43 ns
  BM_vector_deque_move_backward/512                 18.1 ns       13.8 ns
  BM_vector_deque_move_backward/1024                43.6 ns       41.0 ns
  BM_vector_deque_move_backward/4000                 160 ns        134 ns
  BM_vector_deque_move_backward/4096                 160 ns        138 ns
  BM_vector_deque_move_backward/5500                 225 ns        180 ns
  BM_vector_deque_move_backward/64000               5457 ns       5433 ns
  BM_vector_deque_move_backward/65536               5646 ns       5650 ns
  BM_vector_deque_move_backward/70000               6020 ns       6027 ns
  BM_vector_deque_ranges_move_backward/0           0.536 ns      0.529 ns
  BM_vector_deque_ranges_move_backward/1            1.06 ns       2.11 ns
  BM_vector_deque_ranges_move_backward/2            1.33 ns       1.98 ns
  BM_vector_deque_ranges_move_backward/64           28.9 ns       3.43 ns
  BM_vector_deque_ranges_move_backward/512           160 ns       13.8 ns
  BM_vector_deque_ranges_move_backward/1024          286 ns       41.0 ns
  BM_vector_deque_ranges_move_backward/4000         1197 ns        134 ns
  BM_vector_deque_ranges_move_backward/4096         1138 ns        138 ns
  BM_vector_deque_ranges_move_backward/5500         1552 ns        180 ns
  BM_vector_deque_ranges_move_backward/6400        17834 ns       5432 ns
  BM_vector_deque_ranges_move_backward/6553        18351 ns       5654 ns
  BM_vector_deque_ranges_move_backward/7000        19473 ns       6028 ns



================
Comment at: libcxx/include/__algorithm/copy.h:33
+          __enable_if_t<!(is_copy_constructible<_InIter>::value
+                       && is_copy_constructible<_Sent>::value
+                       && is_copy_constructible<_OutIter>::value), int> = 0>
----------------
huixie90 wrote:
> philnik wrote:
> > huixie90 wrote:
> > > A sentinel is always copyable.
> > > This applies to other places as well.
> > Since this is pre-existing, I'd rather fix it in a follow-up. This patch is already quite large.
> Sounds good to me.
Doesn't apply anymore, since @var-const refactored it.


================
Comment at: libcxx/include/__algorithm/copy.h:86
+          class _OutIter,
+          __enable_if_t<__segmented_iterator_traits<_InIter>::__is_segmented_iterator::value, int> = 0>
+inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pair<_InIter, _OutIter>
----------------
huixie90 wrote:
> philnik wrote:
> > huixie90 wrote:
> > > As we get more and more optimisations for different types, it is harder to make sure all of these overloads are mutually exclusive. Do you think this is (or will be) a problem?
> > I think this is a problem. But I don't really have a good idea how to fix it. Using `if constexpr` would probably do the job, but we don't have that option.
> I feel that someone is going to promote their `priority_tag` approach.
Also doesn't apply anymore.
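
For readers unfamiliar with the `priority_tag` idea mentioned in the thread above, here is a minimal sketch of how such a tag can rank overloads without making their `enable_if` conditions mutually exclusive and without `if constexpr`. This is not the code from the patch and not libc++'s implementation: `priority_tag`, `copy_path`, `copy_impl`, `my_copy` and `dispatch_demo` are hypothetical names chosen for illustration, and the three tiers (memmove, random-access loop, generic loop) are assumed only to show the dispatch.

```cpp
#include <cstddef>
#include <cstring>
#include <deque>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a libc++ name): priority_tag<N> derives from
// priority_tag<N - 1>, so a priority_tag<2> argument prefers an overload
// taking priority_tag<2>, then <1>, then <0>.
template <std::size_t N>
struct priority_tag : priority_tag<N - 1> {};
template <>
struct priority_tag<0> {};

// Reports which overload ran, purely so the dispatch can be observed.
enum class copy_path { generic_loop, random_access_loop, memmove_call };

// Priority 2: raw pointers to the same trivially copyable type.
template <class In, class Out,
          typename std::enable_if<
              std::is_same<typename std::remove_const<In>::type, Out>::value &&
                  std::is_trivially_copyable<Out>::value,
              int>::type = 0>
copy_path copy_impl(In* first, In* last, Out* out, priority_tag<2>) {
  const std::size_t n = static_cast<std::size_t>(last - first);
  if (n != 0)
    std::memmove(out, first, n * sizeof(Out));
  return copy_path::memmove_call;
}

// Priority 1: any random-access input. Raw pointers satisfy this condition
// too, so it overlaps with the overload above; the tag breaks the tie.
template <class In, class Out,
          typename std::enable_if<
              std::is_base_of<std::random_access_iterator_tag,
                              typename std::iterator_traits<In>::iterator_category>::value,
              int>::type = 0>
copy_path copy_impl(In first, In last, Out out, priority_tag<1>) {
  for (auto n = last - first; n > 0; --n) {
    *out = *first;
    ++first;
    ++out;
  }
  return copy_path::random_access_loop;
}

// Priority 0: unconstrained fallback.
template <class In, class Out>
copy_path copy_impl(In first, In last, Out out, priority_tag<0>) {
  for (; first != last; ++first, ++out)
    *out = *first;
  return copy_path::generic_loop;
}

// Entry point: always start from the highest tag.
template <class In, class Out>
copy_path my_copy(In first, In last, Out out) {
  return copy_impl(first, last, out, priority_tag<2>{});
}

// Exercises all three tiers; returns true if each call took the expected path.
inline bool dispatch_demo() {
  std::vector<int> src = {1, 2, 3};
  std::deque<int> dq(src.begin(), src.end());
  std::list<int> li(src.begin(), src.end());
  std::vector<int> dst(src.size());
  return my_copy(src.data(), src.data() + src.size(), dst.data()) == copy_path::memmove_call &&
         my_copy(dq.begin(), dq.end(), dst.begin()) == copy_path::random_access_loop &&
         my_copy(li.begin(), li.end(), dst.begin()) == copy_path::generic_loop &&
         dst == src;
}
```

The trade-off in this sketch is that each overload only states when it is viable, and the tag parameter decides which viable overload wins. A new optimization can then be added at a new priority level without revisiting the conditions on the existing overloads.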


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132505/new/

https://reviews.llvm.org/D132505


