[libcxx-commits] [PATCH] D132505: [libc++] Refactor deque::iterator algorithm optimizations

Nikolas Klauser via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sat Sep 17 07:06:19 PDT 2022


philnik added a comment.

For completeness, here are the updated numbers:

  --------------------------------------------------------------------
  Benchmark                                         old            new
  --------------------------------------------------------------------
  BM_deque_vector_copy/0                       0.272 ns        1.58 ns
  BM_deque_vector_copy/1                        3.40 ns        1.85 ns
  BM_deque_vector_copy/2                        3.30 ns        1.72 ns
  BM_deque_vector_copy/64                       4.24 ns        2.90 ns
  BM_deque_vector_copy/512                      18.6 ns        26.1 ns
  BM_deque_vector_copy/1024                     41.3 ns        40.1 ns
  BM_deque_vector_copy/4000                      145 ns         127 ns
  BM_deque_vector_copy/4096                      148 ns         131 ns
  BM_deque_vector_copy/5500                      196 ns         167 ns
  BM_deque_vector_copy/64000                    5518 ns        5647 ns
  BM_deque_vector_copy/65536                    5665 ns        5674 ns
  BM_deque_vector_copy/70000                    5852 ns        5852 ns
  BM_deque_vector_ranges_copy/0                0.265 ns        1.58 ns
  BM_deque_vector_ranges_copy/1                 1.30 ns        1.85 ns
  BM_deque_vector_ranges_copy/2                 1.60 ns        1.73 ns
  BM_deque_vector_ranges_copy/64                23.5 ns        2.90 ns
  BM_deque_vector_ranges_copy/512                188 ns        16.9 ns
  BM_deque_vector_ranges_copy/1024               370 ns        39.5 ns
  BM_deque_vector_ranges_copy/4000              1437 ns         128 ns
  BM_deque_vector_ranges_copy/4096              1474 ns         128 ns
  BM_deque_vector_ranges_copy/5500              1990 ns         171 ns
  BM_deque_vector_ranges_copy/64000            23213 ns        5533 ns
  BM_deque_vector_ranges_copy/65536            23703 ns        5469 ns
  BM_deque_vector_ranges_copy/70000            25287 ns        5884 ns
  BM_deque_deque_copy/0                         1.06 ns        1.24 ns
  BM_deque_deque_copy/1                         5.77 ns        3.11 ns
  BM_deque_deque_copy/2                         5.63 ns        2.91 ns
  BM_deque_deque_copy/64                        6.42 ns        4.03 ns
  BM_deque_deque_copy/512                       21.4 ns        18.8 ns
  BM_deque_deque_copy/1024                      43.0 ns        41.7 ns
  BM_deque_deque_copy/4000                       114 ns        97.6 ns
  BM_deque_deque_copy/4096                       171 ns         156 ns
  BM_deque_deque_copy/5500                       236 ns         204 ns
  BM_deque_deque_copy/64000                     5387 ns        5406 ns
  BM_deque_deque_copy/65536                     5552 ns        5566 ns
  BM_deque_deque_copy/70000                     5882 ns        5907 ns
  BM_deque_deque_ranges_copy/0                 0.793 ns        1.21 ns
  BM_deque_deque_ranges_copy/1                  1.85 ns        3.12 ns
  BM_deque_deque_ranges_copy/2                  2.38 ns        2.90 ns
  BM_deque_deque_ranges_copy/64                 44.4 ns        4.03 ns
  BM_deque_deque_ranges_copy/512                 281 ns        18.7 ns
  BM_deque_deque_ranges_copy/1024                555 ns        41.8 ns
  BM_deque_deque_ranges_copy/4000               2155 ns        97.8 ns
  BM_deque_deque_ranges_copy/4096               2217 ns         156 ns
  BM_deque_deque_ranges_copy/5500               2977 ns         206 ns
  BM_deque_deque_ranges_copy/64000             34584 ns        5412 ns
  BM_deque_deque_ranges_copy/65536             35419 ns        5582 ns
  BM_deque_deque_ranges_copy/70000             37847 ns        5912 ns
  BM_vector_deque_copy/0                       0.585 ns       0.528 ns
  BM_vector_deque_copy/1                        2.98 ns        2.11 ns
  BM_vector_deque_copy/2                        2.79 ns        1.98 ns
  BM_vector_deque_copy/64                       3.70 ns        3.44 ns
  BM_vector_deque_copy/512                      13.5 ns        13.4 ns
  BM_vector_deque_copy/1024                     40.0 ns        39.2 ns
  BM_vector_deque_copy/4000                      145 ns         130 ns
  BM_vector_deque_copy/4096                      147 ns         135 ns
  BM_vector_deque_copy/5500                      199 ns         182 ns
  BM_vector_deque_copy/64000                    5452 ns        5424 ns
  BM_vector_deque_copy/65536                    5718 ns        5698 ns
  BM_vector_deque_copy/70000                    5985 ns        5992 ns
  BM_vector_deque_ranges_copy/0                0.529 ns       0.531 ns
  BM_vector_deque_ranges_copy/1                 1.06 ns        2.11 ns
  BM_vector_deque_ranges_copy/2                 1.46 ns        1.98 ns
  BM_vector_deque_ranges_copy/64                23.2 ns        3.44 ns
  BM_vector_deque_ranges_copy/512                187 ns        13.5 ns
  BM_vector_deque_ranges_copy/1024               369 ns        39.2 ns
  BM_vector_deque_ranges_copy/4000              1440 ns         130 ns
  BM_vector_deque_ranges_copy/4096              1474 ns         135 ns
  BM_vector_deque_ranges_copy/5500              1979 ns         182 ns
  BM_vector_deque_ranges_copy/64000            22995 ns        5421 ns
  BM_vector_deque_ranges_copy/65536            23549 ns        5705 ns
  BM_vector_deque_ranges_copy/70000            25182 ns        5994 ns
  BM_deque_vector_move/0                       0.266 ns        1.59 ns
  BM_deque_vector_move/1                        3.26 ns        1.85 ns
  BM_deque_vector_move/2                        3.23 ns        1.72 ns
  BM_deque_vector_move/64                       4.24 ns        2.90 ns
  BM_deque_vector_move/512                      14.1 ns        13.5 ns
  BM_deque_vector_move/1024                     40.4 ns        39.5 ns
  BM_deque_vector_move/4000                      144 ns         128 ns
  BM_deque_vector_move/4096                      144 ns         128 ns
  BM_deque_vector_move/5500                      200 ns         171 ns
  BM_deque_vector_move/64000                    5482 ns        5458 ns
  BM_deque_vector_move/65536                    5436 ns        5469 ns
  BM_deque_vector_move/70000                    5846 ns        5781 ns
  BM_deque_vector_ranges_move/0                0.264 ns        1.58 ns
  BM_deque_vector_ranges_move/1                 1.32 ns        1.85 ns
  BM_deque_vector_ranges_move/2                 1.59 ns        1.72 ns
  BM_deque_vector_ranges_move/64                35.3 ns        2.91 ns
  BM_deque_vector_ranges_move/512                202 ns        13.4 ns
  BM_deque_vector_ranges_move/1024               395 ns        39.5 ns
  BM_deque_vector_ranges_move/4000              1548 ns         128 ns
  BM_deque_vector_ranges_move/4096              1559 ns         128 ns
  BM_deque_vector_ranges_move/5500              2133 ns         170 ns
  BM_deque_vector_ranges_move/64000            24609 ns        5460 ns
  BM_deque_vector_ranges_move/65536            25182 ns        5470 ns
  BM_deque_vector_ranges_move/70000            27414 ns        5774 ns
  BM_deque_deque_move/0                         1.06 ns        1.22 ns
  BM_deque_deque_move/1                         5.67 ns        3.10 ns
  BM_deque_deque_move/2                         5.54 ns        2.90 ns
  BM_deque_deque_move/64                        6.37 ns        4.00 ns
  BM_deque_deque_move/512                       21.3 ns        18.8 ns
  BM_deque_deque_move/1024                      43.0 ns        41.7 ns
  BM_deque_deque_move/4000                       112 ns        97.6 ns
  BM_deque_deque_move/4096                       171 ns         156 ns
  BM_deque_deque_move/5500                       233 ns         206 ns
  BM_deque_deque_move/64000                     5378 ns        5401 ns
  BM_deque_deque_move/65536                     5546 ns        5562 ns
  BM_deque_deque_move/70000                     5877 ns        5911 ns
  BM_deque_deque_ranges_move/0                 0.792 ns        1.21 ns
  BM_deque_deque_ranges_move/1                  1.85 ns        3.10 ns
  BM_deque_deque_ranges_move/2                  2.38 ns        2.90 ns
  BM_deque_deque_ranges_move/64                 43.9 ns        4.02 ns
  BM_deque_deque_ranges_move/512                 281 ns        18.8 ns
  BM_deque_deque_ranges_move/1024                560 ns        41.7 ns
  BM_deque_deque_ranges_move/4000               2171 ns        97.6 ns
  BM_deque_deque_ranges_move/4096               2245 ns         156 ns
  BM_deque_deque_ranges_move/5500               3013 ns         204 ns
  BM_deque_deque_ranges_move/64000             35085 ns        5399 ns
  BM_deque_deque_ranges_move/65536             35939 ns        5560 ns
  BM_deque_deque_ranges_move/70000             38388 ns        5903 ns
  BM_vector_deque_move/0                       0.597 ns       0.579 ns
  BM_vector_deque_move/1                        2.83 ns        2.11 ns
  BM_vector_deque_move/2                        2.70 ns        1.98 ns
  BM_vector_deque_move/64                       3.68 ns        3.43 ns
  BM_vector_deque_move/512                      13.5 ns        13.5 ns
  BM_vector_deque_move/1024                     39.9 ns        39.1 ns
  BM_vector_deque_move/4000                      145 ns         130 ns
  BM_vector_deque_move/4096                      146 ns         135 ns
  BM_vector_deque_move/5500                      200 ns         182 ns
  BM_vector_deque_move/64000                    5454 ns        5422 ns
  BM_vector_deque_move/65536                    5722 ns        5693 ns
  BM_vector_deque_move/70000                    5986 ns        5992 ns
  BM_vector_deque_ranges_move/0                0.539 ns       0.528 ns
  BM_vector_deque_ranges_move/1                 1.06 ns        2.13 ns
  BM_vector_deque_ranges_move/2                 1.47 ns        1.99 ns
  BM_vector_deque_ranges_move/64                24.0 ns        3.44 ns
  BM_vector_deque_ranges_move/512                189 ns        13.8 ns
  BM_vector_deque_ranges_move/1024               375 ns        39.5 ns
  BM_vector_deque_ranges_move/4000              1436 ns         130 ns
  BM_vector_deque_ranges_move/4096              1472 ns         135 ns
  BM_vector_deque_ranges_move/5500              1977 ns         183 ns
  BM_vector_deque_ranges_move/64000            22981 ns        5660 ns
  BM_vector_deque_ranges_move/65536            23577 ns        5888 ns
  BM_vector_deque_ranges_move/70000            25131 ns        6174 ns
  BM_deque_vector_copy_backward/0              0.264 ns        1.61 ns
  BM_deque_vector_copy_backward/1               2.96 ns        1.87 ns
  BM_deque_vector_copy_backward/2               3.55 ns        1.75 ns
  BM_deque_vector_copy_backward/64              4.49 ns        3.02 ns
  BM_deque_vector_copy_backward/512             16.1 ns        13.4 ns
  BM_deque_vector_copy_backward/1024            41.1 ns        40.3 ns
  BM_deque_vector_copy_backward/4000             151 ns         128 ns
  BM_deque_vector_copy_backward/4096             145 ns         129 ns
  BM_deque_vector_copy_backward/5500             211 ns         171 ns
  BM_deque_vector_copy_backward/64000           5471 ns        5517 ns
  BM_deque_vector_copy_backward/65536           5439 ns        5440 ns
  BM_deque_vector_copy_backward/70000           5838 ns        5775 ns
  BM_deque_vector_ranges_copy_backward/0       0.264 ns        1.58 ns
  BM_deque_vector_ranges_copy_backward/1        1.17 ns        1.85 ns
  BM_deque_vector_ranges_copy_backward/2        1.45 ns        1.72 ns
  BM_deque_vector_ranges_copy_backward/64       26.0 ns        3.12 ns
  BM_deque_vector_ranges_copy_backward/512       147 ns        13.2 ns
  BM_deque_vector_ranges_copy_backward/1024      282 ns        39.9 ns
  BM_deque_vector_ranges_copy_backward/4000     1103 ns         126 ns
  BM_deque_vector_ranges_copy_backward/4096     1131 ns         127 ns
  BM_deque_vector_ranges_copy_backward/5500     1514 ns         171 ns
  BM_deque_vector_ranges_copy_backward/64000   17553 ns        5512 ns
  BM_deque_vector_ranges_copy_backward/65536   17944 ns        5441 ns
  BM_deque_vector_ranges_copy_backward/70000   19183 ns        5772 ns
  BM_deque_deque_copy_backward/0                1.16 ns        1.33 ns
  BM_deque_deque_copy_backward/1                6.58 ns        3.17 ns
  BM_deque_deque_copy_backward/2                6.87 ns        3.17 ns
  BM_deque_deque_copy_backward/64               7.77 ns        4.18 ns
  BM_deque_deque_copy_backward/512              24.3 ns        19.2 ns
  BM_deque_deque_copy_backward/1024             46.2 ns        43.2 ns
  BM_deque_deque_copy_backward/4000              121 ns         101 ns
  BM_deque_deque_copy_backward/4096              179 ns         162 ns
  BM_deque_deque_copy_backward/5500              247 ns         216 ns
  BM_deque_deque_copy_backward/64000            5362 ns        5429 ns
  BM_deque_deque_copy_backward/65536            5474 ns        5551 ns
  BM_deque_deque_copy_backward/70000            5856 ns        5942 ns
  BM_deque_deque_ranges_copy_backward/0        0.792 ns        1.34 ns
  BM_deque_deque_ranges_copy_backward/1         2.04 ns        3.17 ns
  BM_deque_deque_ranges_copy_backward/2         2.93 ns        3.17 ns
  BM_deque_deque_ranges_copy_backward/64        56.0 ns        4.20 ns
  BM_deque_deque_ranges_copy_backward/512        372 ns        19.2 ns
  BM_deque_deque_ranges_copy_backward/1024       715 ns        43.2 ns
  BM_deque_deque_ranges_copy_backward/4000      2839 ns         101 ns
  BM_deque_deque_ranges_copy_backward/4096      2861 ns         163 ns
  BM_deque_deque_ranges_copy_backward/5500      3850 ns         216 ns
  BM_deque_deque_ranges_copy_backward/64000    42909 ns        5425 ns
  BM_deque_deque_ranges_copy_backward/65536    44236 ns        5547 ns
  BM_deque_deque_ranges_copy_backward/70000    47484 ns        5939 ns
  BM_vector_deque_copy_backward/0              0.597 ns       0.566 ns
  BM_vector_deque_copy_backward/1               4.17 ns        2.11 ns
  BM_vector_deque_copy_backward/2               3.83 ns        1.98 ns
  BM_vector_deque_copy_backward/64              4.99 ns        3.43 ns
  BM_vector_deque_copy_backward/512             18.1 ns        13.8 ns
  BM_vector_deque_copy_backward/1024            43.6 ns        41.0 ns
  BM_vector_deque_copy_backward/4000             160 ns         134 ns
  BM_vector_deque_copy_backward/4096             160 ns         138 ns
  BM_vector_deque_copy_backward/5500             225 ns         180 ns
  BM_vector_deque_copy_backward/64000           5458 ns        5440 ns
  BM_vector_deque_copy_backward/65536           5648 ns        5657 ns
  BM_vector_deque_copy_backward/70000           6021 ns        6018 ns
  BM_vector_deque_ranges_copy_backward/0       0.529 ns       0.538 ns
  BM_vector_deque_ranges_copy_backward/1        1.06 ns        2.11 ns
  BM_vector_deque_ranges_copy_backward/2        1.35 ns        1.98 ns
  BM_vector_deque_ranges_copy_backward/64       25.4 ns        3.43 ns
  BM_vector_deque_ranges_copy_backward/512       166 ns        13.8 ns
  BM_vector_deque_ranges_copy_backward/1024      286 ns        41.0 ns
  BM_vector_deque_ranges_copy_backward/4000     1149 ns         134 ns
  BM_vector_deque_ranges_copy_backward/4096     1138 ns         139 ns
  BM_vector_deque_ranges_copy_backward/5500     1536 ns         182 ns
  BM_vector_deque_ranges_copy_backward/64000   17771 ns        5467 ns
  BM_vector_deque_ranges_copy_backward/65536   18343 ns        5658 ns
  BM_vector_deque_ranges_copy_backward/70000   19422 ns        6062 ns
  BM_deque_vector_move_backward/0              0.271 ns        1.60 ns
  BM_deque_vector_move_backward/1               2.91 ns        1.87 ns
  BM_deque_vector_move_backward/2               3.51 ns        1.74 ns
  BM_deque_vector_move_backward/64              4.49 ns        3.13 ns
  BM_deque_vector_move_backward/512             15.8 ns        13.4 ns
  BM_deque_vector_move_backward/1024            41.2 ns        40.0 ns
  BM_deque_vector_move_backward/4000             147 ns         126 ns
  BM_deque_vector_move_backward/4096             145 ns         127 ns
  BM_deque_vector_move_backward/5500             207 ns         171 ns
  BM_deque_vector_move_backward/64000           5465 ns        5515 ns
  BM_deque_vector_move_backward/65536           5435 ns        5441 ns
  BM_deque_vector_move_backward/70000           5835 ns        5787 ns
  BM_deque_vector_ranges_move_backward/0       0.264 ns        1.58 ns
  BM_deque_vector_ranges_move_backward/1        1.17 ns        1.85 ns
  BM_deque_vector_ranges_move_backward/2        1.45 ns        1.72 ns
  BM_deque_vector_ranges_move_backward/64       23.2 ns        3.12 ns
  BM_deque_vector_ranges_move_backward/512       147 ns        13.3 ns
  BM_deque_vector_ranges_move_backward/1024      281 ns        39.9 ns
  BM_deque_vector_ranges_move_backward/4000     1097 ns         126 ns
  BM_deque_vector_ranges_move_backward/4096     1122 ns         127 ns
  BM_deque_vector_ranges_move_backward/5500     1514 ns         170 ns
  BM_deque_vector_ranges_move_backward/64000   17551 ns        5507 ns
  BM_deque_vector_ranges_move_backward/65536   17944 ns        5436 ns
  BM_deque_vector_ranges_move_backward/70000   19183 ns        5781 ns
  BM_deque_deque_move_backward/0                1.17 ns        1.34 ns
  BM_deque_deque_move_backward/1                6.60 ns        3.17 ns
  BM_deque_deque_move_backward/2                6.87 ns        3.17 ns
  BM_deque_deque_move_backward/64               7.78 ns        4.20 ns
  BM_deque_deque_move_backward/512              24.2 ns        19.2 ns
  BM_deque_deque_move_backward/1024             46.2 ns        43.2 ns
  BM_deque_deque_move_backward/4000              121 ns         101 ns
  BM_deque_deque_move_backward/4096              179 ns         163 ns
  BM_deque_deque_move_backward/5500              247 ns         217 ns
  BM_deque_deque_move_backward/64000            5361 ns        5431 ns
  BM_deque_deque_move_backward/65536            5465 ns        5547 ns
  BM_deque_deque_move_backward/70000            5845 ns        5939 ns
  BM_deque_deque_ranges_move_backward/0        0.791 ns        1.33 ns
  BM_deque_deque_ranges_move_backward/1         2.04 ns        3.17 ns
  BM_deque_deque_ranges_move_backward/2         2.93 ns        3.17 ns
  BM_deque_deque_ranges_move_backward/64        55.7 ns        4.18 ns
  BM_deque_deque_ranges_move_backward/512        351 ns        19.2 ns
  BM_deque_deque_ranges_move_backward/1024       689 ns        43.2 ns
  BM_deque_deque_ranges_move_backward/4000      2685 ns         101 ns
  BM_deque_deque_ranges_move_backward/4096      2743 ns         162 ns
  BM_deque_deque_ranges_move_backward/5500      3698 ns         215 ns
  BM_deque_deque_ranges_move_backward/64000    42808 ns        5426 ns
  BM_deque_deque_ranges_move_backward/65536    43858 ns        5551 ns
  BM_deque_deque_ranges_move_backward/70000    46853 ns        5947 ns
  BM_vector_deque_move_backward/0              0.621 ns       0.532 ns
  BM_vector_deque_move_backward/1               4.17 ns        2.11 ns
  BM_vector_deque_move_backward/2               3.84 ns        1.98 ns
  BM_vector_deque_move_backward/64              4.99 ns        3.43 ns
  BM_vector_deque_move_backward/512             18.1 ns        13.8 ns
  BM_vector_deque_move_backward/1024            43.6 ns        41.0 ns
  BM_vector_deque_move_backward/4000             160 ns         134 ns
  BM_vector_deque_move_backward/4096             160 ns         138 ns
  BM_vector_deque_move_backward/5500             225 ns         181 ns
  BM_vector_deque_move_backward/64000           5457 ns        5436 ns
  BM_vector_deque_move_backward/65536           5646 ns        5658 ns
  BM_vector_deque_move_backward/70000           6020 ns        6022 ns
  BM_vector_deque_ranges_move_backward/0       0.536 ns       0.528 ns
  BM_vector_deque_ranges_move_backward/1        1.06 ns        2.11 ns
  BM_vector_deque_ranges_move_backward/2        1.33 ns        1.98 ns
  BM_vector_deque_ranges_move_backward/64       28.9 ns        3.43 ns
  BM_vector_deque_ranges_move_backward/512       160 ns        13.8 ns
  BM_vector_deque_ranges_move_backward/1024      286 ns        41.0 ns
  BM_vector_deque_ranges_move_backward/4000     1197 ns         134 ns
  BM_vector_deque_ranges_move_backward/4096     1138 ns         138 ns
  BM_vector_deque_ranges_move_backward/5500     1552 ns         181 ns
  BM_vector_deque_ranges_move_backward/64000   17834 ns        5436 ns
  BM_vector_deque_ranges_move_backward/65536   18351 ns        5660 ns
  BM_vector_deque_ranges_move_backward/70000   19473 ns        6022 ns



================
Comment at: libcxx/benchmarks/deque_iterator.bench.cpp:1
+//===----------------------------------------------------------------------===//
+//
----------------
ldionne wrote:
> I would be curious to try adding `_LIBCPP_ALWAYS_INLINE` to `__unwrap_and_dispatch` (and possibly other implementation details in that code path) to see what impact it has on the benchmarks. Could you try that out and report? I would assume that for raw pointers, that should all end up being inlined away.
I've changed the implementation of `copy` and `move` to use the segment iterators, and that fixed the performance issues, so I didn't test with `_LIBCPP_ALWAYS_INLINE`.


================
Comment at: libcxx/include/__algorithm/copy.h:74
+      auto __iters = std::__copy(__first, __last, __segment.__begin_);
+      __result += __range_size;
+      return std::make_pair(__iters.first, __result);
----------------
huixie90 wrote:
> `+=` isn't required as part of your "concept" of segmented iterator
I'd like to investigate that when adding a non-random-access iterator to the segmented iterators, since only then will it actually matter. I don't know yet whether local iterators should always be random-access, or whether copying to output segmented iterators should be constrained on the local iterator being random-access.
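
To make the open question concrete, here is a toy sketch of copying into a segmented output whose local iterators are only bidirectional. It is not libc++ code: `ToyPosition` and `toy_segmented_copy` are hypothetical names, and the "segmented iterator" is modelled as a plain (segment index, local iterator) pair. The point is that the advanced output position can be taken from the return value of the bulk copy, so `+=` is never needed on the output side.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for a segmented output position: which segment we
// are in, and where inside that segment. Not a libc++ internal.
template <class Segment>
struct ToyPosition {
  std::size_t segment;
  typename Segment::iterator local;
};

// Copies [first, last) into `segments` starting at `pos`, one segment at a
// time. Assumptions: InIter is at least a forward iterator (the range is
// measured and then read), and the segments have room for the whole input.
template <class InIter, class Segment>
ToyPosition<Segment> toy_segmented_copy(InIter first, InIter last,
                                        std::vector<Segment>& segments,
                                        ToyPosition<Segment> pos) {
  while (first != last) {
    Segment& seg = segments[pos.segment];
    // std::distance works for any forward iterator; it is O(1) only when
    // the iterator is random-access.
    const std::ptrdiff_t room = std::distance(pos.local, seg.end());
    const std::ptrdiff_t left = std::distance(first, last);
    const std::ptrdiff_t n = std::min(room, left);
    // copy_n returns the advanced output iterator, so no `+=` is required.
    pos.local = std::copy_n(first, n, pos.local);
    std::advance(first, n);
    if (first != last) {
      // Input remains, so the current segment is full: go to the next one.
      ++pos.segment;
      pos.local = segments[pos.segment].begin();
    }
  }
  return pos;
}
```

With `std::list<int>` segments this compiles and runs, at the cost of linear-time `std::distance` calls; with random-access local iterators the same code keeps the bulk-copy behaviour. Whether that trade-off is acceptable is exactly the constraint question raised above.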


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132505/new/

https://reviews.llvm.org/D132505


