[libcxx-commits] [PATCH] D132505: [libc++] Refactor deque::iterator algorithm optimizations
Nikolas Klauser via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Sat Sep 17 07:06:19 PDT 2022
philnik added a comment.
For completeness, here are the updated numbers:
--------------------------------------------------------------------
Benchmark old new
--------------------------------------------------------------------
BM_deque_vector_copy/0 0.272 ns 1.58 ns
BM_deque_vector_copy/1 3.40 ns 1.85 ns
BM_deque_vector_copy/2 3.30 ns 1.72 ns
BM_deque_vector_copy/64 4.24 ns 2.90 ns
BM_deque_vector_copy/512 18.6 ns 26.1 ns
BM_deque_vector_copy/1024 41.3 ns 40.1 ns
BM_deque_vector_copy/4000 145 ns 127 ns
BM_deque_vector_copy/4096 148 ns 131 ns
BM_deque_vector_copy/5500 196 ns 167 ns
BM_deque_vector_copy/64000 5518 ns 5647 ns
BM_deque_vector_copy/65536 5665 ns 5674 ns
BM_deque_vector_copy/70000 5852 ns 5852 ns
BM_deque_vector_ranges_copy/0 0.265 ns 1.58 ns
BM_deque_vector_ranges_copy/1 1.30 ns 1.85 ns
BM_deque_vector_ranges_copy/2 1.60 ns 1.73 ns
BM_deque_vector_ranges_copy/64 23.5 ns 2.90 ns
BM_deque_vector_ranges_copy/512 188 ns 16.9 ns
BM_deque_vector_ranges_copy/1024 370 ns 39.5 ns
BM_deque_vector_ranges_copy/4000 1437 ns 128 ns
BM_deque_vector_ranges_copy/4096 1474 ns 128 ns
BM_deque_vector_ranges_copy/5500 1990 ns 171 ns
BM_deque_vector_ranges_copy/64000 23213 ns 5533 ns
BM_deque_vector_ranges_copy/65536 23703 ns 5469 ns
BM_deque_vector_ranges_copy/70000 25287 ns 5884 ns
BM_deque_deque_copy/0 1.06 ns 1.24 ns
BM_deque_deque_copy/1 5.77 ns 3.11 ns
BM_deque_deque_copy/2 5.63 ns 2.91 ns
BM_deque_deque_copy/64 6.42 ns 4.03 ns
BM_deque_deque_copy/512 21.4 ns 18.8 ns
BM_deque_deque_copy/1024 43.0 ns 41.7 ns
BM_deque_deque_copy/4000 114 ns 97.6 ns
BM_deque_deque_copy/4096 171 ns 156 ns
BM_deque_deque_copy/5500 236 ns 204 ns
BM_deque_deque_copy/64000 5387 ns 5406 ns
BM_deque_deque_copy/65536 5552 ns 5566 ns
BM_deque_deque_copy/70000 5882 ns 5907 ns
BM_deque_deque_ranges_copy/0 0.793 ns 1.21 ns
BM_deque_deque_ranges_copy/1 1.85 ns 3.12 ns
BM_deque_deque_ranges_copy/2 2.38 ns 2.90 ns
BM_deque_deque_ranges_copy/64 44.4 ns 4.03 ns
BM_deque_deque_ranges_copy/512 281 ns 18.7 ns
BM_deque_deque_ranges_copy/1024 555 ns 41.8 ns
BM_deque_deque_ranges_copy/4000 2155 ns 97.8 ns
BM_deque_deque_ranges_copy/4096 2217 ns 156 ns
BM_deque_deque_ranges_copy/5500 2977 ns 206 ns
BM_deque_deque_ranges_copy/64000 34584 ns 5412 ns
BM_deque_deque_ranges_copy/65536 35419 ns 5582 ns
BM_deque_deque_ranges_copy/70000 37847 ns 5912 ns
BM_vector_deque_copy/0 0.585 ns 0.528 ns
BM_vector_deque_copy/1 2.98 ns 2.11 ns
BM_vector_deque_copy/2 2.79 ns 1.98 ns
BM_vector_deque_copy/64 3.70 ns 3.44 ns
BM_vector_deque_copy/512 13.5 ns 13.4 ns
BM_vector_deque_copy/1024 40.0 ns 39.2 ns
BM_vector_deque_copy/4000 145 ns 130 ns
BM_vector_deque_copy/4096 147 ns 135 ns
BM_vector_deque_copy/5500 199 ns 182 ns
BM_vector_deque_copy/64000 5452 ns 5424 ns
BM_vector_deque_copy/65536 5718 ns 5698 ns
BM_vector_deque_copy/70000 5985 ns 5992 ns
BM_vector_deque_ranges_copy/0 0.529 ns 0.531 ns
BM_vector_deque_ranges_copy/1 1.06 ns 2.11 ns
BM_vector_deque_ranges_copy/2 1.46 ns 1.98 ns
BM_vector_deque_ranges_copy/64 23.2 ns 3.44 ns
BM_vector_deque_ranges_copy/512 187 ns 13.5 ns
BM_vector_deque_ranges_copy/1024 369 ns 39.2 ns
BM_vector_deque_ranges_copy/4000 1440 ns 130 ns
BM_vector_deque_ranges_copy/4096 1474 ns 135 ns
BM_vector_deque_ranges_copy/5500 1979 ns 182 ns
BM_vector_deque_ranges_copy/64000 22995 ns 5421 ns
BM_vector_deque_ranges_copy/65536 23549 ns 5705 ns
BM_vector_deque_ranges_copy/70000 25182 ns 5994 ns
BM_deque_vector_move/0 0.266 ns 1.59 ns
BM_deque_vector_move/1 3.26 ns 1.85 ns
BM_deque_vector_move/2 3.23 ns 1.72 ns
BM_deque_vector_move/64 4.24 ns 2.90 ns
BM_deque_vector_move/512 14.1 ns 13.5 ns
BM_deque_vector_move/1024 40.4 ns 39.5 ns
BM_deque_vector_move/4000 144 ns 128 ns
BM_deque_vector_move/4096 144 ns 128 ns
BM_deque_vector_move/5500 200 ns 171 ns
BM_deque_vector_move/64000 5482 ns 5458 ns
BM_deque_vector_move/65536 5436 ns 5469 ns
BM_deque_vector_move/70000 5846 ns 5781 ns
BM_deque_vector_ranges_move/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_move/1 1.32 ns 1.85 ns
BM_deque_vector_ranges_move/2 1.59 ns 1.72 ns
BM_deque_vector_ranges_move/64 35.3 ns 2.91 ns
BM_deque_vector_ranges_move/512 202 ns 13.4 ns
BM_deque_vector_ranges_move/1024 395 ns 39.5 ns
BM_deque_vector_ranges_move/4000 1548 ns 128 ns
BM_deque_vector_ranges_move/4096 1559 ns 128 ns
BM_deque_vector_ranges_move/5500 2133 ns 170 ns
BM_deque_vector_ranges_move/64000 24609 ns 5460 ns
BM_deque_vector_ranges_move/65536 25182 ns 5470 ns
BM_deque_vector_ranges_move/70000 27414 ns 5774 ns
BM_deque_deque_move/0 1.06 ns 1.22 ns
BM_deque_deque_move/1 5.67 ns 3.10 ns
BM_deque_deque_move/2 5.54 ns 2.90 ns
BM_deque_deque_move/64 6.37 ns 4.00 ns
BM_deque_deque_move/512 21.3 ns 18.8 ns
BM_deque_deque_move/1024 43.0 ns 41.7 ns
BM_deque_deque_move/4000 112 ns 97.6 ns
BM_deque_deque_move/4096 171 ns 156 ns
BM_deque_deque_move/5500 233 ns 206 ns
BM_deque_deque_move/64000 5378 ns 5401 ns
BM_deque_deque_move/65536 5546 ns 5562 ns
BM_deque_deque_move/70000 5877 ns 5911 ns
BM_deque_deque_ranges_move/0 0.792 ns 1.21 ns
BM_deque_deque_ranges_move/1 1.85 ns 3.10 ns
BM_deque_deque_ranges_move/2 2.38 ns 2.90 ns
BM_deque_deque_ranges_move/64 43.9 ns 4.02 ns
BM_deque_deque_ranges_move/512 281 ns 18.8 ns
BM_deque_deque_ranges_move/1024 560 ns 41.7 ns
BM_deque_deque_ranges_move/4000 2171 ns 97.6 ns
BM_deque_deque_ranges_move/4096 2245 ns 156 ns
BM_deque_deque_ranges_move/5500 3013 ns 204 ns
BM_deque_deque_ranges_move/64000 35085 ns 5399 ns
BM_deque_deque_ranges_move/65536 35939 ns 5560 ns
BM_deque_deque_ranges_move/70000 38388 ns 5903 ns
BM_vector_deque_move/0 0.597 ns 0.579 ns
BM_vector_deque_move/1 2.83 ns 2.11 ns
BM_vector_deque_move/2 2.70 ns 1.98 ns
BM_vector_deque_move/64 3.68 ns 3.43 ns
BM_vector_deque_move/512 13.5 ns 13.5 ns
BM_vector_deque_move/1024 39.9 ns 39.1 ns
BM_vector_deque_move/4000 145 ns 130 ns
BM_vector_deque_move/4096 146 ns 135 ns
BM_vector_deque_move/5500 200 ns 182 ns
BM_vector_deque_move/64000 5454 ns 5422 ns
BM_vector_deque_move/65536 5722 ns 5693 ns
BM_vector_deque_move/70000 5986 ns 5992 ns
BM_vector_deque_ranges_move/0 0.539 ns 0.528 ns
BM_vector_deque_ranges_move/1 1.06 ns 2.13 ns
BM_vector_deque_ranges_move/2 1.47 ns 1.99 ns
BM_vector_deque_ranges_move/64 24.0 ns 3.44 ns
BM_vector_deque_ranges_move/512 189 ns 13.8 ns
BM_vector_deque_ranges_move/1024 375 ns 39.5 ns
BM_vector_deque_ranges_move/4000 1436 ns 130 ns
BM_vector_deque_ranges_move/4096 1472 ns 135 ns
BM_vector_deque_ranges_move/5500 1977 ns 183 ns
BM_vector_deque_ranges_move/64000 22981 ns 5660 ns
BM_vector_deque_ranges_move/65536 23577 ns 5888 ns
BM_vector_deque_ranges_move/70000 25131 ns 6174 ns
BM_deque_vector_copy_backward/0 0.264 ns 1.61 ns
BM_deque_vector_copy_backward/1 2.96 ns 1.87 ns
BM_deque_vector_copy_backward/2 3.55 ns 1.75 ns
BM_deque_vector_copy_backward/64 4.49 ns 3.02 ns
BM_deque_vector_copy_backward/512 16.1 ns 13.4 ns
BM_deque_vector_copy_backward/1024 41.1 ns 40.3 ns
BM_deque_vector_copy_backward/4000 151 ns 128 ns
BM_deque_vector_copy_backward/4096 145 ns 129 ns
BM_deque_vector_copy_backward/5500 211 ns 171 ns
BM_deque_vector_copy_backward/64000 5471 ns 5517 ns
BM_deque_vector_copy_backward/65536 5439 ns 5440 ns
BM_deque_vector_copy_backward/70000 5838 ns 5775 ns
BM_deque_vector_ranges_copy_backward/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_copy_backward/1 1.17 ns 1.85 ns
BM_deque_vector_ranges_copy_backward/2 1.45 ns 1.72 ns
BM_deque_vector_ranges_copy_backward/64 26.0 ns 3.12 ns
BM_deque_vector_ranges_copy_backward/512 147 ns 13.2 ns
BM_deque_vector_ranges_copy_backward/1024 282 ns 39.9 ns
BM_deque_vector_ranges_copy_backward/4000 1103 ns 126 ns
BM_deque_vector_ranges_copy_backward/4096 1131 ns 127 ns
BM_deque_vector_ranges_copy_backward/5500 1514 ns 171 ns
BM_deque_vector_ranges_copy_backward/64000 17553 ns 5512 ns
BM_deque_vector_ranges_copy_backward/65536 17944 ns 5441 ns
BM_deque_vector_ranges_copy_backward/70000 19183 ns 5772 ns
BM_deque_deque_copy_backward/0 1.16 ns 1.33 ns
BM_deque_deque_copy_backward/1 6.58 ns 3.17 ns
BM_deque_deque_copy_backward/2 6.87 ns 3.17 ns
BM_deque_deque_copy_backward/64 7.77 ns 4.18 ns
BM_deque_deque_copy_backward/512 24.3 ns 19.2 ns
BM_deque_deque_copy_backward/1024 46.2 ns 43.2 ns
BM_deque_deque_copy_backward/4000 121 ns 101 ns
BM_deque_deque_copy_backward/4096 179 ns 162 ns
BM_deque_deque_copy_backward/5500 247 ns 216 ns
BM_deque_deque_copy_backward/64000 5362 ns 5429 ns
BM_deque_deque_copy_backward/65536 5474 ns 5551 ns
BM_deque_deque_copy_backward/70000 5856 ns 5942 ns
BM_deque_deque_ranges_copy_backward/0 0.792 ns 1.34 ns
BM_deque_deque_ranges_copy_backward/1 2.04 ns 3.17 ns
BM_deque_deque_ranges_copy_backward/2 2.93 ns 3.17 ns
BM_deque_deque_ranges_copy_backward/64 56.0 ns 4.20 ns
BM_deque_deque_ranges_copy_backward/512 372 ns 19.2 ns
BM_deque_deque_ranges_copy_backward/1024 715 ns 43.2 ns
BM_deque_deque_ranges_copy_backward/4000 2839 ns 101 ns
BM_deque_deque_ranges_copy_backward/4096 2861 ns 163 ns
BM_deque_deque_ranges_copy_backward/5500 3850 ns 216 ns
BM_deque_deque_ranges_copy_backward/64000 42909 ns 5425 ns
BM_deque_deque_ranges_copy_backward/65536 44236 ns 5547 ns
BM_deque_deque_ranges_copy_backward/70000 47484 ns 5939 ns
BM_vector_deque_copy_backward/0 0.597 ns 0.566 ns
BM_vector_deque_copy_backward/1 4.17 ns 2.11 ns
BM_vector_deque_copy_backward/2 3.83 ns 1.98 ns
BM_vector_deque_copy_backward/64 4.99 ns 3.43 ns
BM_vector_deque_copy_backward/512 18.1 ns 13.8 ns
BM_vector_deque_copy_backward/1024 43.6 ns 41.0 ns
BM_vector_deque_copy_backward/4000 160 ns 134 ns
BM_vector_deque_copy_backward/4096 160 ns 138 ns
BM_vector_deque_copy_backward/5500 225 ns 180 ns
BM_vector_deque_copy_backward/64000 5458 ns 5440 ns
BM_vector_deque_copy_backward/65536 5648 ns 5657 ns
BM_vector_deque_copy_backward/70000 6021 ns 6018 ns
BM_vector_deque_ranges_copy_backward/0 0.529 ns 0.538 ns
BM_vector_deque_ranges_copy_backward/1 1.06 ns 2.11 ns
BM_vector_deque_ranges_copy_backward/2 1.35 ns 1.98 ns
BM_vector_deque_ranges_copy_backward/64 25.4 ns 3.43 ns
BM_vector_deque_ranges_copy_backward/512 166 ns 13.8 ns
BM_vector_deque_ranges_copy_backward/1024 286 ns 41.0 ns
BM_vector_deque_ranges_copy_backward/4000 1149 ns 134 ns
BM_vector_deque_ranges_copy_backward/4096 1138 ns 139 ns
BM_vector_deque_ranges_copy_backward/5500 1536 ns 182 ns
BM_vector_deque_ranges_copy_backward/64000 17771 ns 5467 ns
BM_vector_deque_ranges_copy_backward/65536 18343 ns 5658 ns
BM_vector_deque_ranges_copy_backward/70000 19422 ns 6062 ns
BM_deque_vector_move_backward/0 0.271 ns 1.60 ns
BM_deque_vector_move_backward/1 2.91 ns 1.87 ns
BM_deque_vector_move_backward/2 3.51 ns 1.74 ns
BM_deque_vector_move_backward/64 4.49 ns 3.13 ns
BM_deque_vector_move_backward/512 15.8 ns 13.4 ns
BM_deque_vector_move_backward/1024 41.2 ns 40.0 ns
BM_deque_vector_move_backward/4000 147 ns 126 ns
BM_deque_vector_move_backward/4096 145 ns 127 ns
BM_deque_vector_move_backward/5500 207 ns 171 ns
BM_deque_vector_move_backward/64000 5465 ns 5515 ns
BM_deque_vector_move_backward/65536 5435 ns 5441 ns
BM_deque_vector_move_backward/70000 5835 ns 5787 ns
BM_deque_vector_ranges_move_backward/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_move_backward/1 1.17 ns 1.85 ns
BM_deque_vector_ranges_move_backward/2 1.45 ns 1.72 ns
BM_deque_vector_ranges_move_backward/64 23.2 ns 3.12 ns
BM_deque_vector_ranges_move_backward/512 147 ns 13.3 ns
BM_deque_vector_ranges_move_backward/1024 281 ns 39.9 ns
BM_deque_vector_ranges_move_backward/4000 1097 ns 126 ns
BM_deque_vector_ranges_move_backward/4096 1122 ns 127 ns
BM_deque_vector_ranges_move_backward/5500 1514 ns 170 ns
BM_deque_vector_ranges_move_backward/64000 17551 ns 5507 ns
BM_deque_vector_ranges_move_backward/65536 17944 ns 5436 ns
BM_deque_vector_ranges_move_backward/70000 19183 ns 5781 ns
BM_deque_deque_move_backward/0 1.17 ns 1.34 ns
BM_deque_deque_move_backward/1 6.60 ns 3.17 ns
BM_deque_deque_move_backward/2 6.87 ns 3.17 ns
BM_deque_deque_move_backward/64 7.78 ns 4.20 ns
BM_deque_deque_move_backward/512 24.2 ns 19.2 ns
BM_deque_deque_move_backward/1024 46.2 ns 43.2 ns
BM_deque_deque_move_backward/4000 121 ns 101 ns
BM_deque_deque_move_backward/4096 179 ns 163 ns
BM_deque_deque_move_backward/5500 247 ns 217 ns
BM_deque_deque_move_backward/64000 5361 ns 5431 ns
BM_deque_deque_move_backward/65536 5465 ns 5547 ns
BM_deque_deque_move_backward/70000 5845 ns 5939 ns
BM_deque_deque_ranges_move_backward/0 0.791 ns 1.33 ns
BM_deque_deque_ranges_move_backward/1 2.04 ns 3.17 ns
BM_deque_deque_ranges_move_backward/2 2.93 ns 3.17 ns
BM_deque_deque_ranges_move_backward/64 55.7 ns 4.18 ns
BM_deque_deque_ranges_move_backward/512 351 ns 19.2 ns
BM_deque_deque_ranges_move_backward/1024 689 ns 43.2 ns
BM_deque_deque_ranges_move_backward/4000 2685 ns 101 ns
BM_deque_deque_ranges_move_backward/4096 2743 ns 162 ns
BM_deque_deque_ranges_move_backward/5500 3698 ns 215 ns
BM_deque_deque_ranges_move_backward/64000 42808 ns 5426 ns
BM_deque_deque_ranges_move_backward/65536 43858 ns 5551 ns
BM_deque_deque_ranges_move_backward/70000 46853 ns 5947 ns
BM_vector_deque_move_backward/0 0.621 ns 0.532 ns
BM_vector_deque_move_backward/1 4.17 ns 2.11 ns
BM_vector_deque_move_backward/2 3.84 ns 1.98 ns
BM_vector_deque_move_backward/64 4.99 ns 3.43 ns
BM_vector_deque_move_backward/512 18.1 ns 13.8 ns
BM_vector_deque_move_backward/1024 43.6 ns 41.0 ns
BM_vector_deque_move_backward/4000 160 ns 134 ns
BM_vector_deque_move_backward/4096 160 ns 138 ns
BM_vector_deque_move_backward/5500 225 ns 181 ns
BM_vector_deque_move_backward/64000 5457 ns 5436 ns
BM_vector_deque_move_backward/65536 5646 ns 5658 ns
BM_vector_deque_move_backward/70000 6020 ns 6022 ns
BM_vector_deque_ranges_move_backward/0 0.536 ns 0.528 ns
BM_vector_deque_ranges_move_backward/1 1.06 ns 2.11 ns
BM_vector_deque_ranges_move_backward/2 1.33 ns 1.98 ns
BM_vector_deque_ranges_move_backward/64 28.9 ns 3.43 ns
BM_vector_deque_ranges_move_backward/512 160 ns 13.8 ns
BM_vector_deque_ranges_move_backward/1024 286 ns 41.0 ns
BM_vector_deque_ranges_move_backward/4000 1197 ns 134 ns
BM_vector_deque_ranges_move_backward/4096 1138 ns 138 ns
BM_vector_deque_ranges_move_backward/5500 1552 ns 181 ns
BM_vector_deque_ranges_move_backward/64000 17834 ns 5436 ns
BM_vector_deque_ranges_move_backward/65536 18351 ns 5660 ns
BM_vector_deque_ranges_move_backward/70000 19473 ns 6022 ns
================
Comment at: libcxx/benchmarks/deque_iterator.bench.cpp:1
+//===----------------------------------------------------------------------===//
+//
----------------
ldionne wrote:
> I would be curious to try adding `_LIBCPP_ALWAYS_INLINE` to `__unwrap_and_dispatch` (and possibly other implementation details in that code path) to see what impact it has on the benchmarks. Could you try that out and report? I would assume that for raw pointers, that should all end up being inlined away.
I've changed the implementation of `copy` and `move` to use the segment iterators and that fixed the performance issues, so I didn't test with `_LIBCPP_ALWAYS_INLINE`.
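As a rough illustration of why copying segment by segment helps, here is a minimal sketch. It is not the libc++ implementation: `Segments`, `copy_segmented`, and `total_size` are hypothetical names, and a vector of vectors stands in for a deque's block layout, which `std::deque` does not expose publicly. The point is only that each inner `std::copy` call receives plain pointers to a contiguous block, which an implementation can typically turn into a bulk copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a segmented container: a list of contiguous blocks.
using Segments = std::vector<std::vector<int>>;

// Copy every element of `src` into `out`, one contiguous block at a time.
// Each inner std::copy works on raw pointers into a single block.
inline int* copy_segmented(const Segments& src, int* out) {
    for (const std::vector<int>& block : src) {
        out = std::copy(block.data(), block.data() + block.size(), out);
    }
    return out;  // one past the last element written
}

// Total number of elements across all blocks (used to size the destination).
inline std::size_t total_size(const Segments& src) {
    std::size_t n = 0;
    for (const std::vector<int>& block : src) n += block.size();
    return n;
}
```

A caller would size the destination with `total_size(src)` and pass `dst.data()` as `out`; the returned pointer marks the end of the copied range.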
================
Comment at: libcxx/include/__algorithm/copy.h:74
+ auto __iters = std::__copy(__first, __last, __segment.__begin_);
+ __result += __range_size;
+ return std::make_pair(__iters.first, __result);
----------------
huixie90 wrote:
> `+=` isn't required as part of your "concept" of segmented iterator
I'd like to investigate that when adding a non-random-access iterator to the segmented iterators, since only then will it actually matter. I don't know yet whether local iterators should always be random-access, or whether copying to output segmented iterators should be constrained on the local iterator being random-access.
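To make the trade-off concrete, here is a small sketch of advancing a result iterator without `+=`. The helper name `advance_result` is hypothetical and is not part of the patch. `std::next` accepts any input iterator, running in constant time for random-access iterators and in linear time otherwise, which is the cost that would have to be weighed against constraining on random access.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: return `result` advanced by `n` positions without
// requiring operator+=. Constant time for random-access iterators,
// linear time for weaker iterator categories.
template <class Iter>
Iter advance_result(Iter result,
                    typename std::iterator_traits<Iter>::difference_type n) {
    return std::next(result, n);
}
```

With this shape the same call works for a `std::list<int>::iterator` and a `std::vector<int>::iterator`, whereas `result += n` would only compile for the latter.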
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132505/new/
https://reviews.llvm.org/D132505