[libcxx-commits] [PATCH] D132505: [libc++] Refactor deque::iterator algorithm optimizations
Nikolas Klauser via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Thu Sep 15 07:46:54 PDT 2022
philnik marked 4 inline comments as done.
philnik added a comment.
Here are the benchmarks for the current patch:
-----------------------------------------------------------------------
Benchmark old new
-----------------------------------------------------------------------
BM_deque_vector_copy/0 0.272 ns 1.58 ns
BM_deque_vector_copy/1 3.40 ns 1.85 ns
BM_deque_vector_copy/2 3.30 ns 1.72 ns
BM_deque_vector_copy/64 4.24 ns 2.90 ns
BM_deque_vector_copy/512 18.6 ns 17.2 ns
BM_deque_vector_copy/1024 41.3 ns 40.1 ns
BM_deque_vector_copy/4000 145 ns 127 ns
BM_deque_vector_copy/4096 148 ns 132 ns
BM_deque_vector_copy/5500 196 ns 169 ns
BM_deque_vector_copy/64000 5518 ns 5554 ns
BM_deque_vector_copy/65536 5665 ns 5680 ns
BM_deque_vector_copy/70000 5852 ns 5772 ns
BM_deque_vector_ranges_copy/0 0.265 ns 1.58 ns
BM_deque_vector_ranges_copy/1 1.30 ns 1.87 ns
BM_deque_vector_ranges_copy/2 1.60 ns 1.72 ns
BM_deque_vector_ranges_copy/64 23.5 ns 2.90 ns
BM_deque_vector_ranges_copy/512 188 ns 16.9 ns
BM_deque_vector_ranges_copy/1024 370 ns 39.5 ns
BM_deque_vector_ranges_copy/4000 1437 ns 128 ns
BM_deque_vector_ranges_copy/4096 1474 ns 128 ns
BM_deque_vector_ranges_copy/5500 1990 ns 170 ns
BM_deque_vector_ranges_copy/64000 23213 ns 5503 ns
BM_deque_vector_ranges_copy/65536 23703 ns 5436 ns
BM_deque_vector_ranges_copy/70000 25287 ns 5897 ns
BM_deque_deque_copy/0 1.06 ns 7.18 ns
BM_deque_deque_copy/1 5.77 ns 12.8 ns
BM_deque_deque_copy/2 5.63 ns 13.1 ns
BM_deque_deque_copy/64 6.42 ns 15.2 ns
BM_deque_deque_copy/512 21.4 ns 29.1 ns
BM_deque_deque_copy/1024 43.0 ns 56.9 ns
BM_deque_deque_copy/4000 114 ns 116 ns
BM_deque_deque_copy/4096 171 ns 177 ns
BM_deque_deque_copy/5500 236 ns 225 ns
BM_deque_deque_copy/64000 5387 ns 5431 ns
BM_deque_deque_copy/65536 5552 ns 5560 ns
BM_deque_deque_copy/70000 5882 ns 5941 ns
BM_deque_deque_ranges_copy/0 0.793 ns 7.12 ns
BM_deque_deque_ranges_copy/1 1.85 ns 12.8 ns
BM_deque_deque_ranges_copy/2 2.38 ns 13.0 ns
BM_deque_deque_ranges_copy/64 44.4 ns 15.2 ns
BM_deque_deque_ranges_copy/512 281 ns 29.1 ns
BM_deque_deque_ranges_copy/1024 555 ns 56.9 ns
BM_deque_deque_ranges_copy/4000 2155 ns 115 ns
BM_deque_deque_ranges_copy/4096 2217 ns 177 ns
BM_deque_deque_ranges_copy/5500 2977 ns 226 ns
BM_deque_deque_ranges_copy/64000 34584 ns 5432 ns
BM_deque_deque_ranges_copy/65536 35419 ns 5572 ns
BM_deque_deque_ranges_copy/70000 37847 ns 5972 ns
BM_vector_deque_copy/0 0.585 ns 0.529 ns
BM_vector_deque_copy/1 2.98 ns 2.73 ns
BM_vector_deque_copy/2 2.79 ns 2.62 ns
BM_vector_deque_copy/64 3.70 ns 3.44 ns
BM_vector_deque_copy/512 13.5 ns 13.6 ns
BM_vector_deque_copy/1024 40.0 ns 40.1 ns
BM_vector_deque_copy/4000 145 ns 144 ns
BM_vector_deque_copy/4096 147 ns 147 ns
BM_vector_deque_copy/5500 199 ns 201 ns
BM_vector_deque_copy/64000 5452 ns 5462 ns
BM_vector_deque_copy/65536 5718 ns 5691 ns
BM_vector_deque_copy/70000 5985 ns 5988 ns
BM_vector_deque_ranges_copy/0 0.529 ns 0.528 ns
BM_vector_deque_ranges_copy/1 1.06 ns 2.69 ns
BM_vector_deque_ranges_copy/2 1.46 ns 2.65 ns
BM_vector_deque_ranges_copy/64 23.2 ns 3.51 ns
BM_vector_deque_ranges_copy/512 187 ns 14.0 ns
BM_vector_deque_ranges_copy/1024 369 ns 40.2 ns
BM_vector_deque_ranges_copy/4000 1440 ns 145 ns
BM_vector_deque_ranges_copy/4096 1474 ns 146 ns
BM_vector_deque_ranges_copy/5500 1979 ns 202 ns
BM_vector_deque_ranges_copy/64000 22995 ns 5512 ns
BM_vector_deque_ranges_copy/65536 23549 ns 5695 ns
BM_vector_deque_ranges_copy/70000 25182 ns 6064 ns
BM_deque_vector_move/0 0.266 ns 1.61 ns
BM_deque_vector_move/1 3.26 ns 1.91 ns
BM_deque_vector_move/2 3.23 ns 1.73 ns
BM_deque_vector_move/64 4.24 ns 2.90 ns
BM_deque_vector_move/512 14.1 ns 13.5 ns
BM_deque_vector_move/1024 40.4 ns 39.5 ns
BM_deque_vector_move/4000 144 ns 128 ns
BM_deque_vector_move/4096 144 ns 128 ns
BM_deque_vector_move/5500 200 ns 170 ns
BM_deque_vector_move/64000 5482 ns 5481 ns
BM_deque_vector_move/65536 5436 ns 5397 ns
BM_deque_vector_move/70000 5846 ns 5795 ns
BM_deque_vector_ranges_move/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_move/1 1.32 ns 1.85 ns
BM_deque_vector_ranges_move/2 1.59 ns 1.72 ns
BM_deque_vector_ranges_move/64 35.3 ns 2.90 ns
BM_deque_vector_ranges_move/512 202 ns 13.5 ns
BM_deque_vector_ranges_move/1024 395 ns 39.5 ns
BM_deque_vector_ranges_move/4000 1548 ns 128 ns
BM_deque_vector_ranges_move/4096 1559 ns 128 ns
BM_deque_vector_ranges_move/5500 2133 ns 170 ns
BM_deque_vector_ranges_move/64000 24609 ns 5478 ns
BM_deque_vector_ranges_move/65536 25182 ns 5396 ns
BM_deque_vector_ranges_move/70000 27414 ns 5783 ns
BM_deque_deque_move/0 1.06 ns 7.12 ns
BM_deque_deque_move/1 5.67 ns 12.7 ns
BM_deque_deque_move/2 5.54 ns 12.9 ns
BM_deque_deque_move/64 6.37 ns 15.1 ns
BM_deque_deque_move/512 21.3 ns 28.8 ns
BM_deque_deque_move/1024 43.0 ns 56.8 ns
BM_deque_deque_move/4000 112 ns 116 ns
BM_deque_deque_move/4096 171 ns 177 ns
BM_deque_deque_move/5500 233 ns 228 ns
BM_deque_deque_move/64000 5378 ns 5442 ns
BM_deque_deque_move/65536 5546 ns 5578 ns
BM_deque_deque_move/70000 5877 ns 5932 ns
BM_deque_deque_ranges_move/0 0.792 ns 7.12 ns
BM_deque_deque_ranges_move/1 1.85 ns 12.7 ns
BM_deque_deque_ranges_move/2 2.38 ns 13.0 ns
BM_deque_deque_ranges_move/64 43.9 ns 15.1 ns
BM_deque_deque_ranges_move/512 281 ns 28.8 ns
BM_deque_deque_ranges_move/1024 560 ns 56.8 ns
BM_deque_deque_ranges_move/4000 2171 ns 115 ns
BM_deque_deque_ranges_move/4096 2245 ns 177 ns
BM_deque_deque_ranges_move/5500 3013 ns 225 ns
BM_deque_deque_ranges_move/64000 35085 ns 5429 ns
BM_deque_deque_ranges_move/65536 35939 ns 5560 ns
BM_deque_deque_ranges_move/70000 38388 ns 5935 ns
BM_vector_deque_move/0 0.597 ns 0.534 ns
BM_vector_deque_move/1 2.83 ns 2.71 ns
BM_vector_deque_move/2 2.70 ns 2.60 ns
BM_vector_deque_move/64 3.68 ns 3.44 ns
BM_vector_deque_move/512 13.5 ns 13.6 ns
BM_vector_deque_move/1024 39.9 ns 40.1 ns
BM_vector_deque_move/4000 145 ns 144 ns
BM_vector_deque_move/4096 146 ns 146 ns
BM_vector_deque_move/5500 200 ns 200 ns
BM_vector_deque_move/64000 5454 ns 5460 ns
BM_vector_deque_move/65536 5722 ns 5686 ns
BM_vector_deque_move/70000 5986 ns 5984 ns
BM_vector_deque_ranges_move/0 0.539 ns 0.528 ns
BM_vector_deque_ranges_move/1 1.06 ns 2.71 ns
BM_vector_deque_ranges_move/2 1.47 ns 2.58 ns
BM_vector_deque_ranges_move/64 24.0 ns 3.44 ns
BM_vector_deque_ranges_move/512 189 ns 13.6 ns
BM_vector_deque_ranges_move/1024 375 ns 40.1 ns
BM_vector_deque_ranges_move/4000 1436 ns 144 ns
BM_vector_deque_ranges_move/4096 1472 ns 146 ns
BM_vector_deque_ranges_move/5500 1977 ns 200 ns
BM_vector_deque_ranges_move/64000 22981 ns 5466 ns
BM_vector_deque_ranges_move/65536 23577 ns 5688 ns
BM_vector_deque_ranges_move/70000 25131 ns 5985 ns
BM_deque_vector_copy_backward/0 0.264 ns 1.58 ns
BM_deque_vector_copy_backward/1 2.96 ns 1.86 ns
BM_deque_vector_copy_backward/2 3.55 ns 1.72 ns
BM_deque_vector_copy_backward/64 4.49 ns 2.93 ns
BM_deque_vector_copy_backward/512 16.1 ns 13.2 ns
BM_deque_vector_copy_backward/1024 41.1 ns 40.0 ns
BM_deque_vector_copy_backward/4000 151 ns 126 ns
BM_deque_vector_copy_backward/4096 145 ns 127 ns
BM_deque_vector_copy_backward/5500 211 ns 170 ns
BM_deque_vector_copy_backward/64000 5471 ns 5506 ns
BM_deque_vector_copy_backward/65536 5439 ns 5415 ns
BM_deque_vector_copy_backward/70000 5838 ns 5786 ns
BM_deque_vector_ranges_copy_backward/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_copy_backward/1 1.17 ns 1.85 ns
BM_deque_vector_ranges_copy_backward/2 1.45 ns 1.73 ns
BM_deque_vector_ranges_copy_backward/64 26.0 ns 3.11 ns
BM_deque_vector_ranges_copy_backward/512 147 ns 13.3 ns
BM_deque_vector_ranges_copy_backward/1024 282 ns 40.1 ns
BM_deque_vector_ranges_copy_backward/4000 1103 ns 127 ns
BM_deque_vector_ranges_copy_backward/4096 1131 ns 127 ns
BM_deque_vector_ranges_copy_backward/5500 1514 ns 170 ns
BM_deque_vector_ranges_copy_backward/6400 17553 ns 5515 ns
BM_deque_vector_ranges_copy_backward/6553 17944 ns 5415 ns
BM_deque_vector_ranges_copy_backward/7000 19183 ns 5784 ns
BM_deque_deque_copy_backward/0 1.16 ns 1.32 ns
BM_deque_deque_copy_backward/1 6.58 ns 3.17 ns
BM_deque_deque_copy_backward/2 6.87 ns 3.17 ns
BM_deque_deque_copy_backward/64 7.77 ns 4.19 ns
BM_deque_deque_copy_backward/512 24.3 ns 19.2 ns
BM_deque_deque_copy_backward/1024 46.2 ns 43.5 ns
BM_deque_deque_copy_backward/4000 121 ns 101 ns
BM_deque_deque_copy_backward/4096 179 ns 163 ns
BM_deque_deque_copy_backward/5500 247 ns 216 ns
BM_deque_deque_copy_backward/64000 5362 ns 5408 ns
BM_deque_deque_copy_backward/65536 5474 ns 5647 ns
BM_deque_deque_copy_backward/70000 5856 ns 5948 ns
BM_deque_deque_ranges_copy_backward/0 0.792 ns 1.33 ns
BM_deque_deque_ranges_copy_backward/1 2.04 ns 3.17 ns
BM_deque_deque_ranges_copy_backward/2 2.93 ns 3.17 ns
BM_deque_deque_ranges_copy_backward/64 56.0 ns 4.19 ns
BM_deque_deque_ranges_copy_backward/512 372 ns 19.2 ns
BM_deque_deque_ranges_copy_backward/1024 715 ns 43.5 ns
BM_deque_deque_ranges_copy_backward/4000 2839 ns 101 ns
BM_deque_deque_ranges_copy_backward/4096 2861 ns 163 ns
BM_deque_deque_ranges_copy_backward/5500 3850 ns 217 ns
BM_deque_deque_ranges_copy_backward/64000 42909 ns 5404 ns
BM_deque_deque_ranges_copy_backward/65536 44236 ns 5572 ns
BM_deque_deque_ranges_copy_backward/70000 47484 ns 5997 ns
BM_vector_deque_copy_backward/0 0.597 ns 0.532 ns
BM_vector_deque_copy_backward/1 4.17 ns 2.15 ns
BM_vector_deque_copy_backward/2 3.83 ns 2.03 ns
BM_vector_deque_copy_backward/64 4.99 ns 3.49 ns
BM_vector_deque_copy_backward/512 18.1 ns 14.0 ns
BM_vector_deque_copy_backward/1024 43.6 ns 41.1 ns
BM_vector_deque_copy_backward/4000 160 ns 135 ns
BM_vector_deque_copy_backward/4096 160 ns 138 ns
BM_vector_deque_copy_backward/5500 225 ns 180 ns
BM_vector_deque_copy_backward/64000 5458 ns 5435 ns
BM_vector_deque_copy_backward/65536 5648 ns 5652 ns
BM_vector_deque_copy_backward/70000 6021 ns 6026 ns
BM_vector_deque_ranges_copy_backward/0 0.529 ns 0.529 ns
BM_vector_deque_ranges_copy_backward/1 1.06 ns 2.11 ns
BM_vector_deque_ranges_copy_backward/2 1.35 ns 1.98 ns
BM_vector_deque_ranges_copy_backward/64 25.4 ns 3.43 ns
BM_vector_deque_ranges_copy_backward/512 166 ns 13.8 ns
BM_vector_deque_ranges_copy_backward/1024 286 ns 41.0 ns
BM_vector_deque_ranges_copy_backward/4000 1149 ns 134 ns
BM_vector_deque_ranges_copy_backward/4096 1138 ns 138 ns
BM_vector_deque_ranges_copy_backward/5500 1536 ns 180 ns
BM_vector_deque_ranges_copy_backward/6400 17771 ns 5435 ns
BM_vector_deque_ranges_copy_backward/6553 18343 ns 5653 ns
BM_vector_deque_ranges_copy_backward/7000 19422 ns 6024 ns
BM_deque_vector_move_backward/0 0.271 ns 1.58 ns
BM_deque_vector_move_backward/1 2.91 ns 1.85 ns
BM_deque_vector_move_backward/2 3.51 ns 1.72 ns
BM_deque_vector_move_backward/64 4.49 ns 2.94 ns
BM_deque_vector_move_backward/512 15.8 ns 13.3 ns
BM_deque_vector_move_backward/1024 41.2 ns 40.0 ns
BM_deque_vector_move_backward/4000 147 ns 126 ns
BM_deque_vector_move_backward/4096 145 ns 127 ns
BM_deque_vector_move_backward/5500 207 ns 170 ns
BM_deque_vector_move_backward/64000 5465 ns 5512 ns
BM_deque_vector_move_backward/65536 5435 ns 5414 ns
BM_deque_vector_move_backward/70000 5835 ns 5788 ns
BM_deque_vector_ranges_move_backward/0 0.264 ns 1.58 ns
BM_deque_vector_ranges_move_backward/1 1.17 ns 1.85 ns
BM_deque_vector_ranges_move_backward/2 1.45 ns 1.72 ns
BM_deque_vector_ranges_move_backward/64 23.2 ns 3.11 ns
BM_deque_vector_ranges_move_backward/512 147 ns 13.3 ns
BM_deque_vector_ranges_move_backward/1024 281 ns 39.9 ns
BM_deque_vector_ranges_move_backward/4000 1097 ns 126 ns
BM_deque_vector_ranges_move_backward/4096 1122 ns 127 ns
BM_deque_vector_ranges_move_backward/5500 1514 ns 170 ns
BM_deque_vector_ranges_move_backward/6400 17551 ns 5502 ns
BM_deque_vector_ranges_move_backward/6553 17944 ns 5415 ns
BM_deque_vector_ranges_move_backward/7000 19183 ns 5787 ns
BM_deque_deque_move_backward/0 1.17 ns 1.32 ns
BM_deque_deque_move_backward/1 6.60 ns 3.17 ns
BM_deque_deque_move_backward/2 6.87 ns 3.17 ns
BM_deque_deque_move_backward/64 7.78 ns 4.19 ns
BM_deque_deque_move_backward/512 24.2 ns 19.2 ns
BM_deque_deque_move_backward/1024 46.2 ns 43.5 ns
BM_deque_deque_move_backward/4000 121 ns 101 ns
BM_deque_deque_move_backward/4096 179 ns 163 ns
BM_deque_deque_move_backward/5500 247 ns 216 ns
BM_deque_deque_move_backward/64000 5361 ns 5401 ns
BM_deque_deque_move_backward/65536 5465 ns 5531 ns
BM_deque_deque_move_backward/70000 5845 ns 5942 ns
BM_deque_deque_ranges_move_backward/0 0.791 ns 1.32 ns
BM_deque_deque_ranges_move_backward/1 2.04 ns 3.17 ns
BM_deque_deque_ranges_move_backward/2 2.93 ns 3.17 ns
BM_deque_deque_ranges_move_backward/64 55.7 ns 4.19 ns
BM_deque_deque_ranges_move_backward/512 351 ns 19.2 ns
BM_deque_deque_ranges_move_backward/1024 689 ns 43.5 ns
BM_deque_deque_ranges_move_backward/4000 2685 ns 102 ns
BM_deque_deque_ranges_move_backward/4096 2743 ns 163 ns
BM_deque_deque_ranges_move_backward/5500 3698 ns 215 ns
BM_deque_deque_ranges_move_backward/64000 42808 ns 5394 ns
BM_deque_deque_ranges_move_backward/65536 43858 ns 5531 ns
BM_deque_deque_ranges_move_backward/70000 46853 ns 5941 ns
BM_vector_deque_move_backward/0 0.621 ns 0.528 ns
BM_vector_deque_move_backward/1 4.17 ns 2.11 ns
BM_vector_deque_move_backward/2 3.84 ns 1.98 ns
BM_vector_deque_move_backward/64 4.99 ns 3.43 ns
BM_vector_deque_move_backward/512 18.1 ns 13.8 ns
BM_vector_deque_move_backward/1024 43.6 ns 41.0 ns
BM_vector_deque_move_backward/4000 160 ns 134 ns
BM_vector_deque_move_backward/4096 160 ns 138 ns
BM_vector_deque_move_backward/5500 225 ns 180 ns
BM_vector_deque_move_backward/64000 5457 ns 5433 ns
BM_vector_deque_move_backward/65536 5646 ns 5650 ns
BM_vector_deque_move_backward/70000 6020 ns 6027 ns
BM_vector_deque_ranges_move_backward/0 0.536 ns 0.529 ns
BM_vector_deque_ranges_move_backward/1 1.06 ns 2.11 ns
BM_vector_deque_ranges_move_backward/2 1.33 ns 1.98 ns
BM_vector_deque_ranges_move_backward/64 28.9 ns 3.43 ns
BM_vector_deque_ranges_move_backward/512 160 ns 13.8 ns
BM_vector_deque_ranges_move_backward/1024 286 ns 41.0 ns
BM_vector_deque_ranges_move_backward/4000 1197 ns 134 ns
BM_vector_deque_ranges_move_backward/4096 1138 ns 138 ns
BM_vector_deque_ranges_move_backward/5500 1552 ns 180 ns
BM_vector_deque_ranges_move_backward/6400 17834 ns 5432 ns
BM_vector_deque_ranges_move_backward/6553 18351 ns 5654 ns
BM_vector_deque_ranges_move_backward/7000 19473 ns 6028 ns
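For readers unfamiliar with the idea behind these numbers: in the table above, the `ranges` variants were several times slower than the classic ones at larger sizes before the patch and land at roughly the same timings after it. A plausible reading, given the patch title, is that both paths now copy a deque one contiguous block at a time instead of one element at a time. The following is only a minimal sketch of that general technique, not libc++'s implementation; `SegmentedBuffer`, `copy_elementwise` and `copy_segmented` are hypothetical names, and a vector of vectors stands in for the deque's block layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a segmented container such as std::deque:
// the elements live in several separately allocated contiguous blocks.
struct SegmentedBuffer {
    std::vector<std::vector<int> > blocks;

    // Total number of elements across all blocks.
    std::size_t size() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < blocks.size(); ++i) n += blocks[i].size();
        return n;
    }
};

// Baseline: visit every element individually, one assignment per element.
inline void copy_elementwise(const SegmentedBuffer& in, std::vector<int>& out) {
    assert(out.size() >= in.size());
    std::size_t pos = 0;
    for (std::size_t b = 0; b < in.blocks.size(); ++b)
        for (std::size_t i = 0; i < in.blocks[b].size(); ++i)
            out[pos++] = in.blocks[b][i];
}

// Segment-wise: hand each contiguous block to std::copy as a whole, so the
// library can apply its contiguous-range fast path once per block.
inline void copy_segmented(const SegmentedBuffer& in, std::vector<int>& out) {
    assert(out.size() >= in.size());
    std::vector<int>::iterator dest = out.begin();
    for (std::size_t b = 0; b < in.blocks.size(); ++b)
        dest = std::copy(in.blocks[b].begin(), in.blocks[b].end(), dest);
}
```

Both functions produce the same output; the difference is only in how much work is done per block boundary versus per element, which is where any speedup would come from.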
================
Comment at: libcxx/include/__algorithm/copy.h:33
+ __enable_if_t<!(is_copy_constructible<_InIter>::value
+ && is_copy_constructible<_Sent>::value
+ && is_copy_constructible<_OutIter>::value), int> = 0>
----------------
huixie90 wrote:
> philnik wrote:
> > huixie90 wrote:
> > > The sentinel is always copyable.
> > > This applies to other places as well.
> > Since this is pre-existing I'd rather fix it in a follow-up. This patch is already quite large.
> sounds good to me
This doesn't apply anymore, since @var-const refactored it.
================
Comment at: libcxx/include/__algorithm/copy.h:86
+ class _OutIter,
+ __enable_if_t<__segmented_iterator_traits<_InIter>::__is_segmented_iterator::value, int> = 0>
+inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pair<_InIter, _OutIter>
----------------
huixie90 wrote:
> philnik wrote:
> > huixie90 wrote:
> > > As we get more and more optimisations for different types, it is harder to make sure all of these overloads are mutually exclusive. Do you think this is (or will be) a problem?
> > I think this is a problem. But I don't really have a good idea how to fix it. Using `if constexpr` would probably do the job, but we don't have that option.
> I feel that someone is going to promote his priority_tag thing
This also doesn't apply anymore.
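As background for the exchange quoted above: `priority_tag` usually refers to a tag-dispatch idiom in which overloads are ranked explicitly, so their `enable_if` conditions no longer have to be mutually exclusive. The sketch below illustrates that idiom under stated assumptions; it is not code from this patch or from libc++. The traits `is_segmented` and `is_trivial_range`, the iterator types, and `choose` are all hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// priority_tag<N> derives from priority_tag<N - 1>, so passing priority_tag<2>
// prefers an overload taking priority_tag<2>, then <1>, then <0>.
template <std::size_t N> struct priority_tag : priority_tag<N - 1> {};
template <> struct priority_tag<0> {};

// Hypothetical iterator types and traits, for illustration only.
struct PlainIter {};
struct SegIter {};
struct BothIter {};  // satisfies both optimisation conditions at once

template <class It> struct is_segmented : std::false_type {};
template <> struct is_segmented<SegIter> : std::true_type {};
template <> struct is_segmented<BothIter> : std::true_type {};

template <class It> struct is_trivial_range : std::false_type {};
template <> struct is_trivial_range<BothIter> : std::true_type {};

// Highest priority: the "memmove" path.
template <class It, typename std::enable_if<is_trivial_range<It>::value, int>::type = 0>
const char* choose_impl(It, priority_tag<2>) { return "memmove"; }

// Middle priority: the segmented path. Its condition overlaps with the one
// above for BothIter, which would be ambiguous without the tag parameter.
template <class It, typename std::enable_if<is_segmented<It>::value, int>::type = 0>
const char* choose_impl(It, priority_tag<1>) { return "segmented"; }

// Lowest priority: unconstrained fallback.
template <class It>
const char* choose_impl(It, priority_tag<0>) { return "generic"; }

// Entry point: always start at the highest priority.
template <class It>
std::string choose(It it) { return choose_impl(it, priority_tag<2>()); }
```

With this arrangement, adding another optimisation means adding one overload with a new rank rather than negating its condition in every existing overload.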
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132505/new/
https://reviews.llvm.org/D132505