[libcxx-commits] [PATCH] D131395: [libc++] Implement `lexicographical_compare_three_way`
Adrian Vogelsgesang via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Tue Nov 29 15:37:04 PST 2022
avogelsgesang added inline comments.
================
Comment at: libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp:73-79
+ for (auto _ : state) {
+ auto b1 = IteratorT{v1.data()};
+ auto e1 = IteratorT{v1.data() + v1.size()};
+ auto b2 = IteratorT{v2.data()};
+ auto e2 = IteratorT{v2.data() + v2.size()};
+ benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
+ }
----------------
ldionne wrote:
> You seem to be benchmarking not only the call to `std::lexicographical_compare_three_way`, but all of:
>
> ```
> auto b1 = IteratorT{v1.data()};
> auto e1 = IteratorT{v1.data() + v1.size()};
> auto b2 = IteratorT{v2.data()};
> auto e2 = IteratorT{v2.data() + v2.size()};
> benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
> ```
>
> I suspect this is where the difference for `random_access_iterator<int*>` vs `int*` comes from. I think it would make sense to instead create the various iterators outside of the timer. And then perhaps regenerate the benchmarks and hopefully the ones for `random_access_iterator<int*>` and `int*` would be within noise.
>
> Also, we did confirm that the assembly was the same for both: https://godbolt.org/z/nTfGcKrzM.
>
Good catch! I moved the initialization of the iterators out of the benchmark loop. In particular for the large vector sizes, I don't think that this influenced the benchmark results, but still good to make the benchmarks as focused as possible.
> I suspect this is where the difference for random_access_iterator<int*> vs int* comes from.
The change did not help, unfortunately (see my local results below). This time, the performance of `BM_lexicographical_compare_three_way_fast_path/1048576` also dropped to 700ms compared to last time. The assembly code for the different ways to trigger the fast path (direct call to the fast path; `int*`, `random_access_iterator`) is still identical.
```
-----------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------------------------------
BM_lexicographical_compare_three_way_slow_path/1 2.01 ns 2.01 ns 347897440
BM_lexicographical_compare_three_way_slow_path/4 5.00 ns 5.00 ns 137864992
BM_lexicographical_compare_three_way_slow_path/16 16.3 ns 16.3 ns 42347282
BM_lexicographical_compare_three_way_slow_path/64 64.8 ns 64.8 ns 10882499
BM_lexicographical_compare_three_way_slow_path/256 269 ns 269 ns 2602744
BM_lexicographical_compare_three_way_slow_path/1024 1039 ns 1039 ns 672699
BM_lexicographical_compare_three_way_slow_path/4096 4131 ns 4131 ns 169469
BM_lexicographical_compare_three_way_slow_path/16384 16563 ns 16562 ns 42373
BM_lexicographical_compare_three_way_slow_path/65536 65853 ns 65850 ns 10631
BM_lexicographical_compare_three_way_slow_path/262144 263375 ns 263356 ns 2656
BM_lexicographical_compare_three_way_slow_path/1048576 1060461 ns 1060399 ns 658
BM_lexicographical_compare_three_way_fast_path/1 1.35 ns 1.35 ns 523207011
BM_lexicographical_compare_three_way_fast_path/4 3.38 ns 3.38 ns 207207415
BM_lexicographical_compare_three_way_fast_path/16 11.4 ns 11.4 ns 60817686
BM_lexicographical_compare_three_way_fast_path/64 43.6 ns 43.6 ns 16014340
BM_lexicographical_compare_three_way_fast_path/256 181 ns 181 ns 3877235
BM_lexicographical_compare_three_way_fast_path/1024 700 ns 700 ns 1008340
BM_lexicographical_compare_three_way_fast_path/4096 2758 ns 2758 ns 253557
BM_lexicographical_compare_three_way_fast_path/16384 11054 ns 11054 ns 62963
BM_lexicographical_compare_three_way_fast_path/65536 44180 ns 44176 ns 15921
BM_lexicographical_compare_three_way_fast_path/262144 175889 ns 175876 ns 3986
BM_lexicographical_compare_three_way_fast_path/1048576 707635 ns 707576 ns 990
BM_lexicographical_compare_three_way<IntPtr>/1 1.01 ns 1.01 ns 690305420
BM_lexicographical_compare_three_way<IntPtr>/4 3.70 ns 3.70 ns 189157419
BM_lexicographical_compare_three_way<IntPtr>/16 7.70 ns 7.70 ns 90905923
BM_lexicographical_compare_three_way<IntPtr>/64 29.2 ns 29.2 ns 23966259
BM_lexicographical_compare_three_way<IntPtr>/256 124 ns 124 ns 5635669
BM_lexicographical_compare_three_way<IntPtr>/1024 467 ns 467 ns 1497982
BM_lexicographical_compare_three_way<IntPtr>/4096 1848 ns 1848 ns 378813
BM_lexicographical_compare_three_way<IntPtr>/16384 7710 ns 7710 ns 89978
BM_lexicographical_compare_three_way<IntPtr>/65536 29735 ns 29734 ns 23509
BM_lexicographical_compare_three_way<IntPtr>/262144 122694 ns 122680 ns 5585
BM_lexicographical_compare_three_way<IntPtr>/1048576 500608 ns 500572 ns 1000
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1 1.34 ns 1.34 ns 515760180
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4 3.42 ns 3.42 ns 205376850
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16 11.8 ns 11.8 ns 59125669
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/64 44.0 ns 44.0 ns 15869035
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/256 181 ns 180 ns 3880819
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1024 695 ns 695 ns 1003722
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4096 2754 ns 2754 ns 254042
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16384 11040 ns 11040 ns 63761
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/65536 43912 ns 43911 ns 15908
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/262144 177597 ns 177587 ns 3965
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1048576 706671 ns 706623 ns 993
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1 1.12 ns 1.12 ns 627628655
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4 4.13 ns 4.13 ns 169518923
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16 16.3 ns 16.3 ns 43030558
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/64 64.4 ns 64.4 ns 10875232
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/256 268 ns 268 ns 2611255
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1024 1040 ns 1040 ns 674434
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4096 4133 ns 4133 ns 169644
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16384 16523 ns 16522 ns 42524
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/65536 65819 ns 65814 ns 10645
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/262144 263771 ns 263765 ns 2656
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1048576 1059997 ns 1059932 ns 660
```
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:100
+ return __lexicographical_compare_three_way_fast_path(
+ __first1, __last1, __first2, __last2, std::forward<_Cmp>(__comp));
+ } else {
----------------
Mordante wrote:
> Is `forward` intended? `__comp` is passed by value.
good catch! Changed to `move`
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131395/new/
https://reviews.llvm.org/D131395
More information about the libcxx-commits
mailing list