[libcxx-commits] [PATCH] D131395: [libc++] Implement `lexicographical_compare_three_way`

Adrian Vogelsgesang via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Tue Nov 29 15:37:04 PST 2022


avogelsgesang added inline comments.


================
Comment at: libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp:73-79
+  for (auto _ : state) {
+    auto b1 = IteratorT{v1.data()};
+    auto e1 = IteratorT{v1.data() + v1.size()};
+    auto b2 = IteratorT{v2.data()};
+    auto e2 = IteratorT{v2.data() + v2.size()};
+    benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
+  }
----------------
ldionne wrote:
> You seem to be benchmarking not only the call to `std::lexicographical_compare_three_way`, but all of:
> 
> ```
> auto b1 = IteratorT{v1.data()};
> auto e1 = IteratorT{v1.data() + v1.size()};
> auto b2 = IteratorT{v2.data()};
> auto e2 = IteratorT{v2.data() + v2.size()};
> benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
> ```
> 
> I suspect this is where the difference for `random_access_iterator<int*>` vs `int*` comes from. I think it would make sense to instead create the various iterators outside of the timer. And then perhaps regenerate the benchmarks and hopefully the ones for `random_access_iterator<int*>` and `int*` would be within noise.
> 
> Also, we did confirm that the assembly was the same for both: https://godbolt.org/z/nTfGcKrzM.
> 
Good catch! I moved the initialization of the iterators out of the benchmark loop. In particular for the large vector sizes, I don't think that this influenced the benchmark results, but still good to make the benchmarks as focused as possible.

> I suspect this is where the difference for random_access_iterator<int*> vs int* comes from.

The change did not help, unfortunately (see my local results below). This time, the performance of `BM_lexicographical_compare_three_way_fast_path/1048576` also dropped to 700ms compared to last time. The assembly code for the different ways to trigger the fast path (direct call to the fast path; `int*`, `random_access_iterator`) is still identical.

```
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
BM_lexicographical_compare_three_way_slow_path/1                                   2.01 ns         2.01 ns    347897440
BM_lexicographical_compare_three_way_slow_path/4                                   5.00 ns         5.00 ns    137864992
BM_lexicographical_compare_three_way_slow_path/16                                  16.3 ns         16.3 ns     42347282
BM_lexicographical_compare_three_way_slow_path/64                                  64.8 ns         64.8 ns     10882499
BM_lexicographical_compare_three_way_slow_path/256                                  269 ns          269 ns      2602744
BM_lexicographical_compare_three_way_slow_path/1024                                1039 ns         1039 ns       672699
BM_lexicographical_compare_three_way_slow_path/4096                                4131 ns         4131 ns       169469
BM_lexicographical_compare_three_way_slow_path/16384                              16563 ns        16562 ns        42373
BM_lexicographical_compare_three_way_slow_path/65536                              65853 ns        65850 ns        10631
BM_lexicographical_compare_three_way_slow_path/262144                            263375 ns       263356 ns         2656
BM_lexicographical_compare_three_way_slow_path/1048576                          1060461 ns      1060399 ns          658
BM_lexicographical_compare_three_way_fast_path/1                                   1.35 ns         1.35 ns    523207011
BM_lexicographical_compare_three_way_fast_path/4                                   3.38 ns         3.38 ns    207207415
BM_lexicographical_compare_three_way_fast_path/16                                  11.4 ns         11.4 ns     60817686
BM_lexicographical_compare_three_way_fast_path/64                                  43.6 ns         43.6 ns     16014340
BM_lexicographical_compare_three_way_fast_path/256                                  181 ns          181 ns      3877235
BM_lexicographical_compare_three_way_fast_path/1024                                 700 ns          700 ns      1008340
BM_lexicographical_compare_three_way_fast_path/4096                                2758 ns         2758 ns       253557
BM_lexicographical_compare_three_way_fast_path/16384                              11054 ns        11054 ns        62963
BM_lexicographical_compare_three_way_fast_path/65536                              44180 ns        44176 ns        15921
BM_lexicographical_compare_three_way_fast_path/262144                            175889 ns       175876 ns         3986
BM_lexicographical_compare_three_way_fast_path/1048576                           707635 ns       707576 ns          990
BM_lexicographical_compare_three_way<IntPtr>/1                                     1.01 ns         1.01 ns    690305420
BM_lexicographical_compare_three_way<IntPtr>/4                                     3.70 ns         3.70 ns    189157419
BM_lexicographical_compare_three_way<IntPtr>/16                                    7.70 ns         7.70 ns     90905923
BM_lexicographical_compare_three_way<IntPtr>/64                                    29.2 ns         29.2 ns     23966259
BM_lexicographical_compare_three_way<IntPtr>/256                                    124 ns          124 ns      5635669
BM_lexicographical_compare_three_way<IntPtr>/1024                                   467 ns          467 ns      1497982
BM_lexicographical_compare_three_way<IntPtr>/4096                                  1848 ns         1848 ns       378813
BM_lexicographical_compare_three_way<IntPtr>/16384                                 7710 ns         7710 ns        89978
BM_lexicographical_compare_three_way<IntPtr>/65536                                29735 ns        29734 ns        23509
BM_lexicographical_compare_three_way<IntPtr>/262144                              122694 ns       122680 ns         5585
BM_lexicographical_compare_three_way<IntPtr>/1048576                             500608 ns       500572 ns         1000
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1             1.34 ns         1.34 ns    515760180
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4             3.42 ns         3.42 ns    205376850
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16            11.8 ns         11.8 ns     59125669
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/64            44.0 ns         44.0 ns     15869035
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/256            181 ns          180 ns      3880819
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1024           695 ns          695 ns      1003722
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4096          2754 ns         2754 ns       254042
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16384        11040 ns        11040 ns        63761
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/65536        43912 ns        43911 ns        15908
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/262144      177597 ns       177587 ns         3965
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1048576     706671 ns       706623 ns          993
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1               1.12 ns         1.12 ns    627628655
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4               4.13 ns         4.13 ns    169518923
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16              16.3 ns         16.3 ns     43030558
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/64              64.4 ns         64.4 ns     10875232
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/256              268 ns          268 ns      2611255
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1024            1040 ns         1040 ns       674434
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4096            4133 ns         4133 ns       169644
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16384          16523 ns        16522 ns        42524
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/65536          65819 ns        65814 ns        10645
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/262144        263771 ns       263765 ns         2656
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1048576      1059997 ns      1059932 ns          660
```


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:100
+    return __lexicographical_compare_three_way_fast_path(
+        __first1, __last1, __first2, __last2, std::forward<_Cmp>(__comp));
+  } else {
----------------
Mordante wrote:
> Is `forward` intended? `__comp` is passed by value.
good catch! Changed to `move`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131395/new/

https://reviews.llvm.org/D131395



More information about the libcxx-commits mailing list