[libcxx-commits] [PATCH] D131395: [libc++] Implement `lexicographical_compare_three_way`

Thu Aug 25 13:47:22 PDT 2022

var-const added a comment.

In D131395#3736227 <https://reviews.llvm.org/D131395#3736227>, @avogelsgesang wrote:

> Added a benchmark
>
> Based on the below results, we can see that the fast path for random access iterators gives us around 30% peformance compared to the basic implementation
>
>   Run on (40 X 3000 MHz CPU s)
>   CPU Caches:
>     L1 Data 32 KiB (x20)
>     L1 Instruction 32 KiB (x20)
>     L2 Unified 1024 KiB (x20)
>     L3 Unified 14080 KiB (x2)
>   Load Average: 8.64, 6.10, 3.51
>   ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
>   -------------------------------------------------------------------------------------------------------------------
>   Benchmark                                                                         Time             CPU   Iterations
>   -------------------------------------------------------------------------------------------------------------------
>   BM_lexicographical_compare_three_way<int*>/1                                   1.38 ns         1.38 ns    507820129
>   BM_lexicographical_compare_three_way<int*>/2                                   2.06 ns         2.06 ns    339714887
>   BM_lexicographical_compare_three_way<int*>/4                                   3.44 ns         3.44 ns    203747061
>   BM_lexicographical_compare_three_way<int*>/8                                   6.20 ns         6.20 ns    111056521
>   BM_lexicographical_compare_three_way<int*>/16                                  12.0 ns         12.0 ns     58183199
>   BM_lexicographical_compare_three_way<int*>/32                                  22.9 ns         22.9 ns     30549045
>   BM_lexicographical_compare_three_way<int*>/64                                  44.8 ns         44.8 ns     15615457
>   BM_lexicographical_compare_three_way<int*>/128                                 99.0 ns         99.0 ns      7056236
>   BM_lexicographical_compare_three_way<int*>/256                                  187 ns          187 ns      3743238
>   BM_lexicographical_compare_three_way<int*>/512                                  363 ns          363 ns      1927025
>   BM_lexicographical_compare_three_way<int*>/1024                                 715 ns          715 ns       969796
>   BM_lexicographical_compare_three_way<int*>/2048                                1418 ns         1418 ns       493273
>   BM_lexicographical_compare_three_way<int*>/4096                                2838 ns         2838 ns       246123
>   BM_lexicographical_compare_three_way<int*>/8192                                5647 ns         5647 ns       123272
>   BM_lexicographical_compare_three_way<int*>/16384                              11318 ns        11318 ns        61772
>   BM_lexicographical_compare_three_way<int*>/32768                              22565 ns        22563 ns        31000
>   BM_lexicographical_compare_three_way<int*>/65536                              45231 ns        45229 ns        15513
>   BM_lexicographical_compare_three_way<int*>/131072                             91047 ns        91042 ns         7678
>   BM_lexicographical_compare_three_way<int*>/262144                            181619 ns       181601 ns         3867
>   BM_lexicographical_compare_three_way<int*>/524288                            361487 ns       361452 ns         1936
>   BM_lexicographical_compare_three_way<int*>/1048576                           736236 ns       736225 ns          953
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1             1.03 ns         1.03 ns    675808965
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/2             2.41 ns         2.41 ns    290934998
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/4             4.48 ns         4.48 ns    156648871
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/8             8.60 ns         8.60 ns     81462544
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/16            16.9 ns         16.9 ns     41431608
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/32            33.3 ns         33.3 ns     20989390
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/64            66.3 ns         66.3 ns     10542387
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/128            142 ns          142 ns      4949055
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/256            274 ns          274 ns      2555910
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/512            537 ns          537 ns      1296173
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1024          1066 ns         1066 ns       656525
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/2048          2121 ns         2121 ns       329789
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/4096          4237 ns         4237 ns       165380
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/8192          8452 ns         8452 ns        82559
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/16384        16897 ns        16897 ns        41435
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/32768        33800 ns        33798 ns        20716
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/65536        67609 ns        67608 ns        10340
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/131072      135777 ns       135770 ns         5156
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/262144      270691 ns       270686 ns         2588
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/524288      542588 ns       542550 ns         1290
>   BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1048576    1088975 ns      1088880 ns          642

Can you please redo the benchmark in a different way by doing some local (not to be committed) modifications? Rather than comparing the results for `int*` versus `cpp17_input_iterator<int*>`, remove the input iterator tests completely, run the `int*` benchmark against the current implementation, then remove the optimization and rerun the benchmark. That way, it would be a proper "apples-to-apples" comparison: `int*` vs. `int*` with or without the optimization. (I don't expect results to change much, but better be sure)

================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:23
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
----------------
Is this header still used?

================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:36
+
+  if constexpr (__is_cpp17_random_access_iterator<_InputIterator1>::value &&
+                __is_cpp17_random_access_iterator<_InputIterator2>::value) {
----------------
Optional: consider refactoring both branches into helper functions.

================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:43
+    static_assert(is_integral_v<typename iterator_traits<_InputIterator1>::difference_type>,
+                  "A non-integral difference_type is undefined behavior");
+    static_assert(is_integral_v<typename iterator_traits<_InputIterator2>::difference_type>,
----------------
Nit: I think this would read a little better with a verb, e.g. `Using a non-integral difference_type is undefined behavior`.

Also, this seems a little overkill -- I'm pretty sure there are many places in algorithms where we subtract random access iterators and expect to get an integral type, without checking. I don't object to the `static_assert`s, but the comment (starting from `We rely on the fact...`) seems a little unnecessary to me (the part about undefined behavior is already captured in the static assertion).

================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:49
+    auto __min_len = __len1 < __len2 ? __len1 : __len2;
+    for (decltype(__min_len) __i = 0; __i < __min_len;) {
+      auto __c = __comp(*__first1, *__first2);
----------------
Nit: some empty lines will help separate this algorithm into logical blocks and make it more readable. I'd suggest adding blank lines before and after the `for` loop and before the `else` branch.

================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:54
+      }
+      ++__i;
+      ++__first1;
----------------
Nit: increment `__i` in the iteration expression? Otherwise, it seems more like a `while` loop.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131395/new/

https://reviews.llvm.org/D131395