[libcxx-commits] [PATCH] D131395: [libc++] Implement `lexicographical_compare_three_way`
Konstantin Varlamov via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Thu Aug 25 13:47:22 PDT 2022
var-const added a comment.
In D131395#3736227 <https://reviews.llvm.org/D131395#3736227>, @avogelsgesang wrote:
> Added a benchmark
>
> Based on the below results, we can see that the fast path for random access iterators gives us around 30% peformance compared to the basic implementation
>
> Run on (40 X 3000 MHz CPU s)
> CPU Caches:
> L1 Data 32 KiB (x20)
> L1 Instruction 32 KiB (x20)
> L2 Unified 1024 KiB (x20)
> L3 Unified 14080 KiB (x2)
> Load Average: 8.64, 6.10, 3.51
> ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
> -------------------------------------------------------------------------------------------------------------------
> Benchmark Time CPU Iterations
> -------------------------------------------------------------------------------------------------------------------
> BM_lexicographical_compare_three_way<int*>/1 1.38 ns 1.38 ns 507820129
> BM_lexicographical_compare_three_way<int*>/2 2.06 ns 2.06 ns 339714887
> BM_lexicographical_compare_three_way<int*>/4 3.44 ns 3.44 ns 203747061
> BM_lexicographical_compare_three_way<int*>/8 6.20 ns 6.20 ns 111056521
> BM_lexicographical_compare_three_way<int*>/16 12.0 ns 12.0 ns 58183199
> BM_lexicographical_compare_three_way<int*>/32 22.9 ns 22.9 ns 30549045
> BM_lexicographical_compare_three_way<int*>/64 44.8 ns 44.8 ns 15615457
> BM_lexicographical_compare_three_way<int*>/128 99.0 ns 99.0 ns 7056236
> BM_lexicographical_compare_three_way<int*>/256 187 ns 187 ns 3743238
> BM_lexicographical_compare_three_way<int*>/512 363 ns 363 ns 1927025
> BM_lexicographical_compare_three_way<int*>/1024 715 ns 715 ns 969796
> BM_lexicographical_compare_three_way<int*>/2048 1418 ns 1418 ns 493273
> BM_lexicographical_compare_three_way<int*>/4096 2838 ns 2838 ns 246123
> BM_lexicographical_compare_three_way<int*>/8192 5647 ns 5647 ns 123272
> BM_lexicographical_compare_three_way<int*>/16384 11318 ns 11318 ns 61772
> BM_lexicographical_compare_three_way<int*>/32768 22565 ns 22563 ns 31000
> BM_lexicographical_compare_three_way<int*>/65536 45231 ns 45229 ns 15513
> BM_lexicographical_compare_three_way<int*>/131072 91047 ns 91042 ns 7678
> BM_lexicographical_compare_three_way<int*>/262144 181619 ns 181601 ns 3867
> BM_lexicographical_compare_three_way<int*>/524288 361487 ns 361452 ns 1936
> BM_lexicographical_compare_three_way<int*>/1048576 736236 ns 736225 ns 953
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1 1.03 ns 1.03 ns 675808965
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/2 2.41 ns 2.41 ns 290934998
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/4 4.48 ns 4.48 ns 156648871
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/8 8.60 ns 8.60 ns 81462544
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/16 16.9 ns 16.9 ns 41431608
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/32 33.3 ns 33.3 ns 20989390
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/64 66.3 ns 66.3 ns 10542387
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/128 142 ns 142 ns 4949055
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/256 274 ns 274 ns 2555910
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/512 537 ns 537 ns 1296173
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1024 1066 ns 1066 ns 656525
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/2048 2121 ns 2121 ns 329789
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/4096 4237 ns 4237 ns 165380
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/8192 8452 ns 8452 ns 82559
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/16384 16897 ns 16897 ns 41435
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/32768 33800 ns 33798 ns 20716
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/65536 67609 ns 67608 ns 10340
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/131072 135777 ns 135770 ns 5156
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/262144 270691 ns 270686 ns 2588
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/524288 542588 ns 542550 ns 1290
> BM_lexicographical_compare_three_way<cpp17_input_iterator<int*>>/1048576 1088975 ns 1088880 ns 642
Can you please redo the benchmark in a different way by doing some local (not to be committed) modifications? Rather than comparing the results for `int*` versus `cpp17_input_iterator<int*>`, remove the input iterator tests completely, run the `int*` benchmark against the current implementation, then remove the optimization and rerun the benchmark. That way, it would be a proper "apples-to-apples" comparison: `int*` vs. `int*` with or without the optimization. (I don't expect results to change much, but better be sure)
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:23
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
----------------
Is this header still used?
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:36
+
+ if constexpr (__is_cpp17_random_access_iterator<_InputIterator1>::value &&
+ __is_cpp17_random_access_iterator<_InputIterator2>::value) {
----------------
Optional: consider refactoring both branches into helper functions.
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:43
+ static_assert(is_integral_v<typename iterator_traits<_InputIterator1>::difference_type>,
+ "A non-integral difference_type is undefined behavior");
+ static_assert(is_integral_v<typename iterator_traits<_InputIterator2>::difference_type>,
----------------
Nit: I think this would read a little better with a verb, e.g. `Using a non-integral difference_type is undefined behavior`.
Also, this seems a little overkill -- I'm pretty sure there are many places in algorithms where we subtract random access iterators and expect to get an integral type, without checking. I don't object to the `static_assert`s, but the comment (starting from `We rely on the fact...`) seems a little unnecessary to me (the part about undefined behavior is already captured in the static assertion).
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:49
+ auto __min_len = __len1 < __len2 ? __len1 : __len2;
+ for (decltype(__min_len) __i = 0; __i < __min_len;) {
+ auto __c = __comp(*__first1, *__first2);
----------------
Nit: some empty lines will help separate this algorithm into logical blocks and make it more readable. I'd suggest adding blank lines before and after the `for` loop and before the `else` branch.
================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:54
+ }
+ ++__i;
+ ++__first1;
----------------
Nit: increment `__i` in the iteration expression? Otherwise, it seems more like a `while` loop.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131395/new/
https://reviews.llvm.org/D131395
More information about the libcxx-commits
mailing list