[libcxx-commits] [PATCH] D131395: [libc++] Implement `lexicographical_compare_three_way`

Konstantin Varlamov via Phabricator via libcxx-commits <libcxx-commits at lists.llvm.org>
Wed Nov 30 01:43:59 PST 2022


var-const requested changes to this revision.
var-const added a comment.
This revision now requires changes to proceed.

Thanks a lot for the updated benchmarks! I really appreciate the effort you put into this.

All in all, the results look pretty good, but I really hope we can get to the bottom of the difference between `int*` and `random_access_iterator<int*>`.

> The results for `{fast,slow}_path/1` vary from run to run. The `slow_path` is not faster than the `fast_path` consistently across runs. I think this is primarily variance

I suspect that this might be more than just variance. I haven't tried plotting this, but from looking at the numbers, it seems that the fast path starts _slower_ for the smallest input of `1`, becomes roughly equal in time for a slightly larger input of `4`, and becomes faster for larger inputs, with the gap widening as the input grows. So if we were to plot this (with input size on the x axis and time on the y axis), we would get two curves where the fast path curve starts out slightly above the slow path curve, but they intersect almost immediately, and from that point the slow curve climbs at a steeper angle, so to speak, than the fast curve. I could also imagine the fast path implementation ending up doing slightly more work for tiny inputs. In short, I think we might be seeing a real issue where the optimization is actually slightly less efficient for very small inputs.

I doubt, however, that this is an issue worth fixing. Adding a runtime check for small inputs would introduce branching that is likely to do way more harm than good, and optimizing for large inputs is a lot more important than for small ones. In short, this seems like a good (and probably unavoidable) tradeoff.

> `three_way<random_access_iterator<int*>>/1048576` is less efficient than `three_way<int*>/1048576`. As shown below, the assembly code is exactly identical between the two, though. So this seems to be due to some micro-architectural shenanigans (maybe code alignment?)
> [...]
> I am not sure how to further pinpoint this difference between `int*` and `random_access_iterator<int*>`.
> Either way, I think we can say for sure: The assembly for the fast path is more efficient than the assembly for the slow path.
> There seems to be some performance variability, though, due to some aspects which I don't understand fully.

Here, too, I suspect we're seeing a real thing and not just variance. I say that because the difference tends to stay pretty consistent, with `random_access_iterator` being ~40% (30-50%) slower. This seems to point to the issue being real.

I don't immediately see how the same assembly could produce consistently different timings, so I'd start with the hypothesis that the difference happens somewhere else. On the other hand, as the input grows, any factors such as inlining or not inlining the function or copying the iterators should be completely drowned out by the body of the algorithm. But perhaps looking at a larger piece of generated assembly would help pinpoint the issue. I'll try to find the time this week to see if I can reproduce the numbers on my side.

I would expect the results for `int*` and `random_access_iterator<int*>` to be within the margin of error in an optimized build. If they differ, it might imply we accidentally do something inefficient with iterators, and it would be great to get to the bottom of this because it's such a common use case. I certainly don't want to block this patch forever on this, though. Basically, I think we should dig more into the issue, but if we run out of ideas, I'd be okay to ship it as is and deal with it later.



================
Comment at: libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp:73
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
+  }
----------------
Nit: passing this is unnecessary because it's the default value, right?
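In other words, I'd presume the call could simply be (just a sketch):
```
benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2));
```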


================
Comment at: libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp:73
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::lexicographical_compare_three_way(b1, e1, b2, e2, std::compare_three_way()));
+  }
----------------
var-const wrote:
> Nit: passing this is unnecessary because it's the default value, right?
Nit: move?


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:66
+_LIBCPP_HIDE_FROM_ABI constexpr auto __lexicographical_compare_three_way_slow_path(
+    _InputIterator1 __first1, _InputIterator1 __last1, _InputIterator2 __first2, _InputIterator2 __last2, _Cmp __comp)
+    -> decltype(__comp(*__first1, *__first2)) {
----------------
Would it make sense to pass `__comp` by reference? It could, in theory, be e.g. a lambda that captures a lot of state. In fact, we generally commit to avoiding copies of user-provided functors. However, see also the other comment about `__comp_ref_type` (which would make this a reference but also have an additional effect in debug mode).
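Just to illustrate, the signature would then be something like (a sketch, not accounting for the `__comp_ref_type` approach discussed below):
```
template <class _InputIterator1, class _InputIterator2, class _Cmp>
_LIBCPP_HIDE_FROM_ABI constexpr auto __lexicographical_compare_three_way_slow_path(
    _InputIterator1 __first1, _InputIterator1 __last1, _InputIterator2 __first2, _InputIterator2 __last2,
    _Cmp& __comp) -> decltype(__comp(*__first1, *__first2));
```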


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:91
+template <class _InputIterator1, class _InputIterator2, class _Cmp>
+_LIBCPP_HIDE_FROM_ABI constexpr auto lexicographical_compare_three_way(
+    _InputIterator1 __first1, _InputIterator1 __last1, _InputIterator2 __first2, _InputIterator2 __last2, _Cmp __comp)
----------------
As a libc++-specific extension, we mark the return value of `lexicographical_compare` with `_LIBCPP_NODISCARD_EXT`. We should do the same here (and add a check to `test/libcxx/diagnostics/nodiscard_extensions.verify.cpp`).
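For illustration, the declaration could then look roughly like this (a sketch; the exact macro placement should follow the surrounding code):
```
template <class _InputIterator1, class _InputIterator2, class _Cmp>
_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI constexpr auto lexicographical_compare_three_way(
    _InputIterator1 __first1, _InputIterator1 __last1, _InputIterator2 __first2, _InputIterator2 __last2, _Cmp __comp)
    -> decltype(__comp(*__first1, *__first2));
```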


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:109
+    _InputIterator1 __first1, _InputIterator1 __last1, _InputIterator2 __first2, _InputIterator2 __last2) {
+  return std::lexicographical_compare_three_way(__first1, __last1, __first2, __last2, compare_three_way());
+}
----------------
Move the iterators? (in the other function as well)

You might additionally `static_assert` that the given iterators are copyable, to prevent users from accidentally passing move-only iterators (that our implementation would happen to accept due to the optimization but which isn't guaranteed by the Standard). A few of the existing algorithms do that, see e.g. `upper_bound`.
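Something along these lines (a sketch using the standard trait; the exact spelling should match what the existing algorithms do):
```
static_assert(is_copy_constructible<_InputIterator1>::value, "Iterators must be copy constructible.");
static_assert(is_copy_constructible<_InputIterator2>::value, "Iterators must be copy constructible.");
```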


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:100
+    return __lexicographical_compare_three_way_fast_path(
+        __first1, __last1, __first2, __last2, std::forward<_Cmp>(__comp));
+  } else {
----------------
JohelEGP wrote:
> avogelsgesang wrote:
> > Mordante wrote:
> > > Is `forward` intended? `__comp` is passed by value.
> > good catch! Changed to `move`
> I think `ref(__comp)` is the preferred way.
@JohelEGP I presume you're referring to `__comp_ref_type`, right? It's a great suggestion.

@avogelsgesang For context, there is an existing pattern in algorithms where the internal implementation of the function wraps the comparator in a typedef that is defined differently in debug and non-debug modes (see `include/__algorithm/comp_ref_type.h`):
```
#ifdef _LIBCPP_ENABLE_DEBUG_MODE
template <class _Comp>
using __comp_ref_type = __debug_less<_Comp>;
#else
template <class _Comp>
using __comp_ref_type = _Comp&;
#endif
```

So in non-debug modes, this resolves to simply a reference. In debug mode, however, it creates a temporary of a helper type `__debug_less` that additionally checks that the comparator in fact does induce a strict weak order.

I think we want to continue using this pattern going forward (e.g. `lexicographical_compare` uses it). The only issue is that the existing `__debug_less` only supports comparators returning a boolean. You would probably need to create a separate `__three_way_comp_ref_type` typedef and a separate `__debug_three_way_comp` helper struct (names are subject to change).

Please let me know if you'd like any additional context on this (this can be kinda confusing). Many existing algorithms can be used as examples of this pattern; a caveat is that more recent code uses a (very) slightly different approach:
- older code tends to have `_Compare` as a template parameter of the internal function, declare the parameter as `_Compare __comp`, and have the caller specify the template argument explicitly, like `std::__foo<__comp_ref_type<_Comp>>(__first, __last, __comp)`. See e.g. `lexicographical_compare`;
- in newer code, the pattern is to declare the parameter as a forwarding reference `_Compare&& __comp` and have the caller do a `static_cast`, like `std::__foo(__first, __last, static_cast<__comp_ref_type<_Compare>>(__comp))` (see e.g. `inplace_merge`).

If you prefer to, I'm fine with doing this in a follow up since this patch has been open for a while now.
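For concreteness, here's a very rough sketch of the kind of thing I have in mind (all names, the exact consistency check, and the assertion macro are hypothetical and would need to follow the actual debug-mode machinery):
```
template <class _Comp>
struct __debug_three_way_comp {
  _Comp& __comp_;
  _LIBCPP_HIDE_FROM_ABI constexpr __debug_three_way_comp(_Comp& __comp) : __comp_(__comp) {}

  template <class _Tp, class _Up>
  _LIBCPP_HIDE_FROM_ABI constexpr auto operator()(_Tp& __x, _Up& __y) {
    auto __r = __comp_(__x, __y);
    // In debug mode, additionally evaluate the reversed comparison and check
    // that the two results are consistent with each other.
    auto __reversed = __comp_(__y, __x);
    _LIBCPP_ASSERT((__r < 0) == (__reversed > 0) && (__r == 0) == (__reversed == 0),
                   "Comparator does not induce a consistent three-way ordering");
    return __r;
  }
};

#ifdef _LIBCPP_ENABLE_DEBUG_MODE
template <class _Comp>
using __three_way_comp_ref_type = __debug_three_way_comp<_Comp>;
#else
template <class _Comp>
using __three_way_comp_ref_type = _Comp&;
#endif
```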


================
Comment at: libcxx/include/__algorithm/lexicographical_compare_three_way.h:48
+    auto __len2    = __last2 - __first2;
+    auto __min_len = __len1 < __len2 ? __len1 : __len2;
+    for (decltype(__min_len) __i = 0; __i < __min_len;) {
----------------
Mordante wrote:
> avogelsgesang wrote:
> > mumbleskates wrote:
> > > philnik wrote:
> > > > Mordante wrote:
> > > > > philnik wrote:
> > > > > > Mordante wrote:
> > > > > > > Why not use `std::min` here? I assume it is because the types of `__len1` and `__len2` can differ.
> > > > > > > 
> > > > > > > I'm not happy with all the `auto`s in this code; it's hard to understand what the intended types are and whether these are the correct intended types. Please remove the `auto`s so it's clear what the types are.
> > > > > > > 
> > > > > > > I really would like something along the lines of
> > > > > > > ```
> > > > > > > __iter_diff_t<_InputIterator1> __len1 = __last1 - __first1;
> > > > > > > __iter_diff_t<_InputIterator2> __len2 = __last2 - __first2;
> > > > > > > auto __min = std::min<common_type_t<decltype(__len1), decltype(__len2)>>(__len1, __len2);
> > > > > > > ```
> > > > > > > This is more verbose, but at least I can understand how the types are defined and whether this is as intended.
> > > > > > > 
> > > > > > I very much disagree with you here. IMO the `__iter_diff_t<_InputIterator1>` and so on just make the code more verbose and give the option to get it wrong. `__last1 - __first1` will always return a difference type. That's not exactly surprising. Also, you want to use `decltype()` for the min for some reason. I guess because otherwise it's //really// long. How does that help with understanding the code?
> > > > > It helps since the rules of the type of the conditional expression are not simple. I had to verify with the standard it does the right thing. So instead of spending a few seconds to validate these 3 lines I had to spend several minutes.
> > > > > 
> > > > > Code should be optimized for understanding by humans, `auto` quite often saves the writer from typing a few characters. (The compiler doesn't care either way it does its thing.)
> > > > > 
> > > > > The verbose code helps to communicate what the author of the code intended to happen. Relying on some (not always well understood) language rules means it's less clear for the reader to understand what the writer intended. Both may have a different understanding of these rules.
> > > > > 
> > > > > I also considered using 3 types
> > > > > ```
> > > > > using _Len1 = __iter_diff_t<_InputIterator1>;
> > > > > using _Len2 = __iter_diff_t<_InputIterator2>;
> > > > > using _Min = common_type_t<_Len1, _Len2>;
> > > > > 
> > > > > _Len1 __len1 = __last1 - __first1;
> > > > > _Len2 __len2 = __last2 - __first2;
> > > > > 
> > > > > _Min __min = std::min<_Min>(__len1, __len2);
> > > > > ```
> > > > > 
> > > > > I don't mind that solution either.
> > > > > It helps since the rules of the type of the conditional expression are not simple. I had to verify with the standard it does the right thing. So instead of spending a few seconds to validate these 3 lines I had to spend several minutes.
> > > > 
> > > > I don't see how `common_type` helps with the type of the conditional not being simple. That's exactly what `common_type` uses to get the type in this case AFAICT (https://eel.is/c++draft/type.traits#meta.trans.other-3). Plus it has a heap of other conditionals that are hard to get through.
> > > > 
> > > > > Code should be optimized for understanding by humans, `auto` quite often saves the writer from typing a few characters. (The compiler doesn't care either way it does its thing.)
> > > > I agree that code should be written to make it easier to read. IMO littering the code with types I don't really care about makes it harder to read. i.e. I don't really care that `a - b` returns a value of type `__iter_diff_t<_InputIterator1>`, but now I have to check that you actually named the correct type. Thinking about it, `integral auto __len1 = __last1 - __first1` would be great. Not sure how much compile time overhead that would incur though. WDYT?
> > > > 
> > > > > The verbose code helps to communicate what the author of the code intended to happen. [Relying] on some (not always well understood) language rules means it's less clear for the reader to understand what the writer intended. Both may have a different understanding of these rules.
> > > > But you are still relying on these rules through `common_type` AFAICT.
> > > > 
> > > The point is that, while we are still relying on the exact same semantics under the hood, we are making an effort to spell out what we expect to happen, making it explicit that we thought of this case and fully intend this outcome. It also helps future readers understand what happens even if they wouldn't have thought of it themselves, whether they are trying to follow what our code does or writing their own implementation some time in the future.
> > > 
> > > I like @Mordante's most recent version with the `using` declarations.
> > I personally don't have an opinion here. Please come to an agreement here (or agree to disagree, and nevertheless agree with merging this code).
> > 
> > I am happy to update this patch in whichever way you agree on, but I want to avoid changing it back and forth (an earlier version of this review was actually written in terms of the difference type and changed based on feedback in this review)
> > > It helps since the rules of the type of the conditional expression are not simple. I had to verify with the standard it does the right thing. So instead of spending a few seconds to validate these 3 lines I had to spend several minutes.
> > 
> I don't see how `common_type` helps with the type of the conditional not being simple. That's exactly what `common_type` uses to get the type in this case AFAICT (https://eel.is/c++draft/type.traits#meta.trans.other-3). Plus it has a heap of other conditionals that are hard to get through.
> 
> It's at least clear that it's intended to use `common_type` and not one of the other rules of the conditional expression. IMO auto should never be used with the conditional expression.
> 
> > 
> > > Code should be optimized for understanding by humans, `auto` quite often saves the writer from typing a few characters. (The compiler doesn't care either way it does its thing.)
> > I agree that code should be written to make it easier to read. IMO littering the code with types I don't really care about makes it harder to read. i.e. I don't really care that `a - b` returns a value of type `__iter_diff_t<_InputIterator1>`, but now I have to check that you actually named the correct type. Thinking about it, `integral auto __len1 = __last1 - __first1` would be great. Not sure how much compile time overhead that would incur though. WDYT?
> 
> I care about these types when I try to understand the code and validate whether the author wrote the code correctly. 
> It takes me as reader a lot longer to validate the code. Auto has its uses but it shouldn't be used everywhere just to make it easy for the writer. I still like the explicit type better either directly or by using a typedef.
> For the `operator-` I don't dislike this too strongly; but as said above for the conditional expression I do.
> 
> > > The verbose code helps to communicate what the author of the code intended to happen. [Relying] on some (not always well understood) language rules means it's less clear for the reader to understand what the writer intended. Both may have a different understanding of these rules.
> > But you are still relying on these rules through `common_type` AFAICT.
> 
> Yes but as said above it makes it clear that `common_type` is intended to be used. (I agree `common_type` has its complexity too.)
> 
> 
FWIW, I don't have a strong preference here, but for me, one of the most important and informative aspects of `auto` is "guaranteed no conversion". This is as relevant for the author as it is for the reader. While it's true that `auto` can be misused by a writer to avoid thinking about which types are returned (IMO that's a bigger time "saver" than the literal typing which can often be autocompleted), I think it has legitimate uses where it makes the intention clearer.



================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way.pass.cpp:3
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
----------------
We have a few tests that check for a certain behavior across a wide range of algorithms; those have to be updated to include `lexicographical_compare_three_way` now:
- `test/libcxx/diagnostics/nodiscard_extensions.verify.cpp` (potentially);
- `test/libcxx/algorithms/robust_against_copying_comparators.pass.cpp`;
- `test/libcxx/algorithms/robust_against_cpp20_hostile_iterators.compile.pass.cpp`;
- `test/std/algorithms/robust_re_difference_type.compile.pass.cpp`;
- `test/std/algorithms/robust_against_adl.compile.pass.cpp`.

Please let me know if you need any help with those.


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way.pass.cpp:29
+constexpr void test_lexicographical_compare(C1 a, C2 b, Order expected) {
+  std::same_as<Order> auto result =
+      std::lexicographical_compare_three_way(Iter1{a.begin()}, Iter1{a.end()}, Iter2{b.begin()}, Iter2{b.end()});
----------------
Nit: `s/auto/decltype(auto)/`. That way we always check the _exact_ returned type, without potentially stripping away references. While it's very unlikely that this algorithm could plausibly return a reference, I think the main advantage of using `decltype(auto)` is just not having to think about it and having it guaranteed to always catch the exact type.
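I.e., something like (a sketch):
```
std::same_as<Order> decltype(auto) result =
    std::lexicographical_compare_three_way(Iter1{a.begin()}, Iter1{a.end()}, Iter2{b.begin()}, Iter2{b.end()});
```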


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp:31
+
+constexpr auto compare_last_digit_strong  = [](int a, int b) -> std::strong_ordering { return (a % 10) <=> (b % 10); };
+constexpr auto compare_last_digit_weak    = [](int a, int b) -> std::weak_ordering { return (a % 10) <=> (b % 10); };
----------------
Question: what is the purpose of comparing just the last digit instead of just the two given numbers? It makes the implementation slightly more complicated (and the variable names longer), but all the inputs are single-digit anyway.


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp:34
+constexpr auto compare_last_digit_partial = [](int a, int b) -> std::partial_ordering {
+  if (a == std::numeric_limits<int>::min() || b == std::numeric_limits<int>::min())
+    return std::partial_ordering::unordered;
----------------
Is this branch ever taken?


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp:57
+template <typename Iter1, typename Iter2, typename C1, typename C2, typename Order, typename Comparator>
+constexpr void test_lexicographical_compare(C1 a, C2 b, Order expected, Comparator comp) {
+  std::same_as<Order> auto result =
----------------
Nit: I'd put `expected` last so that all the algorithm inputs are next to each other.


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp:67
+  test_lexicographical_compare<Iter1, Iter2>(
+      std::array<int, 0>{}, std::array<int, 0>{}, std::strong_ordering::equal, compare_last_digit_strong);
+  // Left input empty
----------------
Optional: create a local constant with a shorter name for the comparator to cut down on the boilerplate a little?
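E.g. something like (a sketch; the name is just an example):
```
const auto cmp = compare_last_digit_strong;
test_lexicographical_compare<Iter1, Iter2>(
    std::array<int, 0>{}, std::array<int, 0>{}, std::strong_ordering::equal, cmp);
```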


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp:153
+  };
+  // If one of both ranges is empty, the comparator must not be called at all
+  compare_invocation_count = 0;
----------------
Nit: `s/both/the/`.


================
Comment at: libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.verify.cpp:40
+  // expected-error-re@*:* {{{{(static_assert|static assertion)}} failed{{.*}}Using a non-integral difference_type is undefined behavior}}}}
+  return std::lexicographical_compare_three_way(a, b, c, d, std::compare_three_way());
+}
----------------
Nit: please check that passing `RandomAccessIteratorBadDifferenceType` iterators for the second range fails as well (i.e., calling `std::lexicographical_compare_three_way(c, d, a, b, std::compare_three_way())`).


================
Comment at: libcxx/test/support/almost_satisfies_types.h:425
+
+
 template <class Iter>
----------------
Ultranit: there's one extraneous blank line.


================
Comment at: libcxx/test/support/test_comparisons.h:263
+        rhs.value == std::numeric_limits<int>::min())
+      return std::partial_ordering::unordered;
+    return lhs.value <=> rhs.value;
----------------
Is this branch ever taken?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131395/new/

https://reviews.llvm.org/D131395


