[libcxx-commits] [PATCH] D150831: [libc++] Implement ranges::ends_with

Thu Sep 7 17:43:59 PDT 2023

var-const added inline comments.

================
Comment at: libcxx/benchmarks/algorithms/ranges_ends_with.bench.cpp:14
+
+static void bm_ends_with_random_iter(benchmark::State& state) {
+  std::list<int> a(state.range(), 1);
----------------
I tried out the benchmarks locally. After applying the fixes outlined above, I noticed a couple of interesting things:

1. Random access iterators end up being a little slower than bidirectional iterators, at least in my environment. This is because `std::__equal_impl` ends up being a little slower than `std::ranges::__mismatch::__fn::__go` for the same arguments. It looks like the combination of calling `advance` on `__first1` (even though in this case it's a no-op) with the fact that `equal` checks `__first1 == __last1` in the end inhibits some optimization. I'm reluctant to try to fix this, though, at least in this patch -- the difference is small and isn't really specific to `ends_with`; it might also be dependent on the compiler.

2. `equal` contains a powerful `memcmp` optimization for contiguous iterators but it doesn't get triggered currently. That's because `ends_with` wraps the predicate and the projections in a `reference_wrapper`, and the optimization only kicks in if (simplified):
```
__is_trivial_equality_predicate<_Pred, _Tp, _Up>::value && __is_identity<_Proj1>::value && __is_identity<_Proj2>::value
```
`__is_trivial_equality_predicate` recognizes `ranges::equal_to` as trivial, but not `reference_wrapper<ranges::equal_to>`, similarly for `__is_identity`.

There are a few potential solutions:
1. Modify these traits to recognize `reference_wrapper`.
2. Call an internal version of `ranges::equal` that takes functors by reference (needs to be written) or call `std::__equal` directly. It would also be necessary to duplicate the `__unwrap_iter` and `__unwrap_range` logic that's currently in `ranges::equal`.
3. Stop providing the guarantee that we never copy functors in algorithms.

I'm currently leaning towards option 1. @philnik Nikolas, what do you think?

================
Comment at: libcxx/benchmarks/algorithms/ranges_ends_with.bench.cpp:21
+    benchmark::DoNotOptimize(p);
+    auto begin1 = std::random_access_iterator<decltype(a.begin())>;
+    auto end1 = std::random_access_iterator<decltype(a.end())>;
----------------
Sorry, I miswrote the suggestion above, this should be
```
#include "test_iterators.h"

// ...

auto begin1 = random_access_iterator(a.begin());
// ...
```
`std::random_access_iterator` is a concept and as such it evaluates to a boolean. In other words, `begin1`, `end1`, `begin2` and `end2` end up being boolean variables, then we create zero- or one-element containers from those booleans (because they implicitly convert to integers, unfortunately), which means we ignore the size of `state.range()` generated by `benchmark` and always measure the same (meaningless) thing.

================
Comment at: libcxx/benchmarks/algorithms/ranges_ends_with.bench.cpp:25
+    auto end2 = std::random_access_iterator<decltype(p.end())>;
+    std::list<int> newa(begin1, end1);
+    std::list<int> newp(begin2, end2);
----------------
What we want to do is to test how the implementation of the algorithm treats different iterator categories, but at the same time we want to use the same underlying container type so that it's not a factor in our measurement (otherwise, if we compare e.g. a vector against a list, it's impossible to say how much of the performance difference is due to random-access iterator optimizations in `ends_with` and how much is due to the fact that `vector` is much more cache-friendly). Since we can make the iterator category more limited (e.g. turn a bidirectional iterator into a forward iterator) but not vice versa, that means we need to use a container with the most powerful iterator category -- so it would have to be a vector, an `array` or a built-in array.

If we use a container, wrap its iterators but then create a new container from those iterators, it negates the idea because the new container would have the same iterator category as before. E.g. if we create a list, wrap its (bidirectional) iterators into forward iterators, then create a new list from those forward iterators, the new list once again would have bidirectional iterators, completely negating our wrapping. In other words, you should just do
```
benchmark::DoNotOptimize(std::ranges::ends_with(begin1, end1, begin2, end2));
```

================
Comment at: libcxx/benchmarks/algorithms/ranges_ends_with.bench.cpp:39
+    benchmark::DoNotOptimize(p);
+    auto begin1 = std::bidirectional_iterator<decltype(a.begin())>;
+    auto end1 = std::bidirectional_iterator<decltype(a.end())>;
----------------
We would also want to test with `contiguous_iterator`s and `forward_iterator`s.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150831/new/

https://reviews.llvm.org/D150831