[libcxx-commits] [libcxx] [libc++] Add benchmarks for copy algorithms (PR #127328)

Tue Feb 18 14:07:02 PST 2025

================
@@ -0,0 +1,87 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17
+
+#include <algorithm>
+#include <deque>
+#include <iterator>
+#include <list>
+#include <string>
+#include <vector>
+
+#include "benchmark/benchmark.h"
+#include "../../GenerateInput.h"
+#include "test_macros.h"
+
+template <class Container, class Operation>
+void bm_general(std::string operation_name, Operation copy) {
+  auto bench = [copy](auto& st) {
+    auto const size = st.range(0);
+    using ValueType = typename Container::value_type;
+    Container c;
+    std::generate_n(std::back_inserter(c), size, [] { return Generate<ValueType>::random(); });
+
+    std::vector<ValueType> out(size);
+
+    for ([[maybe_unused]] auto _ : st) {
+      auto result = copy(c.begin(), c.end(), out.begin());
+      benchmark::DoNotOptimize(result);
+      benchmark::DoNotOptimize(out);
+      benchmark::DoNotOptimize(c);
+      benchmark::ClobberMemory();
+    }
+  };
+  benchmark::RegisterBenchmark(operation_name, bench)->Range(8, 1 << 20);
+}
+
+template <bool Aligned, class Operation>
+static void bm_vector_bool(std::string operation_name, Operation copy) {
+  auto bench = [copy](auto& st) {
+    auto n = st.range();
+    std::vector<bool> in(n, true);
+    std::vector<bool> out(Aligned ? n : n + 8);
+    benchmark::DoNotOptimize(&in);
+    auto first = in.begin();
+    auto last  = in.end();
+    auto dst   = Aligned ? out.begin() : out.begin() + 4;
+    for ([[maybe_unused]] auto _ : st) {
+      auto result = copy(first, last, dst);
+      benchmark::DoNotOptimize(result);
+      benchmark::DoNotOptimize(out);
+      benchmark::ClobberMemory();
+    }
+  };
+  benchmark::RegisterBenchmark(operation_name, bench)->Range(64, 1 << 20);
+}
+
+int main(int argc, char** argv) {
+  auto std_copy    = [](auto first, auto last, auto out) { return std::copy(first, last, out); };
+  auto ranges_copy = [](auto first, auto last, auto out) { return std::ranges::copy(first, last, out); };
+
+  // std::copy
+  bm_general<std::vector<int>>("std::copy(vector<int>)", std_copy);
+  bm_general<std::deque<int>>("std::copy(deque<int>)", std_copy);
+  bm_general<std::list<int>>("std::copy(list<int>)", std_copy);
+  bm_vector_bool<true>("std::copy(vector<bool>) (aligned)", std_copy);
+  bm_vector_bool<false>("std::copy(vector<bool>) (unaligned)", std_copy);
+
+  // ranges::copy
+  bm_general<std::vector<int>>("ranges::copy(vector<int>)", ranges_copy);
+  bm_general<std::deque<int>>("ranges::copy(deque<int>)", ranges_copy);
+  bm_general<std::list<int>>("ranges::copy(list<int>)", ranges_copy);
+#if TEST_STD_VER >= 23 // vector<bool>::iterator is not an output_iterator before C++23
+  bm_vector_bool<true>("ranges::copy(vector<bool>) (aligned)", ranges_copy);
+  bm_vector_bool<false>("ranges::copy(vector<bool>) (unaligned)", ranges_copy);
+#endif
+
+  benchmark::Initialize(&argc, argv);
+  benchmark::RunSpecifiedBenchmarks();
+  benchmark::Shutdown();
----------------
ldionne wrote:

I spent some time trying to find better ways to write this and I found it rather challenging. I'm not saying it's impossible, but I'd like to understand more specifically what you find difficult with the current benchmarks.

I don't disagree that there's some boilerplate here, BTW. At the same time, some things that look like boilerplate actually aren't: for example, we can't call `ranges::copy` on `vector<bool>` before C++23, and that's difficult to deal with uniformly.

There are some things I could try to centralize more, like where we set the ranges of the benchmark (currently duplicated), but I can't centralize that beyond the per-benchmark level, because different algorithms need different ranges in some cases.

This part is definitely 100% boilerplate:

```
benchmark::Initialize(&argc, argv);
benchmark::RunSpecifiedBenchmarks();
benchmark::Shutdown();
```

However, I struggle to find a nice way to get rid of it other than using GoogleBenchmark's own `BENCHMARK(...)` macro and then `BENCHMARK_MAIN()`, which unfortunately has the downside of removing a lot of flexibility for programmatically defining the benchmarks.
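
As a rough illustration of what "programmatically defining the benchmarks" buys, the sketch below registers the same generic body for several containers under names built at run time. It deliberately does not use GoogleBenchmark: `Registry`, `register_copy` and `make_registry` are hypothetical stand-ins for `benchmark::RegisterBenchmark`, and the body only checks that the copy happened instead of timing it.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a benchmark registry (NOT the GoogleBenchmark API).
struct Registry {
  using Body = std::function<bool(std::size_t)>;
  std::vector<std::pair<std::string, Body>> entries;

  void add(std::string name, Body body) {
    entries.emplace_back(std::move(name), std::move(body));
  }
};

// One generic body, instantiated per container and per operation.
template <class Container, class Operation>
void register_copy(Registry& r, std::string name, Operation copy) {
  r.add(std::move(name), [copy](std::size_t size) {
    Container in(size, 42);
    std::vector<typename Container::value_type> out(size);
    copy(in.begin(), in.end(), out.begin());
    // Report whether the output matches the input (no timing in this sketch).
    return std::equal(in.begin(), in.end(), out.begin());
  });
}

Registry make_registry() {
  auto std_copy = [](auto first, auto last, auto out) { return std::copy(first, last, out); };
  Registry r;
  register_copy<std::vector<int>>(r, "std::copy(vector<int>)", std_copy);
  register_copy<std::deque<int>>(r, "std::copy(deque<int>)", std_copy);
  register_copy<std::list<int>>(r, "std::copy(list<int>)", std_copy);
  return r;
}
```

With static macro-based registration, each of these combinations would typically need its own named function or template instantiation spelled out at namespace scope.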

TL;DR: I'm very open to refactoring this (and my upcoming patches) to reduce boilerplate, but I'd like to understand whether you have something more specific in mind. One constraint I'd like to put on the solution is that we avoid anything as "terse" as the CartesianProduct approach, which has proven difficult to understand and hasn't delivered much value: the benchmarks end up being so mechanically exhaustive that they don't highlight the important cases to consider, and they take forever to run. I'd like to be able to look at the benchmarks and easily understand what's being benchmarked and how, and I'm willing to accept a bit of code duplication for that property. But I don't disagree that we might be able to do better than my current approach.

https://github.com/llvm/llvm-project/pull/127328