[libcxx-commits] [libcxx] Optimize input iterator overload of `std::vector::assign(first, last)` (PR #113852)

Peng Liu via libcxx-commits libcxx-commits at lists.llvm.org
Fri Nov 15 11:32:49 PST 2024


================
@@ -48,6 +49,35 @@ void BM_Assignment(benchmark::State& st, Container) {
   }
 }
 
+template <class Container,
+          class GenInputs,
+          typename std::enable_if<std::is_trivial<typename Container::value_type>::value>::type* = nullptr>
+void BM_AssignInputIterIter(benchmark::State& st, Container c, GenInputs gen) {
+  auto in = gen(st.range(1));
+  c.resize(st.range(0));
+  benchmark::DoNotOptimize(&in);
+  benchmark::DoNotOptimize(&c);
+  for (auto _ : st) {
+    c.assign(cpp17_input_iterator(in.begin()), cpp17_input_iterator(in.end()));
+    benchmark::ClobberMemory();
+  }
+}
+
+template <class Container,
+          class GenInputs,
+          typename std::enable_if<!std::is_trivial<typename Container::value_type>::value>::type* = nullptr>
+void BM_AssignInputIterIter(benchmark::State& st, Container c, GenInputs gen) {
+  auto v = gen(1, 100);
+  c.resize(st.range(0), v[0]);
+  auto in = gen(st.range(1), 32);
+  benchmark::DoNotOptimize(&in);
+  benchmark::DoNotOptimize(&c);
+  for (auto _ : st) {
+    c.assign(cpp17_input_iterator(in.begin()), cpp17_input_iterator(in.end()));
+    benchmark::ClobberMemory();
+  }
+}
----------------
winner245 wrote:

The main difference between the tests lies in the dimensionality of the inputs. Tests for trivial element types deal with a 1-dimensional data structure (e.g., `std::vector<int>`), while tests for non-trivial element types deal with a 2-dimensional data structure (e.g., `std::vector<std::vector<int>>`, or `std::vector<std::string>`, where a `string` can be seen as a vector of chars). Because of this difference, the inputs are initialized differently in the two tests.

Originally, I wrote a separate overload for each case for clarity. I agree with you that this leads to some code duplication. Therefore, as you suggested, I have merged them into a single implementation.
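
For illustration only, here is a minimal sketch of one way two `enable_if` overloads like the ones quoted above could be folded into a single function template. It is not the code from the PR: it assumes C++17 (`if constexpr`), and `make_filled`, `input_iter`, and `assign_through_input_iterators` are hypothetical names. `input_iter` is a stand-in for the test suite's `cpp17_input_iterator`, so that the sketch compiles on its own without Google Benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: build a container of n elements. Trivial element types
// are value-initialized (1-dimensional case); non-trivial ones are filled with
// an inner sequence of inner_len elements (2-dimensional case).
template <class Container>
Container make_filled(std::size_t n, std::size_t inner_len) {
  using T = typename Container::value_type;
  Container c;
  if constexpr (std::is_trivial<T>::value) {
    c.resize(n);
  } else {
    c.resize(n, T(inner_len, typename T::value_type()));
  }
  return c;
}

// Hypothetical stand-in for cpp17_input_iterator: wraps an iterator and
// advertises only the input iterator category.
template <class It>
class input_iter {
  It it_;

public:
  using iterator_category = std::input_iterator_tag;
  using value_type        = typename std::iterator_traits<It>::value_type;
  using difference_type   = typename std::iterator_traits<It>::difference_type;
  using pointer           = typename std::iterator_traits<It>::pointer;
  using reference         = typename std::iterator_traits<It>::reference;

  explicit input_iter(It it) : it_(it) {}
  reference operator*() const { return *it_; }
  input_iter& operator++() { ++it_; return *this; }
  input_iter operator++(int) { input_iter tmp(*this); ++it_; return tmp; }
  friend bool operator==(const input_iter& a, const input_iter& b) { return a.it_ == b.it_; }
  friend bool operator!=(const input_iter& a, const input_iter& b) { return !(a == b); }
};

// Single code path for both element kinds: fill the destination with `initial`
// elements, then assign `count` elements through input iterators.
template <class Container>
std::size_t assign_through_input_iterators(std::size_t initial, std::size_t count, std::size_t inner_len) {
  using Iter   = typename Container::iterator;
  Container c  = make_filled<Container>(initial, inner_len);
  Container in = make_filled<Container>(count, inner_len);
  c.assign(input_iter<Iter>(in.begin()), input_iter<Iter>(in.end()));
  assert(c.size() == count);
  return c.size();
}
```

In this sketch, `assign_through_input_iterators<std::vector<int>>(1024, 1024, 0)` and `assign_through_input_iterators<std::vector<std::string>>(1024, 1024, 32)` go through the same function body; only the initialization branch differs.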

I then re-ran all the benchmark tests and obtained similar results.

#### Before
| Benchmark                                               | Time      | CPU      | Iterations |
|---------------------------------------------------------|-----------|----------|------------|
| BM_AssignInputIterIter/vector_int/1024/1024             | 1157 ns   | 1169 ns  | 608188     |
| BM_AssignInputIterIter<32>/vector_string/1024/1024      | 14559 ns  | 14710 ns | 47277      |
| BM_AssignInputIterIter<32>/vector_vector_int/1024/1024  | 26846 ns  | 27129 ns | 25925      |


#### After
| Benchmark                                               | Time      | CPU      | Iterations |
|---------------------------------------------------------|-----------|----------|------------|
| BM_AssignInputIterIter/vector_int/1024/1024             | 561 ns    | 566 ns   | 1242251    |
| BM_AssignInputIterIter<32>/vector_string/1024/1024      | 5604 ns   | 5664 ns  | 128365     |
| BM_AssignInputIterIter<32>/vector_vector_int/1024/1024  | 7927 ns   | 8012 ns  | 88579      |
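
To make the comparison between the two tables easier to read, the following sketch computes the ratio of the "Before" wall time to the "After" wall time for each row. The numbers are copied from the Time columns above; `Row`, `kRows`, and `speedup` are names introduced here for illustration. The ratios come out to roughly 2.1x for `vector_int`, 2.6x for `vector_string`, and 3.4x for `vector_vector_int`.

```cpp
#include <cassert>

// One benchmark row: name plus wall time before and after the change, in ns.
struct Row {
  const char* name;
  double before_ns;
  double after_ns;
};

// Values taken from the Time columns of the tables above.
static const Row kRows[] = {
    {"vector_int/1024/1024", 1157.0, 561.0},
    {"vector_string/1024/1024", 14559.0, 5604.0},
    {"vector_vector_int/1024/1024", 26846.0, 7927.0},
};

// Ratio of old time to new time; a value above 1 means the new code is faster.
inline double speedup(const Row& r) {
  assert(r.after_ns > 0.0);
  return r.before_ns / r.after_ns;
}
```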


https://github.com/llvm/llvm-project/pull/113852

