[libcxx-commits] [libcxx] Optimize __assign_with_sentinel in std::vector (PR #113852)

Tue Nov 12 08:03:56 PST 2024

================
@@ -48,6 +48,76 @@ void BM_Assignment(benchmark::State& st, Container) {
   }
 }
 
+// Wrap any Iterator into an input iterator
+template <typename Iterator>
+class InputIterator {
+  using iter_traits = std::iterator_traits<Iterator>;
+
+public:
+  using iterator_category = std::input_iterator_tag;
+  using value_type        = typename iter_traits::value_type;
+  using difference_type   = typename iter_traits::difference_type;
+  using pointer           = typename iter_traits::pointer;
+  using reference         = typename iter_traits::reference;
+
+  InputIterator(Iterator it) : current_(it) {}
+
+  reference operator*() { return *current_; }
+  InputIterator& operator++() {
+    ++current_;
+    return *this;
+  }
+  InputIterator operator++(int) {
+    InputIterator tmp = *this;
+    ++(*this);
+    return tmp;
+  }
+
+  friend bool operator==(const InputIterator& lhs, const InputIterator& rhs) { return lhs.current_ == rhs.current_; }
+  friend bool operator!=(const InputIterator& lhs, const InputIterator& rhs) { return !(lhs == rhs); }
+
+private:
+  Iterator current_;
+};
+
+template <typename Iterator>
+InputIterator<Iterator> make_input_iterator(Iterator it) {
+  return InputIterator<Iterator>(it);
+}
+
+template <class Container,
+          class GenInputs,
+          typename std::enable_if<std::is_trivial<typename Container::value_type>::value>::type* = nullptr>
+void BM_AssignInputIterIter(benchmark::State& st, Container c, GenInputs gen) {
+  auto in = gen(st.range(1));
+  benchmark::DoNotOptimize(&in);
+  for (auto _ : st) {
+    st.PauseTiming();
+    c.resize(st.range(0));
+    benchmark::DoNotOptimize(&c);
+    st.ResumeTiming();
----------------
winner245 wrote:

Sorry for getting back late. I have been running various tests to address your comments. I appreciate your feedback, which led to some new insights.

> PauseTiming and ResumeTiming shouldn't be used in a single-iteration benchmark whenever possible. This results in way higher times than anything the actual code you're benchmarking will produce.

I completely agree that we should avoid using `PauseTiming` and `ResumeTiming` in single-iteration benchmarks, especially for trivial operations. While I was aware of this issue, I intentionally ran the benchmark tests this way to ensure each measurement was independent and consistent by using a brand-new vector input for each iteration. I thought this approach was acceptable because the operation being measured, `assign()`, is significant and non-trivial compared to the overhead introduced by `PauseTiming` and `ResumeTiming`.

However, it turns out that I was only partially correct. As you suggested, I've re-run all the benchmark tests by moving both `PauseTiming` and `ResumeTiming`, as well as any other irrelevant operations, outside the loop, leaving only the single assignment operation inside the loop. The new tests for non-trivial types, such as `vector<string>`, still align with my previous tests, showing a 2.3x (15517/6729) performance improvement for `vector<string>`. Surprisingly, even for trivial types like `vector<int>`, the new tests demonstrated a similar 1.9x (1075/560) performance improvement. I believe this is due to the `clear()` operation before the assignment in the previous implementation, which introduced some unnecessary overhead. My earlier tests were inaccurate in this case, as the assignment for trivial types is not significant enough to ignore the overhead caused by `PauseTiming` and `ResumeTiming`. 

To provide a more comprehensive picture, I also performed additional tests for `vector<vector<int>>`, which demonstrated a more pronounced 3.7x (28387/7583) performance improvement. This clearly demonstrates that the `clear()` operation, which destroys all vectors before assignment, followed by re-constructing all the vectors in the previous implementation, led to significant overhead.

- Before
![before](https://github.com/user-attachments/assets/3d044031-2a84-4606-b93e-676e3aea59a0)

- After
![after](https://github.com/user-attachments/assets/cc6cd71b-df93-4845-8f65-4a0e604852c8)

> I'm not even sure why you added any, since the resize() will allocate once and then be essentially a no-op AFAICT. We should just do a reserve call outside the loop instead, assuming we want to only benchmark the happy path (which IMO is fine).

I understand your point about using `reserve()` instead of `resize()`. However, for my testing purpose, I need to use `resize()` rather than `reserve()` because I am performing copy-assignment, not copy-construction. So before the assignment, the vector I am assigning to is already initialized with all elements populated, which can be achieved by `resize()` rather than `reserve()`. 


https://github.com/llvm/llvm-project/pull/113852