[libcxx-commits] [libcxx] [libc++] Speed up vector<bool> copy/move-ctors [1/3] (PR #120132)

Tue Jan 21 08:45:09 PST 2025

================
@@ -674,25 +676,30 @@ vector<bool, _Allocator>::vector(initializer_list<value_type> __il, const alloca
 
 #endif // _LIBCPP_CXX03_LANG
 
+// This function copies each storage word as a whole, which is substantially more efficient than copying
+// individual bits within each word
+template <class _Allocator>
+_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void vector<bool, _Allocator>::__alloc_and_copy(const vector& __v) {
+  if (__v.__size_) {
+    __vallocate(__v.__size_);
+    std::copy(__v.__begin_, __v.__begin_ + __external_cap_to_internal(__v.__size_), __begin_);
+  }
+  __size_ = __v.__size_;
+}
+
 template <class _Allocator>
 _LIBCPP_CONSTEXPR_SINCE_CXX20 vector<bool, _Allocator>::vector(const vector& __v)
     : __begin_(nullptr),
       __size_(0),
       __cap_(0),
       __alloc_(__storage_traits::select_on_container_copy_construction(__v.__alloc_)) {
-  if (__v.size() > 0) {
-    __vallocate(__v.size());
-    __construct_at_end(__v.begin(), __v.end(), __v.size());
-  }
+  __alloc_and_copy(__v);
----------------
winner245 wrote:

@philnik777 If the comment raised by @ldionne is addressed, we will have:


https://github.com/winner245/llvm-project/blob/ead9ab143db8f923e16d04c21276ff1d1728f7f8/libcxx/include/__vector/vector_bool.h#L708-L716

```cpp
template <class _Allocator>
_LIBCPP_CONSTEXPR_SINCE_CXX20 vector<bool, _Allocator>::vector(const vector& __v, const allocator_type& __a)
    : __begin_(nullptr), __size_(0), __cap_(0), __alloc_(__a) {
  if (__v.size() > 0) {
    __vallocate(__v.size());
    std::copy(__v.__begin_, __v.__begin_ + __external_cap_to_internal(__v.size()), __begin_);
    __size_ = __v.size();
  }
}
```


versus what we had before:

https://github.com/llvm/llvm-project/blob/afced70e697e66fb6920b53d489d3fa4498e22dc/libcxx/include/__vector/vector_bool.h#L707-L714

I would invite you and @ldionne for a second thought. IMO, this patch will slightly increase the clarity and reduce future maintenance burden. 

1. The call to the private function `__construct_at_end` is now replaced by a call to the general-purpose function `std::copy`, along with an explicit size update. Using the well-known `std::copy`, it is clear that the new implementation copies the underlying data and updates the size, without needing to understand what the internal `__construct_at_end` does. Additionally, making the size update explicit enhances clarity (previously, I had to jump into both overloads of `__construct_at_end` and make sure they both update the size).

2. The new implementation does not require any special optimization. It calls a general-purpose copy function, which naturally yields optimal performance. In contrast, the old implementation relies on complex optimizations for `__bit_iterator` to achieve similar performance. My patch actually removes these unnecessary bitwise optimizations, reducing future maintenance burdens (we cannot neglect the maintenance burdens of these bitwise optimization algorithms given that we are still fixing existing bugs related to these optimizations).

What do you think?

https://github.com/llvm/llvm-project/pull/120132