[libcxx-commits] [libcxx] [libcxx] adds a size-based representation for `vector`'s unstable ABI (PR #155330)

Christopher Di Bella via libcxx-commits libcxx-commits at lists.llvm.org
Wed Feb 11 00:36:10 PST 2026


================
@@ -0,0 +1,443 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___VECTOR_LAYOUT_H
+#define _LIBCPP___VECTOR_LAYOUT_H
+
+#include <__assert>
+#include <__config>
+#include <__memory/allocator_traits.h>
+#include <__memory/compressed_pair.h>
+#include <__memory/swap_allocator.h>
+#include <__split_buffer>
+#include <__type_traits/is_nothrow_constructible.h>
+#include <__utility/move.h>
+#include <__utility/swap.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+/// Defines `std::vector`'s storage layout and any operations that are affected by a change in the
+/// layout.
+///
+/// Dynamically-sized arrays like `std::vector` have several different representations. libc++
+/// supports two different layouts for `std::vector`:
+///
+///   * pointer-based layout
+///   * size-based layout
+///
+/// We describe these layouts below. All vector representations have a pointer that points to where
+/// the memory is allocated (called `__begin_`).
+///
+/// **Pointer-based layout**
+///
+/// The pointer-based layout uses two more pointers in addition to `__begin_`. The second pointer
+/// (called `__end_`) points past the end of the part of the buffer that holds valid elements. The
+/// third pointer (called `__capacity_`) points past the end of the allocated buffer. This is the
+/// default representation for libc++ due to historical reasons.
+///
+/// The second pointer has three primary use-cases:
+///   * to compute the size of the vector; and
+///   * to construct the past-the-end iterator; and
+///   * to indicate where the next element should be appended.
+///
+/// The third pointer is used to compute the capacity of the vector, which lets the vector know how
+/// many elements can be added to the vector before a reallocation is necessary.
+///
+///    __begin_ = 0xE4FD0, __end_ = 0xE4FF0, __capacity_ = 0xE5000
+///                 0xE4FD0                             0xE4FF0           0xE5000
+///                    v                                   v                 v
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///    | ????????????? |   3174 |   5656 |    648 |    489 | ------ | ------ | ??????????????????? |
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///                    ^                                   ^                 ^
+///                __begin_                             __end_          __capacity_
+///
+///    Figure 1: A visual representation of a pointer-based `std::vector<short>`. This vector has
+///    four elements, with the capacity to store six.
+///
+/// This is the default layout for libc++.
+///
+/// **Size-based layout**
+///
+/// The size-based layout uses integers to track its size and capacity, and computes the
+/// past-the-end pointers for the valid range and for the whole buffer only when necessary. This
+/// layout is opt-in, but yields a significant performance boost relative to the pointer-based
+/// layout (see below).
+///
+///    __begin_ = 0xE4FD0, __size_ = 4, __capacity_ = 6
+///                 0xE4FD0
+///                    v
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///    | ????????????? |   3174 |   5656 |    648 |    489 | ------ | ------ | ??????????????????? |
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///                    ^
+///                __begin_
+///
+///    Figure 2: A visual representation of the size-based layout. Boxes filled with `?` are not a
+///    part of the vector's allocated buffer. Boxes with numbers are valid elements within the
+///    vector, and boxes with `------` have been allocated, but aren't being used as elements
+///    right now.
+///
+/// We conducted an extensive A/B test on production software to confirm that the size-based layout
+/// improves compute performance by 0.5%, and decreases system memory usage by up to 0.33%.
+///
+/// **Class design**
+///
+/// __vector_layout was designed with the following goals:
+///    1. to abstractly represent the buffer's boundaries; and
+///    2. to limit the number of `#ifdef` blocks that a reader needs to pass through; and
+///    3. given (1) and (2), to have no logically identical components in multiple `#ifdef` clauses.
+///
+/// To facilitate these goals, there is a single `__vector_layout` definition. Users must choose
+/// their vector's layout when libc++ is being configured, so we don't need to manage multiple
+/// vector layout types (e.g. `__vector_size_layout`, `__vector_pointer_layout`, etc.). In doing so,
+/// we eliminate a significant amount of duplicated code.
+template <class _Tp, class _Allocator>
+class __vector_layout {
+public:
+  using value_type _LIBCPP_NODEBUG     = _Tp;
+  using allocator_type _LIBCPP_NODEBUG = _Allocator;
+  using __alloc_traits _LIBCPP_NODEBUG = allocator_traits<allocator_type>;
+  using size_type _LIBCPP_NODEBUG      = typename __alloc_traits::size_type;
+  using pointer _LIBCPP_NODEBUG        = typename __alloc_traits::pointer;
+  using const_pointer _LIBCPP_NODEBUG  = typename __alloc_traits::const_pointer;
+#ifdef _LIBCPP_ABI_SIZE_BASED_VECTOR
+  using _SplitBuffer _LIBCPP_NODEBUG    = __split_buffer<_Tp, _Allocator, __split_buffer_size_layout>;
+  using __boundary_type _LIBCPP_NODEBUG = size_type;
+#else
+  using _SplitBuffer _LIBCPP_NODEBUG    = __split_buffer<_Tp, _Allocator, __split_buffer_pointer_layout>;
+  using __boundary_type _LIBCPP_NODEBUG = pointer;
+#endif
+
+  // Cannot be defaulted, since `_LIBCPP_COMPRESSED_PAIR` isn't an aggregate before C++14.
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __vector_layout()
+      _NOEXCEPT_(is_nothrow_default_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __vector_layout(allocator_type const& __a)
+      _NOEXCEPT_(is_nothrow_copy_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()), __alloc_(__a) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __vector_layout(allocator_type&& __a)
+      _NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()), __alloc_(std::move(__a)) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __vector_layout(__vector_layout&& __other)
+      _NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
+      : __begin_(std::move(__other.__begin_)),
+        __boundary_(std::move(__other.__boundary_)),
+        __capacity_(std::move(__other.__capacity_)),
+        __alloc_(std::move(__other.__alloc_)) {
+    __other.__begin_    = nullptr;
+    __other.__boundary_ = __zero_boundary_type();
+    __other.__capacity_ = __zero_boundary_type();
+  }
+
+  /// Returns a reference to the stored allocator.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type& __alloc() _NOEXCEPT {
+    return __alloc_;
+  }
+
+  /// Returns a reference to the stored allocator.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type const&
+  __alloc() const _NOEXCEPT {
+    return __alloc_;
+  }
+
+  /// Returns a pointer to the beginning of the buffer.
+  ///
+  /// `__begin_ptr()` is not called `data()` because `vector::data()` returns `T*`, but `__begin_`
+  /// is allowed to be a fancy pointer.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer __begin_ptr() _NOEXCEPT {
+    return __begin_;
+  }
+
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_pointer __begin_ptr() const _NOEXCEPT {
+    return __begin_;
+  }
+
+  /// Returns the value that the layout uses to determine the vector's size.
+  ///
+  /// `__boundary_representation()` should only be used when directly operating on the layout from
+  /// outside `__vector_layout`. Its result must be used with type deduction to avoid compile-time
+  /// failures.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __boundary_type
+  __boundary_representation() const _NOEXCEPT {
+    return __boundary_;
+  }
+
+  /// Returns the value that the layout uses to determine the vector's capacity.
+  ///
+  /// `__capacity_representation()` should only be used when directly operating on the layout from
+  /// outside `__vector_layout`. Its result must be used with type deduction to avoid compile-time
+  /// failures.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __boundary_type
+  __capacity_representation() const _NOEXCEPT {
+    return __capacity_;
+  }
+
+  /// Returns how many elements can be added before a reallocation occurs.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type
+  __remaining_capacity() const _NOEXCEPT {
+    return __capacity_ - __boundary_;
+  }
+
+  /// Determines if a reallocation is necessary.
+  [[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool __is_full() const _NOEXCEPT {
+    return __boundary_ == __capacity_;
+  }
----------------
cjdb wrote:

The key advantage of `__is_full()` is improved readability. To me, `__is_full()` is the capacity-equivalent of `empty()`. We prefer `empty()` to `size() == 0`, so I applied the same philosophy here.
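
To illustrate the readability point, here is a minimal sketch rather than the PR's actual code. The names `pointer_layout`, `size_layout`, `needs_reallocation_before_append`, and `demo` are hypothetical, and the toy types ignore allocators and fancy pointers. It only shows that a caller written against `is_full()` reads the same whether the boundaries are stored as pointers or as integers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy types for illustration only; these are NOT libc++'s
// `__vector_layout`, and they ignore allocators and fancy pointers.
struct pointer_layout {
  short* begin;
  short* end;       // one past the last valid element
  short* capacity;  // one past the end of the allocation

  std::size_t size() const { return static_cast<std::size_t>(end - begin); }
  std::size_t remaining_capacity() const { return static_cast<std::size_t>(capacity - end); }
  bool is_full() const { return end == capacity; }
};

struct size_layout {
  short* begin;
  std::size_t size_;      // number of valid elements
  std::size_t capacity_;  // number of elements the allocation can hold

  std::size_t size() const { return size_; }
  std::size_t remaining_capacity() const { return capacity_ - size_; }
  bool is_full() const { return size_ == capacity_; }
};

// The caller does not need to know how the boundaries are represented.
template <class Layout>
bool needs_reallocation_before_append(const Layout& layout) {
  return layout.is_full();
}

// Mirrors the figures in the header comment: four elements, capacity for six.
inline void demo() {
  short buffer[6] = {3174, 5656, 648, 489};
  pointer_layout by_pointer{buffer, buffer + 4, buffer + 6};
  size_layout by_size{buffer, 4, 6};

  assert(!needs_reallocation_before_append(by_pointer));
  assert(!needs_reallocation_before_append(by_size));
  assert(by_pointer.remaining_capacity() == by_size.remaining_capacity());
}
```

Without `is_full()`, the same check would be spelled `end == capacity` in one configuration and `size_ == capacity_` in the other, which is the kind of duplication the single-definition design tries to avoid.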

https://github.com/llvm/llvm-project/pull/155330

