[libcxx-commits] [libcxx] [libcxx] adds a size-based representation for `vector`'s unstable ABI (PR #155330)

Christopher Di Bella via libcxx-commits libcxx-commits at lists.llvm.org
Mon Mar 23 00:35:43 PDT 2026


================
@@ -0,0 +1,508 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___VECTOR_LAYOUT_H
+#define _LIBCPP___VECTOR_LAYOUT_H
+
+#include <__assert>
+#include <__config>
+#include <__debug_utils/sanitizers.h>
+#include <__memory/allocator_traits.h>
+#include <__memory/compressed_pair.h>
+#include <__memory/pointer_traits.h>
+#include <__memory/swap_allocator.h>
+#include <__memory/uninitialized_algorithms.h>
+#include <__split_buffer>
+#include <__type_traits/is_nothrow_constructible.h>
+#include <__utility/move.h>
+#include <__utility/swap.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+/// Defines `std::vector`'s storage layout and any operations that are affected by a change in the
+/// layout.
+///
+/// Dynamically sized arrays like `std::vector` can be represented in several ways. libc++
+/// supports two layouts for `std::vector`:
+///
+///   * pointer-based layout
+///   * size-based layout
+///
+/// We describe these layouts below. Both layouts store a pointer to the start of the allocated
+/// buffer (called `__begin_`).
+///
+/// **Pointer-based layout**
+///
+/// The pointer-based layout uses two pointers in addition to `__begin_`. The first (called
+/// `__end_`) points one past the last valid element, and the second (called `__capacity_`) points
+/// one past the end of the allocated buffer. This is libc++'s default representation, for
+/// historical reasons.
+///
+/// The `__end_` pointer has three primary use cases:
+///   * to compute the size of the vector;
+///   * to construct the past-the-end iterator; and
+///   * to indicate where the next element should be appended.
+///
+/// The `__capacity_` pointer is used to compute the capacity of the vector, which tells the vector
+/// how many elements can be added before a reallocation is necessary.
+///
+///    __begin_ = 0xE4FD0, __end_ = 0xE4FD8, __capacity_ = 0xE4FDC
+///                 0xE4FD0                             0xE4FD8           0xE4FDC
+///                    v                                   v                 v
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///    | ????????????? |   3174 |   5656 |    648 |    489 | ------ | ------ | ??????????????????? |
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///                    ^                                   ^                 ^
+///                __begin_                             __end_          __capacity_
+///
+///    Figure 1: A visual representation of a pointer-based `std::vector<short>`. This vector has
+///    four elements, with the capacity to store six. Boxes with numbers are valid elements within
+///    the vector, boxes with `------` have been allocated but aren't currently being used as
+///    elements, and boxes with `?` are not part of the vector's allocated buffer.
+///
+/// **Size-based layout**
+///
+/// The size-based layout uses integers to track its size and capacity, and computes the
+/// past-the-end pointers for the valid range and for the whole buffer only when they're needed.
+/// Programs using the size-based layout have been measured to perform better, in both compute and
+/// memory usage, than with the pointer-based layout. Despite these promising measurements, the
+/// size-based layout is opt-in, to preserve ABI compatibility with prebuilt binaries. Given the
+/// improved performance, we recommend preferring the size-based layout in the absence of such ABI
+/// constraints.
+///
+///    __begin_ = 0xE4FD0, __size_ = 4, __capacity_ = 6
+///                 0xE4FD0
+///                    v
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///    | ????????????? |   3174 |   5656 |    648 |    489 | ------ | ------ | ??????????????????? |
+///    +---------------+--------+--------+--------+--------+--------+--------+---------------------+
+///                    ^
+///                __begin_
+///
+///    Figure 2: A visual representation of the same vector using the size-based layout. Boxes with
+///    `?` are not part of the vector's allocated buffer.
+///
+/// An extensive A/B test on production software found that the size-based layout improves compute
+/// performance by 0.5% and decreases system memory usage by up to 0.33%.
+///
+/// **Class design**
+///
+/// `__vector_layout` was designed with the following goals:
+///    1. to abstractly represent the buffer's boundaries;
+///    2. to limit the number of `#ifdef` blocks that a reader needs to pass through; and
+///    3. given (1) and (2), to have no logically identical components in multiple `#ifdef` clauses.
+///
+/// To facilitate these goals, there is a single `__vector_layout` definition. Users must choose
+/// their vector's layout when libc++ is being configured, so we don't need to manage multiple
+/// vector layout types (e.g. `__vector_size_layout`, `__vector_pointer_layout`, etc.). This
+/// eliminates a significant amount of duplicate code.
+template <class _Tp, class _Allocator>
+class __vector_layout {
+public:
+  using value_type _LIBCPP_NODEBUG     = _Tp;
+  using allocator_type _LIBCPP_NODEBUG = _Allocator;
+  using __alloc_traits _LIBCPP_NODEBUG = allocator_traits<allocator_type>;
+  using size_type _LIBCPP_NODEBUG      = typename __alloc_traits::size_type;
+  using pointer _LIBCPP_NODEBUG        = typename __alloc_traits::pointer;
+  using const_pointer _LIBCPP_NODEBUG  = typename __alloc_traits::const_pointer;
+#ifdef _LIBCPP_ABI_SIZE_BASED_VECTOR
+  using _SplitBuffer _LIBCPP_NODEBUG    = __split_buffer<_Tp, _Allocator, __split_buffer_size_layout>;
+  using __boundary_type _LIBCPP_NODEBUG = size_type;
+#else
+  using _SplitBuffer _LIBCPP_NODEBUG    = __split_buffer<_Tp, _Allocator, __split_buffer_pointer_layout>;
+  using __boundary_type _LIBCPP_NODEBUG = pointer;
+#endif
+
+  // Cannot be defaulted, since `_LIBCPP_COMPRESSED_PAIR` isn't an aggregate before C++14.
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __vector_layout()
+      _NOEXCEPT_(is_nothrow_default_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __vector_layout(allocator_type const& __a)
+      _NOEXCEPT_(is_nothrow_copy_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()), __alloc_(__a) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __vector_layout(allocator_type&& __a)
+      _NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
+      : __capacity_(__zero_boundary_type()), __alloc_(std::move(__a)) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __vector_layout(__vector_layout&& __other)
+      _NOEXCEPT_(is_nothrow_move_constructible<allocator_type>::value)
+      : __begin_(std::move(__other.__begin_)),
+        __boundary_(std::move(__other.__boundary_)),
+        __capacity_(std::move(__other.__capacity_)),
+        __alloc_(std::move(__other.__alloc_)) {
+    __other.__begin_    = nullptr;
+    __other.__boundary_ = __zero_boundary_type();
+    __other.__capacity_ = __zero_boundary_type();
----------------
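A minimal sketch contrasting the two layouts described in the documentation comment above. `pointer_layout` and `size_layout` are hypothetical simplifications, not types from this PR: they omit the allocator, `__boundary_type`, and the `_LIBCPP_ABI_SIZE_BASED_VECTOR` switch, and only show which state each layout stores and what it computes on demand.

```cpp
#include <cstddef>

// Hypothetical, simplified layouts for illustration only; neither is libc++'s
// __vector_layout (no allocator, no __boundary_type, no ABI macros).
template <class T>
struct pointer_layout {
  T* begin_    = nullptr;
  T* end_      = nullptr; // one past the last valid element
  T* capacity_ = nullptr; // one past the end of the allocated buffer

  // Size and capacity are derived from pointer differences.
  std::size_t size() const noexcept { return static_cast<std::size_t>(end_ - begin_); }
  std::size_t capacity() const noexcept { return static_cast<std::size_t>(capacity_ - begin_); }
  T* end() const noexcept { return end_; }
};

template <class T>
struct size_layout {
  T* begin_             = nullptr;
  std::size_t size_     = 0; // number of valid elements
  std::size_t capacity_ = 0; // number of elements the buffer can hold

  // Size and capacity are stored directly; the past-the-end pointer is
  // computed only when it is needed.
  std::size_t size() const noexcept { return size_; }
  std::size_t capacity() const noexcept { return capacity_; }
  T* end() const noexcept { return begin_ + size_; }
};
```

For example, with `short buf[6]` holding the four values from Figure 1, `pointer_layout<short>{buf, buf + 4, buf + 6}` and `size_layout<short>{buf, 4, 6}` both report a size of 4, a capacity of 6, and the same past-the-end pointer, `buf + 4`.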
cjdb wrote:

That requires manually zeroing out the layout at every `std::move(__other.__layout_)`. Even if there are relatively few such call sites, I'd find it surprising if the `__vector_layout` move constructor didn't handle the zeroing itself. That said, I'd be much more comfortable with this direction if #187953 lands: `std::__exchange(__other.__layout_, __vector_layout())` would address my readability concern here.

It unfortunately looks like [`_LIBCPP_COMPRESSED_PAIR` blocks this for types that aren't trivially copyable](https://godbolt.org/z/hPraj36of), and I'm seeing a similar problem in #187953, so we'll need to see if that can be solved first.
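A minimal sketch of the exchange-based idiom discussed above, assuming C++14 and using the standard `std::exchange` from `<utility>` in place of libc++'s internal `std::__exchange`. The names `toy_layout` and `toy_vector` are hypothetical. The sketch stores no allocator and does not use `_LIBCPP_COMPRESSED_PAIR`, so it does not reproduce the trivially-copyable problem linked above; it only shows where the zeroing would live.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a size-based layout; not libc++'s __vector_layout.
// Its special members are implicitly defaulted, so moving it copies the
// members and leaves the source unchanged.
struct toy_layout {
  int* begin_           = nullptr;
  std::size_t size_     = 0;
  std::size_t capacity_ = 0;
};

// Hypothetical owner showing the exchange-based idiom: the owner resets the
// moved-from layout instead of relying on the layout's move constructor.
class toy_vector {
public:
  toy_vector() = default;
  explicit toy_vector(std::size_t n) : layout_{new int[n](), n, n} {}

  // std::exchange returns the old layout and assigns an empty,
  // value-initialized layout back into `other`.
  toy_vector(toy_vector&& other) noexcept
      : layout_(std::exchange(other.layout_, toy_layout{})) {}

  // Copying and assignment are omitted to keep the sketch short.
  toy_vector(const toy_vector&)            = delete;
  toy_vector& operator=(const toy_vector&) = delete;
  toy_vector& operator=(toy_vector&&)      = delete;

  ~toy_vector() { delete[] layout_.begin_; }

  std::size_t size() const noexcept { return layout_.size_; }
  std::size_t capacity() const noexcept { return layout_.capacity_; }
  const int* data() const noexcept { return layout_.begin_; }

private:
  toy_layout layout_;
};
```

After `toy_vector b(std::move(a));`, `b` owns the buffer and `a` reports a size and capacity of zero with a null `data()`. The trade-off is the one raised in the comment: the layout's own move constructor stays a plain memberwise move, and every owner that moves a layout must remember to reset the source.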

> but you don't bother touching the allocator. This leaves __other in a funky state: it's "null" for most of its properties, but its allocator is in a moved-from state.

`__vector_layout` does its best to mirror what `vector` does in trunk. [`vector(vector&&)` doesn't do this today](https://github.com/llvm/llvm-project/blob/main/libcxx/include/__vector/vector.h#L964-L976), so I figured there was a good reason for not doing so.

https://github.com/llvm/llvm-project/pull/155330

