[libcxx-commits] [libcxx] [libc++] Add some _LIBCPP_ASSUMEs for bounded iterators (PR #109033)

Louis Dionne via libcxx-commits libcxx-commits at lists.llvm.org
Thu Mar 27 09:03:08 PDT 2025


https://github.com/ldionne updated https://github.com/llvm/llvm-project/pull/109033

>From 3666789b571b21432901778f10b5a8ceed069cec Mon Sep 17 00:00:00 2001
From: David Benjamin <davidben at google.com>
Date: Tue, 17 Sep 2024 15:18:45 -0400
Subject: [PATCH 1/6] [libc++] Add some _LIBCPP_ASSUMEs for bounded iterators

Playing around, this seems to address #101370 for `std::vector<char>`,
but not `std::vector<int>`. `std::vector<int>` I believe also needs a
solution to #101372, which is an alignment issue.

The root problem is that vector uses end_cap instead of end as the
hardening fencepost. But user code (be it an actual `iter != vec.end()`
check, or one synthesized by the language in a range-for loop) uses the
container end as the fencepost.

We would like the user fencepost to delete the hardening fencepost. For
that to happen, the compiler must know that if you take your iterator
and then steadily `++iter`, stopping at `iter == end`, you won't hit
`iter == end_cap` along the way. To fgire this out, the compiler needs
to know a few things:

1. `iter <= end <= end_cap` at the start

2. `iter`, `end`, and `end_cap` are all compatibly aligned, such that
   `++iter` cannot skip over `end` and then get to `end_cap`.

The first of these is not obvious in `std::vector` for because
`std::vector` stores three pointers, rather than one pointer and then
sizes. That means the compiler never sees `end` (or `end_cap`) computed
as `begin + size` (or `begin + capacity`). Without type invariants, the
compiler does not know that the three pointers have any relation at all.

This PR addresses it by putting assumes in `__bounded_iter` itself. We
could also place it in `std::vector::__make_iter`, but this invariant is
important enough for reasoning about bounds that it seemed worth
establishing it across the board. (Note this means we trust container
implementations to use the bounded iterators correctly, which we already
do. We're interested in catching bugs in user code, not the STL itself.)

That alone is actually enough to handle this because constructing
`vector::end()` is enough to tell the compiler that `begin <= end`, and
loops usually start at `begin`. But since `__make_iter` is sometimes
called on non-endpoint iterators, I added one extra invariant to
`__make_iter`.

The second issue is #101372. This PR does not address it but will
(hopefully) take advantage of it once available.

In working on this, I noticed that _LIBCPP_ASSUME silences -Wassume.
Without that warning, I ended up spending a lot of time debugging
silently no-op assumes. This seems to be a remnant of when
_LIBCPP_ASSUME was part of _LIBCPP_ASSERT. Now that it's standalone, I
think we shouldn't disable the warning by default. If we ever need to
silence the warning, let's do it explicitly.
---
 libcxx/include/__assert                  |  4 +---
 libcxx/include/__iterator/bounded_iter.h | 12 ++++++++++++
 libcxx/include/__vector/vector.h         | 10 +++++++++-
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/libcxx/include/__assert b/libcxx/include/__assert
index 90eaa6023587b..5a5cad472425c 100644
--- a/libcxx/include/__assert
+++ b/libcxx/include/__assert
@@ -27,9 +27,7 @@
 // optimization intent. See https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609 for a
 // discussion.
 #if __has_builtin(__builtin_assume)
-#  define _LIBCPP_ASSUME(expression)                                                                                   \
-    (_LIBCPP_DIAGNOSTIC_PUSH _LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Wassume")                                              \
-         __builtin_assume(static_cast<bool>(expression)) _LIBCPP_DIAGNOSTIC_POP)
+#  define _LIBCPP_ASSUME(expression) __builtin_assume(static_cast<bool>(expression))
 #else
 #  define _LIBCPP_ASSUME(expression) ((void)0)
 #endif
diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index d12750d1f81ac..d05abce76fe50 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -101,10 +101,22 @@ struct __bounded_iter {
   _LIBCPP_HIDE_FROM_ABI
   _LIBCPP_CONSTEXPR_SINCE_CXX14 explicit __bounded_iter(_Iterator __current, _Iterator __begin, _Iterator __end)
       : __current_(__current), __begin_(__begin), __end_(__end) {
+    // These are internal checks rather than hardening checks because the STL container is expected to ensure they are
+    // in order.
     _LIBCPP_ASSERT_INTERNAL(
         __begin <= __current, "__bounded_iter(current, begin, end): current and begin are inconsistent");
     _LIBCPP_ASSERT_INTERNAL(
         __current <= __end, "__bounded_iter(current, begin, end): current and end are inconsistent");
+
+    // However, this order is important to help the compiler reason about bounds checks. For example, `std::vector` sets
+    // `__end_ptr` to the capacity, not the true container end. To translate container-end fenceposts into hardening-end
+    // fenceposts, we must know that container-end <= hardening-end. `std::__to_address` is needed because `_Iterator`
+    // may be wrapped type, such that `operator<=` has side effects.
+    pointer __begin_ptr   = std::__to_address(__begin);
+    pointer __current_ptr = std::__to_address(__current);
+    pointer __end_ptr     = std::__to_address(__end);
+    _LIBCPP_ASSUME(__begin_ptr <= __current_ptr);
+    _LIBCPP_ASSUME(__current_ptr <= __end_ptr);
   }
 
   template <class _It>
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index ae3ea1de61de0..503d231324a71 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -611,6 +611,10 @@ class _LIBCPP_TEMPLATE_VIS vector {
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator __make_iter(pointer __p) _NOEXCEPT {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
+    // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
+    // relates `__p` to `__end_`.
+    _LIBCPP_ASSUME(__p <= this->__end_);
+
     // Bound the iterator according to the capacity, rather than the size.
     //
     // Vector guarantees that iterators stay valid as long as no reallocation occurs even if new elements are inserted
@@ -631,7 +635,11 @@ class _LIBCPP_TEMPLATE_VIS vector {
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_iterator __make_iter(const_pointer __p) const _NOEXCEPT {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
-    // Bound the iterator according to the capacity, rather than the size.
+    // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
+    // relates `__p` to `__end_`.
+    _LIBCPP_ASSUME(__p <= this->__end_);
+
+    // Bound the iterator according to the capacity, rather than the size. See above.
     return std::__make_bounded_iter(
         std::__wrap_iter<const_pointer>(__p),
         std::__wrap_iter<const_pointer>(this->__begin_),

>From bc319a50cedd6c531f6354c49a58ef1461788cb5 Mon Sep 17 00:00:00 2001
From: David Benjamin <davidben at google.com>
Date: Wed, 25 Sep 2024 14:01:09 -0400
Subject: [PATCH 2/6] Silence GCC warnings

---
 libcxx/include/__iterator/bounded_iter.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index d05abce76fe50..642307ce9f2f7 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -117,6 +117,10 @@ struct __bounded_iter {
     pointer __end_ptr     = std::__to_address(__end);
     _LIBCPP_ASSUME(__begin_ptr <= __current_ptr);
     _LIBCPP_ASSUME(__current_ptr <= __end_ptr);
+    // Silence warnings when assumptions are disabled.
+    (void)__begin_ptr;
+    (void)__current_ptr;
+    (void)__end_ptr;
   }
 
   template <class _It>

>From d85cfc922a92676a78511ccee1deb037905ba250 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 27 Mar 2025 11:42:10 -0400
Subject: [PATCH 3/6] Add benchmark for iterating over the whole sequence

---
 .../sequence/sequence_container_benchmarks.h  | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/libcxx/test/benchmarks/containers/sequence/sequence_container_benchmarks.h b/libcxx/test/benchmarks/containers/sequence/sequence_container_benchmarks.h
index dcd251d6997dd..b91b4ef6c9300 100644
--- a/libcxx/test/benchmarks/containers/sequence/sequence_container_benchmarks.h
+++ b/libcxx/test/benchmarks/containers/sequence/sequence_container_benchmarks.h
@@ -448,6 +448,27 @@ void sequence_container_benchmarks(std::string container) {
         }
       });
   }
+
+  /////////////////////////
+  // General usage patterns
+  /////////////////////////
+  bench("iterate-whole-container", [cheap](auto& st) {
+    auto const size = st.range(0);
+    std::vector<ValueType> in;
+    std::generate_n(std::back_inserter(in), size, cheap);
+    DoNotOptimizeData(in);
+
+    Container c(in.begin(), in.end());
+    DoNotOptimizeData(c);
+
+    auto use = [](auto& element) { benchmark::DoNotOptimize(element); };
+
+    for ([[maybe_unused]] auto _ : st) {
+      for (auto it = c.begin(); it != c.end(); ++it) {
+        use(*it);
+      }
+    }
+  });
 }
 
 } // namespace support

>From a7d61bc4838748b92d8fc3f7fc3c29adf2a4923b Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 27 Mar 2025 11:58:03 -0400
Subject: [PATCH 4/6] Add internal assertions

---
 libcxx/include/__vector/vector.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index b91be2a291c76..aeb562a5dfa2f 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -655,6 +655,7 @@ class _LIBCPP_TEMPLATE_VIS vector {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
     // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
     // relates `__p` to `__end_`.
+    _LIBCPP_ASSERT_INTERNAL(__p <= this->__end_, "vector::__make_iter passed an invalid pointer");
     _LIBCPP_ASSUME(__p <= this->__end_);
 
     // Bound the iterator according to the capacity, rather than the size.
@@ -679,6 +680,7 @@ class _LIBCPP_TEMPLATE_VIS vector {
 #ifdef _LIBCPP_ABI_BOUNDED_ITERATORS_IN_VECTOR
     // `__bounded_iter` will tell the compiler that `__p` is bounded by `__begin_` and `__end_cap`, but nothing a priori
     // relates `__p` to `__end_`.
+    _LIBCPP_ASSERT_INTERNAL(__p <= this->__end_, "vector::__make_iter passed an invalid pointer");
     _LIBCPP_ASSUME(__p <= this->__end_);
 
     // Bound the iterator according to the capacity, rather than the size. See above.

>From 798c008c720d8e5c1e014c1a42c57cd0f325b2fc Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 27 Mar 2025 11:59:40 -0400
Subject: [PATCH 5/6] Update libcxx/include/__iterator/bounded_iter.h

---
 libcxx/include/__iterator/bounded_iter.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index 642307ce9f2f7..299eff000459b 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -101,8 +101,6 @@ struct __bounded_iter {
   _LIBCPP_HIDE_FROM_ABI
   _LIBCPP_CONSTEXPR_SINCE_CXX14 explicit __bounded_iter(_Iterator __current, _Iterator __begin, _Iterator __end)
       : __current_(__current), __begin_(__begin), __end_(__end) {
-    // These are internal checks rather than hardening checks because the STL container is expected to ensure they are
-    // in order.
     _LIBCPP_ASSERT_INTERNAL(
         __begin <= __current, "__bounded_iter(current, begin, end): current and begin are inconsistent");
     _LIBCPP_ASSERT_INTERNAL(

>From 43093d710dd21b2fd7b1a5c94dde900ca57513f4 Mon Sep 17 00:00:00 2001
From: Louis Dionne <ldionne.2 at gmail.com>
Date: Thu, 27 Mar 2025 12:02:54 -0400
Subject: [PATCH 6/6] Reword/remove comments

---
 libcxx/include/__iterator/bounded_iter.h | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/libcxx/include/__iterator/bounded_iter.h b/libcxx/include/__iterator/bounded_iter.h
index 299eff000459b..27793fca6490c 100644
--- a/libcxx/include/__iterator/bounded_iter.h
+++ b/libcxx/include/__iterator/bounded_iter.h
@@ -106,16 +106,14 @@ struct __bounded_iter {
     _LIBCPP_ASSERT_INTERNAL(
         __current <= __end, "__bounded_iter(current, begin, end): current and end are inconsistent");
 
-    // However, this order is important to help the compiler reason about bounds checks. For example, `std::vector` sets
-    // `__end_ptr` to the capacity, not the true container end. To translate container-end fenceposts into hardening-end
-    // fenceposts, we must know that container-end <= hardening-end. `std::__to_address` is needed because `_Iterator`
-    // may be wrapped type, such that `operator<=` has side effects.
+    // Add assumptions to help the compiler reason about bounds checks. For example, std::vector sets the end of
+    // the __bounded_iter's valid range to the capacity of the vector, not vector::end(). To translate container-end
+    // fenceposts into hardening-end fenceposts, we must know that container-end <= hardening-end.
     pointer __begin_ptr   = std::__to_address(__begin);
     pointer __current_ptr = std::__to_address(__current);
     pointer __end_ptr     = std::__to_address(__end);
     _LIBCPP_ASSUME(__begin_ptr <= __current_ptr);
     _LIBCPP_ASSUME(__current_ptr <= __end_ptr);
-    // Silence warnings when assumptions are disabled.
     (void)__begin_ptr;
     (void)__current_ptr;
     (void)__end_ptr;



More information about the libcxx-commits mailing list