[libcxx-commits] [libcxx] [libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (PR #132896)

Sat Jun 7 09:47:30 PDT 2025

================
@@ -76,6 +76,10 @@ Improvements and New Features
 - The ``bitset::to_string`` function has been optimized, resulting in a performance improvement of up to 8.3x for bitsets
   with uniformly distributed zeros and ones, and up to 13.5x and 16.1x for sparse and dense bitsets, respectively.
 
+- The ``std::ranges::for_each`` and ``std::ranges::for_each_n`` algorithms have been optimized for segmented iterators,
+  resulting in performance improvements of up to 21.3x for ``std::deque::iterator`` and 24.9x for ``join_view`` of
+  ``vector<vector<char>>``.
----------------
winner245 wrote:

I've rerun the benchmarks multiple times and got similar, consistent speedups for the ranges algorithms. It is a bit strange that these numbers are larger than those reported earlier for the classic `std` algorithms; ideally, they should match. I haven't identified a clear reason for the discrepancy. My guess is that the earlier numbers compared `std::for_each` with and without the segmented-iterator optimization, while the numbers in this patch compare `std::ranges::for_each` with and without the optimization. The `std::for_each` comparisons did not include noise such as the `std::invoke` call and the projection call, whereas the ranges comparisons do. This noise might account for the difference; it is the only explanation I can think of at the moment.

To avoid confusion, I will not report these numbers in this patch. Instead, I will stick to the previously reported, smaller numbers (which suffice to show the performance improvement).

https://github.com/llvm/llvm-project/pull/132896