[libcxx-commits] [libcxx] [libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators (PR #132896)

Fri Mar 28 07:00:15 PDT 2025

================

----------------
winner245 wrote:

Here are the new benchmark results for a `join_view` over `vector<vector<int>>`. The benchmark size is the total number of `int` elements, and the segment size is 256 (so there are roughly `n / 256` segments).

```
--------------------------------------------------------------------------------------------------
Benchmark                                                      Before           After    slow-down
--------------------------------------------------------------------------------------------------
std::for_each_n(join_view(vector<vector<int>>))/8             5.44 ns         6.31 ns         
std::for_each_n(join_view(vector<vector<int>>))/32            20.2 ns         20.1 ns            
std::for_each_n(join_view(vector<vector<int>>))/50            39.0 ns         32.1 ns     
std::for_each_n(join_view(vector<vector<int>>))/1024           618 ns          627 ns           1%
std::for_each_n(join_view(vector<vector<int>>))/8192          3769 ns         5016 ns          33%
std::for_each_n(join_view(vector<vector<int>>))/16384         7567 ns        10052 ns          32%
std::for_each_n(join_view(vector<vector<int>>))/32768        15071 ns        20122 ns          34%
std::for_each_n(join_view(vector<vector<int>>))/65536        30099 ns        40203 ns          34%
std::for_each_n(join_view(vector<vector<int>>))/131072       59801 ns        80324 ns          34%
std::for_each_n(join_view(vector<vector<int>>))/262144      120641 ns       174557 ns          45%
std::for_each_n(join_view(vector<vector<int>>))/524288      255354 ns       325490 ns          27%
std::for_each_n(join_view(vector<vector<int>>))/1048576     523049 ns       665009 ns          27%
```

The results show a slow-down of roughly 27-45% (about 33% for most sizes) for inputs of 8192 elements and larger, caused by the additional traversal of the whole sequence in `std::next(__first, __n)`. WDYT? Should we simply drop the optimization for `join_view`?
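To make the extra traversal concrete, here is a minimal sketch of the two shapes being compared. The function names `for_each_n_single_pass` and `for_each_n_via_next` are hypothetical and are not the libc++ implementation. A `std::list` can stand in for any non-random-access iterator such as the `join_view` one.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical helper: a single pass with one increment per element.
template <class Iter, class Func>
Iter for_each_n_single_pass(Iter first,
                            typename std::iterator_traits<Iter>::difference_type n,
                            Func f) {
    assert(n >= 0);
    for (; n > 0; --n, ++first)
        f(*first);
    return first;
}

// Hypothetical helper: find the end with std::next, then delegate to std::for_each.
// For a non-random-access iterator, std::next performs n increments by itself,
// so the sequence is walked twice in total.
template <class Iter, class Func>
Iter for_each_n_via_next(Iter first,
                         typename std::iterator_traits<Iter>::difference_type n,
                         Func f) {
    assert(n >= 0);
    Iter last = std::next(first, n);  // first walk: O(n) increments
    std::for_each(first, last, f);    // second walk: applies f
    return last;
}
```

Both functions visit the same elements. The second form only pays off when the gain from the segmented `std::for_each` outweighs the cost of the first walk.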

Another option I can try is to speed up `std::next(__first, __n)` for segmented iterators. For a segmented range that is not random access but whose segments are each random access (e.g., a `join_view` of `vector<vector<T>>`, `vector<deque<T>>`, or `deque<vector<T>>`), we can advance the iterator in O(1) time within each segment, which would speed up `std::next(__first, __n)` by roughly a factor of `segment_size`. I think this would need a separate patch.
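A rough sketch of the segment-wise advance idea, written against a plain `vector<vector<int>>` rather than the libc++ segmented-iterator traits. The `Pos` struct and `advance_by_segments` function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical position in a flattened vector<vector<int>>:
// (index of the segment, offset inside that segment).
struct Pos {
    std::size_t segment;
    std::size_t offset;
};

// Advance `pos` by `n` elements. Each segment is skipped or entered in O(1),
// so the cost is proportional to the number of segments crossed, not to n.
// Precondition: at least `n` elements remain at or after `pos`.
inline Pos advance_by_segments(const std::vector<std::vector<int>>& segs,
                               Pos pos, std::size_t n) {
    while (n > 0) {
        assert(pos.segment < segs.size() && "advanced past the end");
        std::size_t remaining = segs[pos.segment].size() - pos.offset;
        if (n < remaining) {
            pos.offset += n;  // target lies inside this segment
            return pos;
        }
        n -= remaining;       // jump over the rest of this segment in one step
        ++pos.segment;
        pos.offset = 0;
    }
    // Normalize: skip empty segments so the result points at a real element
    // (or at the end position, segment == segs.size()).
    while (pos.segment < segs.size() &&
           pos.offset == segs[pos.segment].size()) {
        ++pos.segment;
        pos.offset = 0;
    }
    return pos;
}
```

With a segment size of 256, advancing by `n` elements takes about `n / 256` loop iterations instead of `n` increments, which is where the expected `segment_size`-fold speed-up would come from.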

https://github.com/llvm/llvm-project/pull/132896