[libcxx-commits] [libcxx] [libc++][ranges] optimize the performance of `ranges::starts_with` (PR #84570)
Xiaoyang Liu via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Apr 26 14:33:01 PDT 2024
xiaoyang-sde wrote:
I wrote a benchmark that compares `std::ranges::equal` and `std::ranges::mismatch` as the basis for `ranges::starts_with` on contiguous iterators. I ran it on two machines and observed different results.
```cpp
#include <algorithm>
#include <benchmark/benchmark.h>
#include <vector>

#include "test_iterators.h"

static void bm_starts_with_contiguous_iter_with_equal_impl(benchmark::State& state) {
  std::vector<int> a(state.range(), 1);
  std::vector<int> p(state.range(), 1);

  for (auto _ : state) {
    benchmark::DoNotOptimize(a);
    benchmark::DoNotOptimize(p);

    auto begin1 = contiguous_iterator(a.data());
    auto end1   = contiguous_iterator(a.data() + a.size());
    auto begin2 = contiguous_iterator(p.data());
    auto end2   = contiguous_iterator(p.data() + p.size());

    benchmark::DoNotOptimize(std::ranges::equal(begin1, end1, begin2, end2));
  }
}
BENCHMARK(bm_starts_with_contiguous_iter_with_equal_impl)->RangeMultiplier(16)->Range(16, 16 << 20);

static void bm_starts_with_contiguous_iter_with_mismatch_impl(benchmark::State& state) {
  std::vector<int> a(state.range(), 1);
  std::vector<int> p(state.range(), 1);

  for (auto _ : state) {
    benchmark::DoNotOptimize(a);
    benchmark::DoNotOptimize(p);

    auto begin1 = contiguous_iterator(a.data());
    auto end1   = contiguous_iterator(a.data() + a.size());
    auto begin2 = contiguous_iterator(p.data());
    auto end2   = contiguous_iterator(p.data() + p.size());

    benchmark::DoNotOptimize(std::ranges::mismatch(begin1, end1, begin2, end2).in2 == end2);
  }
}
BENCHMARK(bm_starts_with_contiguous_iter_with_mismatch_impl)->RangeMultiplier(16)->Range(16, 16 << 20);

BENCHMARK_MAIN();
```
The performance is similar on a MacBook Air (M1, arm64):
```console
2024-04-26T17:14:19-04:00
Running ./build/libcxx/benchmarks/ranges_starts_with.libcxx.out
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 2.00, 3.59, 3.41
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_equal_impl/16                2.89 ns         2.86 ns    244051251
bm_starts_with_contiguous_iter_with_equal_impl/256               27.1 ns         26.1 ns     26963107
bm_starts_with_contiguous_iter_with_equal_impl/4096               451 ns          443 ns      1578169
bm_starts_with_contiguous_iter_with_equal_impl/65536             6693 ns         6667 ns       104706
bm_starts_with_contiguous_iter_with_equal_impl/1048576         110306 ns       109800 ns         6162
bm_starts_with_contiguous_iter_with_equal_impl/16777216       3129511 ns      2780193 ns          311
bm_starts_with_contiguous_iter_with_mismatch_impl/16             3.08 ns         3.05 ns    225504566
bm_starts_with_contiguous_iter_with_mismatch_impl/256            26.9 ns         26.7 ns     25911339
bm_starts_with_contiguous_iter_with_mismatch_impl/4096            422 ns          420 ns      1678504
bm_starts_with_contiguous_iter_with_mismatch_impl/65536          6834 ns         6722 ns       105055
bm_starts_with_contiguous_iter_with_mismatch_impl/1048576      124471 ns       123355 ns         5691
bm_starts_with_contiguous_iter_with_mismatch_impl/16777216    2337331 ns      2326288 ns          288
```
However, on Arch Linux with a 4th Gen Xeon processor (avx2, x86_64), `std::ranges::equal` is faster than `std::ranges::mismatch` at every size:
```console
2024-04-26T20:45:12+00:00
Running ./build/libcxx/benchmarks/ranges_starts_with.libcxx.out
Run on (4 X 2294.61 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 4096 KiB (x4)
Load Average: 0.00, 0.32, 0.66
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_equal_impl/16                7.68 ns         7.68 ns     93071510
bm_starts_with_contiguous_iter_with_equal_impl/256               31.4 ns         31.4 ns     25166061
bm_starts_with_contiguous_iter_with_equal_impl/4096               396 ns          396 ns      1528737
bm_starts_with_contiguous_iter_with_equal_impl/65536            10798 ns        10798 ns        59100
bm_starts_with_contiguous_iter_with_equal_impl/1048576         496691 ns       496671 ns         1499
bm_starts_with_contiguous_iter_with_equal_impl/16777216      13436051 ns     13435049 ns           50
bm_starts_with_contiguous_iter_with_mismatch_impl/16             10.7 ns         10.7 ns     68709479
bm_starts_with_contiguous_iter_with_mismatch_impl/256            59.0 ns         59.0 ns     10459829
bm_starts_with_contiguous_iter_with_mismatch_impl/4096           1069 ns         1069 ns       729445
bm_starts_with_contiguous_iter_with_mismatch_impl/65536         16881 ns        16880 ns        34519
bm_starts_with_contiguous_iter_with_mismatch_impl/1048576      583530 ns       583395 ns         1250
bm_starts_with_contiguous_iter_with_mismatch_impl/16777216   15792555 ns     15791353 ns           43
```
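To put a number on the x86_64 gap, the per-size ratio of the two CPU times can be computed directly from the table above. This is only a small sketch: the `Row` struct and the `slowdown` helper are hypothetical names introduced here, and the values are the CPU column copied from the Xeon run.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

// CPU times in nanoseconds, copied from the x86_64 results above.
struct Row {
  std::size_t size;
  double equal_ns;
  double mismatch_ns;
};

inline constexpr std::array<Row, 6> xeon_rows{{
    {16, 7.68, 10.7},
    {256, 31.4, 59.0},
    {4096, 396, 1069},
    {65536, 10798, 16880},
    {1048576, 496671, 583395},
    {16777216, 13435049, 15791353},
}};

// A ratio above 1 means the mismatch-based variant was slower at that size.
constexpr double slowdown(const Row& row) { return row.mismatch_ns / row.equal_ns; }

// Prints one "size: ratio" line per row of the table.
inline void print_slowdowns() {
  for (const Row& row : xeon_rows)
    std::printf("%zu: %.2fx\n", row.size, slowdown(row));
}

// Sanity check: in this table the mismatch-based variant is slower at every size.
inline void check_table() {
  for (const Row& row : xeon_rows)
    assert(slowdown(row) > 1.0);
}
```

The gap is widest at 4096 elements and narrows for the largest inputs.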
https://github.com/llvm/llvm-project/pull/84570