[libcxx-commits] [libcxx] [libc++][ranges] optimize the performance of `ranges::starts_with` (PR #84570)

Xiaoyang Liu via libcxx-commits libcxx-commits at lists.llvm.org
Mon Apr 8 01:01:03 PDT 2024


xiaoyang-sde wrote:

Here's the benchmark result of the unoptimized version of `ranges::starts_with` with vectorized `mismatch`:

```txt
2024-04-08T00:51:04-07:00
Running ./ranges_starts_with.libcxx.out
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 18432 KiB (x1)
Load Average: 1.27, 2.29, 1.34
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_memcmp_optimization/16             3.09 ns         3.09 ns    212067553
bm_starts_with_contiguous_iter_with_memcmp_optimization/256            20.9 ns         20.9 ns     30603943
bm_starts_with_contiguous_iter_with_memcmp_optimization/4096            310 ns          310 ns      2176699
bm_starts_with_contiguous_iter_with_memcmp_optimization/65536          5394 ns         5394 ns       136959
bm_starts_with_contiguous_iter_with_memcmp_optimization/1048576      134816 ns       134817 ns         4916
bm_starts_with_contiguous_iter_with_memcmp_optimization/16777216    4792548 ns      4792472 ns          150
bm_starts_with_contiguous_iter/16                                      5.38 ns         5.38 ns    123790610
bm_starts_with_contiguous_iter/256                                     81.4 ns         81.4 ns      8541221
bm_starts_with_contiguous_iter/4096                                    1107 ns         1107 ns       615306
bm_starts_with_contiguous_iter/65536                                  16869 ns        16868 ns        41362
bm_starts_with_contiguous_iter/1048576                               293850 ns       293845 ns         2276
bm_starts_with_contiguous_iter/16777216                             6079325 ns      6079284 ns          108
bm_starts_with_random_access_iter/16                                   7.17 ns         7.04 ns    101903410
bm_starts_with_random_access_iter/256                                   100 ns          100 ns      6806306
bm_starts_with_random_access_iter/4096                                 1449 ns         1449 ns       482025
bm_starts_with_random_access_iter/65536                               23967 ns        23967 ns        27789
bm_starts_with_random_access_iter/1048576                            387731 ns       387727 ns         1752
bm_starts_with_random_access_iter/16777216                          6773716 ns      6773727 ns          100
bm_starts_with_bidirectional_iter/16                                   6.95 ns         6.95 ns     97150439
bm_starts_with_bidirectional_iter/256                                   103 ns          101 ns      6666863
bm_starts_with_bidirectional_iter/4096                                 1459 ns         1459 ns       477527
bm_starts_with_bidirectional_iter/65536                               23458 ns        23458 ns        29772
bm_starts_with_bidirectional_iter/1048576                            405195 ns       405189 ns         1822
bm_starts_with_bidirectional_iter/16777216                          6768824 ns      6768460 ns           99
bm_starts_with_forward_iter/16                                         6.86 ns         6.76 ns     99222099
bm_starts_with_forward_iter/256                                        99.3 ns         99.3 ns      6935465
bm_starts_with_forward_iter/4096                                       1455 ns         1455 ns       482635
bm_starts_with_forward_iter/65536                                     23077 ns        23077 ns        29802
bm_starts_with_forward_iter/1048576                                  379438 ns       379432 ns         1795
bm_starts_with_forward_iter/16777216                                6722769 ns      6722515 ns          104
```

Here's the benchmark result of the optimized version of `ranges::starts_with` with vectorized `mismatch`:

```txt
2024-04-08T00:58:01-07:00
Running ./ranges_starts_with.libcxx.out
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 18432 KiB (x1)
Load Average: 1.71, 2.34, 1.64
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_memcmp_optimization/16             1.46 ns         1.46 ns    470246496
bm_starts_with_contiguous_iter_with_memcmp_optimization/256            12.4 ns         12.4 ns     58289034
bm_starts_with_contiguous_iter_with_memcmp_optimization/4096            172 ns          172 ns      4097284
bm_starts_with_contiguous_iter_with_memcmp_optimization/65536          4276 ns         4276 ns       161096
bm_starts_with_contiguous_iter_with_memcmp_optimization/1048576      136488 ns       136487 ns         6118
bm_starts_with_contiguous_iter_with_memcmp_optimization/16777216    3955840 ns      3955785 ns          178
bm_starts_with_contiguous_iter/16                                      6.22 ns         6.22 ns    102345915
bm_starts_with_contiguous_iter/256                                     98.3 ns         98.3 ns      6888855
bm_starts_with_contiguous_iter/4096                                    1473 ns         1444 ns       479038
bm_starts_with_contiguous_iter/65536                                  23314 ns        23314 ns        29179
bm_starts_with_contiguous_iter/1048576                               372080 ns       372078 ns         1873
bm_starts_with_contiguous_iter/16777216                             7160219 ns      7149335 ns          103
bm_starts_with_random_access_iter/16                                   6.32 ns         6.32 ns    108101430
bm_starts_with_random_access_iter/256                                   108 ns          106 ns      6451084
bm_starts_with_random_access_iter/4096                                 1525 ns         1525 ns       451699
bm_starts_with_random_access_iter/65536                               23463 ns        23463 ns        29878
bm_starts_with_random_access_iter/1048576                            389128 ns       389126 ns         1882
bm_starts_with_random_access_iter/16777216                          6599507 ns      6599458 ns           99
bm_starts_with_bidirectional_iter/16                                   6.54 ns         6.40 ns     92338661
bm_starts_with_bidirectional_iter/256                                   105 ns          105 ns      6695816
bm_starts_with_bidirectional_iter/4096                                 1519 ns         1519 ns       451521
bm_starts_with_bidirectional_iter/65536                               23141 ns        23141 ns        30112
bm_starts_with_bidirectional_iter/1048576                            398679 ns       398672 ns         1855
bm_starts_with_bidirectional_iter/16777216                          6574845 ns      6574846 ns          107
bm_starts_with_forward_iter/16                                         6.43 ns         6.43 ns    103394439
bm_starts_with_forward_iter/256                                         105 ns          105 ns      6459829
bm_starts_with_forward_iter/4096                                       1529 ns         1529 ns       452248
bm_starts_with_forward_iter/65536                                     23380 ns        23380 ns        30325
bm_starts_with_forward_iter/1048576                                  399525 ns       399523 ns         1744
bm_starts_with_forward_iter/16777216                                6918514 ns      6918404 ns          102
```
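For readers unfamiliar with what a `memcmp` fast path looks like, here is a minimal sketch of the idea the `_with_memcmp_optimization` rows appear to exercise. It is not the code from the PR: `starts_with_loop` and `starts_with_memcmp` are hypothetical helpers written for illustration, and the sketch assumes integral element types stored contiguously, where bytewise equality matches `==`.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: element-by-element prefix check, the shape a
// generic iterator-based implementation takes.
template <class T>
bool starts_with_loop(const std::vector<T>& range, const std::vector<T>& prefix) {
  if (prefix.size() > range.size())
    return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (!(range[i] == prefix[i]))
      return false;
  }
  return true;
}

// Hypothetical helper: compares the prefix bytes in a single memcmp call.
// Only valid when bytewise equality is the same as operator==, which is
// assumed here by restricting T to integral types.
template <class T>
bool starts_with_memcmp(const std::vector<T>& range, const std::vector<T>& prefix) {
  static_assert(std::is_integral_v<T>, "sketch assumes integral element types");
  if (prefix.size() > range.size())
    return false;
  if (prefix.empty())
    return true;  // avoid calling memcmp on a possibly null data() pointer
  return std::memcmp(range.data(), prefix.data(), prefix.size() * sizeof(T)) == 0;
}
```

Both helpers return the same answer for integral inputs; the second hands the whole comparison to the C library, which is typically vectorized and would be consistent with the lower timings in the `_with_memcmp_optimization` rows.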

https://github.com/llvm/llvm-project/pull/84570

