[libcxx-commits] [libcxx] [libc++] Tiny optimizations for is_permutation (PR #129565)

Louis Dionne via libcxx-commits libcxx-commits at lists.llvm.org
Tue Mar 25 14:03:59 PDT 2025


https://github.com/ldionne commented:

First, make sure to benchmark on the latest `main`, since I recently fixed two issues where we wouldn't vectorize properly inside `mismatch`. I pulled your branch and rebased it onto `main` just now, and the benchmarks that do worse with your patch are the following (I dropped all the lines where your patch was an improvement):

```
Comparing build/default/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json to build/candidate/libcxx/test/benchmarks/algorithms/nonmodifying/Output/is_permutation.bench.cpp.dir/benchmark-result.json
Benchmark                                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
std::is_permutation(list<int>) (3leg) (common prefix)/8                           +0.0801         +0.0824             5             5             5             5
std::is_permutation(list<int>) (3leg) (common prefix)/1024                        +0.4706         +0.4731          1088          1599          1086          1599
std::is_permutation(list<int>) (3leg) (common prefix)/8192                        +0.1887         +0.1899         11456         13618         11445         13618
std::is_permutation(list<int>) (3leg, pred) (common prefix)/1024                  +0.0336         +0.0338          1137          1176          1137          1175
rng::is_permutation(list<int>) (4leg, pred) (common prefix)/8192                  +0.1133         +0.1139         12519         13938         12512         13937
std::is_permutation(list<int>) (3leg) (shuffled)/8                                +0.0699         +0.0701            61            65            61            65
std::is_permutation(list<int>) (3leg, pred) (shuffled)/1024                       +0.0281         +0.0289       2234288       2297102       2232489       2296954
std::is_permutation(list<int>) (4leg) (shuffled)/8                                +0.0703         +0.0723            61            65            61            65
rng::is_permutation(list<int>) (4leg) (shuffled)/8                                +0.0819         +0.0818            60            65            60            65
std::is_permutation(list<int>) (4leg, pred) (shuffled)/8                          +0.2659         +0.2659            76            96            76            96
rng::is_permutation(list<int>) (4leg, pred) (shuffled)/8                          +0.2426         +0.2472            77            96            77            96
std::is_permutation(deque<int>) (4leg) (common prefix)/8                          +0.2592         +0.2600            12            15            12            15
std::is_permutation(deque<int>) (4leg) (common prefix)/1024                       +0.5906         +0.5919           818          1301           817          1301
std::is_permutation(deque<int>) (4leg) (common prefix)/8192                       +0.5919         +0.5925          6456         10277          6453         10276
rng::is_permutation(deque<int>) (4leg) (common prefix)/8                          +0.4652         +0.4659            10            15            10            15
rng::is_permutation(deque<int>) (4leg) (common prefix)/1024                       +0.5856         +0.5861           812          1288           812          1288
rng::is_permutation(deque<int>) (4leg) (common prefix)/8192                       +0.5919         +0.5921          6443         10256          6441         10255
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8                    +0.3581         +0.3591            11            16            11            16
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024                 +0.4977         +0.4986           862          1291           861          1291
std::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192                 +0.5084         +0.5102          6826         10296          6817         10295
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8                    +0.3754         +0.3763            11            16            11            16
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/1024                 +0.5137         +0.5142           852          1290           852          1290
rng::is_permutation(deque<int>) (4leg, pred) (common prefix)/8192                 +0.5136         +0.5144          6771         10248          6767         10248
std::is_permutation(deque<int>) (3leg) (shuffled)/8                               +0.0634         +0.0639            73            78            73            78
std::is_permutation(deque<int>) (3leg, pred) (shuffled)/8                         +0.0260         +0.0275            81            83            81            83
rng::is_permutation(deque<int>) (4leg, pred) (shuffled)/8                         +0.3511         +0.3530            81           109            81           109
std::is_permutation(vector<int>) (3leg, pred) (common prefix)/8                   +0.0183         +0.0172             4             4             4             4
std::is_permutation(vector<int>) (3leg) (shuffled)/8                              +0.1508         +0.1512            49            56            49            56
std::is_permutation(vector<int>) (3leg, pred) (shuffled)/8                        +0.0585         +0.0604            62            65            62            65
std::is_permutation(vector<int>) (4leg) (shuffled)/8                              +0.0977         +0.0989            49            54            49            54
rng::is_permutation(vector<int>) (4leg) (shuffled)/8                              +0.1512         +0.1525            49            56            49            56
std::is_permutation(vector<int>) (4leg, pred) (shuffled)/8                        +0.0705         +0.0721            61            66            61            66
```

- First, we can observe that `vector<int>` is only doing worse on very small sequences. That's actually a particularity of this benchmark: it operates on pretty small sequences since `is_permutation` is so expensive. I think we can mostly disregard the slowdown for `vector<int>` since it only affects 8-element sequences. I suspect that making our vectorized `mismatch` faster on small sequences would solve the problem here.
- Second, we can see that we're doing worse on several benchmarks that check the `common prefix` pattern. But with that data pattern, the algorithm should be dominated by `mismatch`. So I think we need to understand why our current `std::mismatch` behaves worse on `std::deque` than the hand-written loop that existed in `std::is_permutation` before your patch. I think you could also validate that switching from the hand-written loop to `std::mismatch` is the cause of the slowdown by locally reverting just that part of the change and checking whether the `std::deque` regressions on `common prefix` go away. BTW, you can locally edit the benchmark to only run a subset of all the combinations in order to iterate more quickly.
- Last, we are also doing worse on `list` with the `common prefix` pattern. I suspect we might be hitting the same issue as `deque`.

So TLDR, I'd focus on confirming that `std::mismatch` is slower on `deque` and `list` than a naive hand-written loop, and go from there.
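
To make that comparison concrete, here is a minimal sketch of the two code paths side by side. Everything in it is an assumption for illustration: `naive_mismatch`, `naive_offset`, `std_offset` and `make_common_prefix` are hypothetical names, the loop is only the general shape of a hand-written prefix scan (not the exact code that was in `std::is_permutation`), and the data pattern is a guess at what a "common prefix" input could look like rather than what the benchmark actually generates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <list>
#include <numeric>
#include <utility>

// Hypothetical stand-in for a hand-written prefix scan: advance both
// iterators while the elements compare equal, stop at the first difference.
template <class Iter1, class Iter2>
std::pair<Iter1, Iter2> naive_mismatch(Iter1 first1, Iter1 last1, Iter2 first2) {
  for (; first1 != last1; ++first1, (void)++first2) {
    if (!(*first1 == *first2))
      break;
  }
  return {first1, first2};
}

// Assumed data pattern: two sequences 0, 1, ..., n-1 that are identical
// except that the last two elements of the second one are swapped.
template <class Container>
std::pair<Container, Container> make_common_prefix(std::size_t n) {
  Container a(n);
  std::iota(a.begin(), a.end(), 0);
  Container b = a;
  if (n >= 2) {
    auto last = std::prev(b.end());
    std::iter_swap(last, std::prev(last));
  }
  return {a, b};
}

// Index of the first differing element according to the naive loop.
template <class Container>
std::ptrdiff_t naive_offset(const Container& a, const Container& b) {
  assert(a.size() == b.size()); // the 3-leg form needs b at least as long as a
  return std::distance(a.begin(), naive_mismatch(a.begin(), a.end(), b.begin()).first);
}

// Index of the first differing element according to std::mismatch.
template <class Container>
std::ptrdiff_t std_offset(const Container& a, const Container& b) {
  assert(a.size() == b.size());
  return std::distance(a.begin(), std::mismatch(a.begin(), a.end(), b.begin()).first);
}
```

Both helpers should report the same offset for `std::deque<int>` and `std::list<int>`, so any timing gap between them would come from how the scan is implemented and not from the work being done. For actual timings, it is probably better to call the two functions from the existing benchmark harness than to time them by hand.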

https://github.com/llvm/llvm-project/pull/129565

