[libcxx-commits] [libcxx] [libc++] Optimizations for uniform_int_distribution (PR #140161)

Mon Jun 23 18:53:48 PDT 2025

LRFLEW wrote:

After seeing the benchmark results from the Apple Silicon Mac, I wanted to run the benchmark on a different ARM64 computer. My first choice would have been a Raspberry Pi (or similar), but I don't have one of those on hand. However, I did end up getting Linux running on my Nintendo Switch, which is also ARM64. These are the results I got from it:

```
Comparing build/baseline/libcxx/test/benchmarks/numeric/Output/rand.uni.int.bench.cpp.dir/benchmark-result.json to build/uidibe/libcxx/test/benchmarks/numeric/Output/rand.uni.int.bench.cpp.dir/benchmark-result.json
Benchmark                                                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 20>                                  -0.5315         -0.5319            31            15            31            14
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 20>                                 -0.3746         -0.3734            51            32            50            32
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 19) + 1ull>                         -0.2931         -0.2918            57            40            57            40
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 19) + 1ull>                        -0.1711         -0.1737            87            72            86            71
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 19) + (1ull << 18)>                 -0.3335         -0.3378            39            26            39            26
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 19) + (1ull << 18)>                -0.3019         -0.3004            66            46            66            46
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 40>                                  -0.3005         -0.2991            39            28            39            27
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 40>                                 -0.3220         -0.3218            68            46            68            46
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 39) + 1ull>                         -0.1080         -0.1080            76            68            75            67
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 39) + 1ull>                        -0.1693         -0.1722           123           102           122           101
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 39) + (1ull << 38)>                 -0.0960         -0.0979            50            46            50            45
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 39) + (1ull << 38)>                -0.2894         -0.2875            91            65            90            64
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 41>                                  -0.3109         -0.3096            40            28            40            28
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 41>                                 -0.3460         -0.3458            74            49            74            48
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 40) + 1ull>                         -0.0915         -0.0921            76            69            76            69
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 40) + 1ull>                        -0.2287         -0.2298           132           102           132           102
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 40) + (1ull << 39)>                 -0.1591         -0.1589            51            43            51            43
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 40) + (1ull << 39)>                -0.3261         -0.3242            98            66            97            65
OVERALL_GEOMEAN                                                                             -0.2729         -0.2730             0             0             0             0
```

The difference isn't as *dramatic* as what I saw on my 8700K, but it is clearly more significant than the results on Apple Silicon. With this, I'm confident in my assessment that the variation in results between machines comes from how fast each processor performs integer division.
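To illustrate why division speed can show up in this kind of benchmark, here is a minimal sketch of two generic ways to draw a bounded integer. It is not the code from this PR or from libc++; the helper names `bounded_with_division` and `bounded_with_mask` are hypothetical, and both assume an engine that yields full 64-bit values (such as `std::mt19937_64`).

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the PR): draw from [0, range) using
// modulo with rejection. Costs two integer divisions per call, plus
// occasional redraws.
template <class Engine>
std::uint64_t bounded_with_division(Engine& g, std::uint64_t range) {
    static_assert(Engine::min() == 0 && Engine::max() == UINT64_MAX,
                  "sketch assumes a full-range 64-bit engine");
    assert(range != 0);
    // 2^64 mod range, computed in 64-bit arithmetic.
    const std::uint64_t threshold = (0 - range) % range;
    for (;;) {
        const std::uint64_t x = g();
        // Rejecting x < threshold leaves a multiple of `range` values.
        if (x >= threshold) return x % range;
    }
}

// Hypothetical helper (not from the PR): draw from [0, range) by masking
// to the next power of two and rejecting. No division at all, but more
// redraws when `range` sits just above a power of two.
template <class Engine>
std::uint64_t bounded_with_mask(Engine& g, std::uint64_t range) {
    static_assert(Engine::min() == 0 && Engine::max() == UINT64_MAX,
                  "sketch assumes a full-range 64-bit engine");
    assert(range != 0);
    // Smear the highest set bit of (range - 1) into all lower bits.
    std::uint64_t mask = range - 1;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    for (;;) {
        const std::uint64_t x = g() & mask;
        if (x < range) return x;
    }
}
```

Calling either helper with a seeded `std::mt19937_64` and a range such as `(1ull << 19) + 1` returns a value below that range. On a core with slow integer division the first variant pays that cost on every call, which is the kind of effect that would make relative speedups differ between processors.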

https://github.com/llvm/llvm-project/pull/140161