[libcxx-commits] [libcxx] [libc++] Optimizations for uniform_int_distribution (PR #140161)
via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Jun 23 18:53:48 PDT 2025
LRFLEW wrote:
After seeing the benchmark results from the Apple Silicon Mac, I wanted to run the benchmark on a different ARM64 computer. My first choice would have been a Raspberry Pi (or similar), but I don't have one on hand. However, I did get Linux running on my Nintendo Switch, which is also ARM64. These are the results I got from it:
```
Comparing build/baseline/libcxx/test/benchmarks/numeric/Output/rand.uni.int.bench.cpp.dir/benchmark-result.json to build/uidibe/libcxx/test/benchmarks/numeric/Output/rand.uni.int.bench.cpp.dir/benchmark-result.json
Benchmark                                                                           Time       CPU  Time Old  Time New   CPU Old   CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 20>                       -0.5315   -0.5319        31        15        31        14
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 20>                      -0.3746   -0.3734        51        32        50        32
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 19) + 1ull>              -0.2931   -0.2918        57        40        57        40
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 19) + 1ull>             -0.1711   -0.1737        87        72        86        71
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 19) + (1ull << 18)>      -0.3335   -0.3378        39        26        39        26
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 19) + (1ull << 18)>     -0.3019   -0.3004        66        46        66        46
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 40>                       -0.3005   -0.2991        39        28        39        27
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 40>                      -0.3220   -0.3218        68        46        68        46
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 39) + 1ull>              -0.1080   -0.1080        76        68        75        67
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 39) + 1ull>             -0.1693   -0.1722       123       102       122       101
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 39) + (1ull << 38)>      -0.0960   -0.0979        50        46        50        45
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 39) + (1ull << 38)>     -0.2894   -0.2875        91        65        90        64
bm_uniform_int_distribution<std::minstd_rand0, 1ull << 41>                       -0.3109   -0.3096        40        28        40        28
bm_uniform_int_distribution<std::ranlux24_base, 1ull << 41>                      -0.3460   -0.3458        74        49        74        48
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 40) + 1ull>              -0.0915   -0.0921        76        69        76        69
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 40) + 1ull>             -0.2287   -0.2298       132       102       132       102
bm_uniform_int_distribution<std::minstd_rand0, (1ull << 40) + (1ull << 39)>      -0.1591   -0.1589        51        43        51        43
bm_uniform_int_distribution<std::ranlux24_base, (1ull << 40) + (1ull << 39)>     -0.3261   -0.3242        98        66        97        65
OVERALL_GEOMEAN                                                                  -0.2729   -0.2730         0         0         0         0
```
It's not as *dramatic* a difference as I saw on my 8700K, but it is clearly a bigger improvement than on Apple Silicon (an overall geomean of roughly 27% less time here). This makes me confident in my assessment that the variation between machines comes down to how expensive integer division is on each processor.
https://github.com/llvm/llvm-project/pull/140161