[libcxx-commits] [libcxx] [libc++] Optimizations for uniform_int_distribution (PR #140161)

Sun Jun 15 18:07:07 PDT 2025

LRFLEW wrote:

OK, I've gone through and addressed (most of) the review comments and rebased against the main branch. I removed the output test case, since that was only intended to be temporary anyway, but kept the benchmark in its own commit for now. The testing results you got have me wanting to run the benchmark on some other devices to get a more complete picture of what this change actually does.

> So overall, I'm seeing roughly 10% improvement, which is great. Do you have an hypothesis to explain why you were seeing up to 2.83x on some benchmarks and I am seeing more modest numbers?

The large improvements I saw were for the case where `n = 1`, with the main optimization there being the removal of two division calculations (to calculate the values for `n` and `w0`). I suspected maybe Apple put some sort of optimization in their silicon for trivial divisions (e.g. `x / 1 = x` and `x / x = 1`), but looking into it, apparently [Apple Silicon can just perform integer division way faster than Intel](https://ridiculousfish.com/blog/posts/benchmarking-libdivide-m1-avx512.html). That was a surprise to me, and it means Apple computers aren't going to see as much of an impact from this change. I kind of want to see the benchmark results from another ARM64 CPU (to see if it's just Apple's CPUs), such as a Raspberry Pi, but I don't have one of those on hand to test it with.
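To make the `n = 1` case concrete, here's a rough sketch of the idea, not the actual code from the PR. It assumes `n` is computed as `ceil(w / m)` and `w0` as `w / n`, where `w` is the number of bits requested and `m` is the number of bits one engine call provides. The names `BitSplit`, `split_general`, and `split_fast` are made up for illustration, and it assumes `w >= 1` and `m >= 1`.

```cpp
#include <cstddef>

// Hypothetical illustration only; these names do not exist in libc++.
struct BitSplit {
    std::size_t n;   // number of engine calls needed
    std::size_t w0;  // bits taken from each call (the smaller share)
};

// General path: two integer divisions (w / m with its remainder, then w / n).
// Assumes w >= 1 and m >= 1.
inline BitSplit split_general(std::size_t w, std::size_t m) {
    std::size_t n = w / m + (w % m != 0);  // ceil(w / m)
    std::size_t w0 = w / n;
    return {n, w0};
}

// Fast path: if a single engine call already covers w bits, then n == 1
// and w0 == w, so both divisions can be skipped.
inline BitSplit split_fast(std::size_t w, std::size_t m) {
    if (w <= m) {
        return {1, w};
    }
    return split_general(w, m);
}
```

Both functions return the same values for any valid input; the fast path just swaps the divisions for a compare when `w <= m`, which would matter more on CPUs where integer division is slow.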

https://github.com/llvm/llvm-project/pull/140161