[libcxx-commits] [PATCH] D125329: Replace modulus operations in std::seed_seq::generate with conditional checks.
Laramie Leavitt via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Tue May 10 16:01:43 PDT 2022
laramiel added a comment.
OK, I managed to run the before-and-after benchmarks on both my x86 workstation and my Mac M1 laptop. On the M1 the new code is about the same or slightly faster; on x86 it is dramatically faster, roughly 2x to 6x in the runs below.
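For readers without the diff at hand, the kind of change named in the patch title can be sketched as below. This is an illustration only, not the code from D125329: the helper names `advance_mod` and `advance_cond` are hypothetical, and the two agree only under the stated precondition (`i < n`, `step <= n`, no overflow). Integer division tends to be comparatively slow on x86 cores of this generation, which may explain why the gain is so much larger there than on the M1.

```cpp
#include <cstddef>

// Hypothetical helper: advance a wrapping index using the modulus operator.
inline std::size_t advance_mod(std::size_t i, std::size_t step, std::size_t n) {
    return (i + step) % n;
}

// Hypothetical helper: same result without a division.
// Precondition (assumed): i < n and step <= n, so i + step < 2 * n
// and a single conditional subtraction is enough to wrap.
inline std::size_t advance_cond(std::size_t i, std::size_t step, std::size_t n) {
    i += step;
    if (i >= n) {
        i -= n;
    }
    return i;
}
```

The conditional form trades a division for a compare and a subtract, which is only valid because the index can overshoot `n` by less than `n`.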
<Mac LAPTOP>
CPU Caches:
L1 Data 64 KiB (x10)
L1 Instruction 128 KiB (x10)
L2 Unified 4096 KiB (x5)
Load Average: 3.74, 4.25, 3.59
OLD:
----
Benchmark                          Time          CPU  Iterations
---------------------------------------------------------------------
BM_SeedSeq_Generate/1/1         16.9 ns      16.9 ns    39431288
BM_SeedSeq_Generate/8/1         50.1 ns      50.0 ns    13943668
BM_SeedSeq_Generate/16/1        88.4 ns      87.9 ns     7974663
BM_SeedSeq_Generate/1/8         56.1 ns      56.0 ns    12601941
BM_SeedSeq_Generate/8/8         60.7 ns      60.7 ns    11631385
BM_SeedSeq_Generate/16/8        97.0 ns      93.6 ns     7510568
BM_SeedSeq_Generate/1/64         564 ns       561 ns     1259672
BM_SeedSeq_Generate/8/64         549 ns       549 ns     1241113
BM_SeedSeq_Generate/16/64        541 ns       540 ns     1294139
BM_SeedSeq_Generate/1/256       2299 ns      2296 ns      304877
BM_SeedSeq_Generate/8/256       2298 ns      2297 ns      305722
BM_SeedSeq_Generate/16/256      2300 ns      2299 ns      305262
NEW:
----
Benchmark                          Time          CPU  Iterations
---------------------------------------------------------------------
BM_SeedSeq_Generate/1/1         17.0 ns      17.0 ns    38130515
BM_SeedSeq_Generate/8/1         49.5 ns      49.5 ns    14140414
BM_SeedSeq_Generate/16/1        86.6 ns      86.1 ns     8184933
BM_SeedSeq_Generate/1/8         47.3 ns      47.2 ns    14856161
BM_SeedSeq_Generate/8/8         50.7 ns      50.7 ns    13919268
BM_SeedSeq_Generate/16/8        79.8 ns      79.6 ns     8695436
BM_SeedSeq_Generate/1/64         520 ns       520 ns     1361550
BM_SeedSeq_Generate/8/64         519 ns       519 ns     1355355
BM_SeedSeq_Generate/16/64        525 ns       525 ns     1319261
BM_SeedSeq_Generate/1/256       2198 ns      2194 ns      317959
BM_SeedSeq_Generate/8/256       2196 ns      2194 ns      320623
BM_SeedSeq_Generate/16/256      2215 ns      2201 ns      317943
<x86>
Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
Run on (12 X 4000 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 256 KiB (x6)
L3 Unified 15360 KiB (x1)
Load Average: 0.34, 0.38, 0.36
OLD:
----
Benchmark                          Time          CPU  Iterations
---------------------------------------------------------------------
BM_SeedSeq_Generate/1/1         51.9 ns      51.9 ns    12867935
BM_SeedSeq_Generate/8/1          179 ns       179 ns     3915093
BM_SeedSeq_Generate/16/1         325 ns       325 ns     2158696
BM_SeedSeq_Generate/1/8          361 ns       361 ns     1960768
BM_SeedSeq_Generate/8/8          342 ns       342 ns     1982211
BM_SeedSeq_Generate/16/8         485 ns       484 ns     1449114
BM_SeedSeq_Generate/1/64        3010 ns      3008 ns      234187
BM_SeedSeq_Generate/8/64        2966 ns      2964 ns      237480
BM_SeedSeq_Generate/16/64       2910 ns      2909 ns      240558
BM_SeedSeq_Generate/1/256      12132 ns     12127 ns       58263
BM_SeedSeq_Generate/8/256      12051 ns     12046 ns       58368
BM_SeedSeq_Generate/16/256     12169 ns     12163 ns       58210
NEW:
----
Benchmark                          Time          CPU  Iterations
---------------------------------------------------------------------
BM_SeedSeq_Generate/1/1         26.1 ns      26.1 ns    25494346
BM_SeedSeq_Generate/8/1         45.5 ns      45.5 ns    15384561
BM_SeedSeq_Generate/16/1        78.0 ns      77.9 ns     9019345
BM_SeedSeq_Generate/1/8         65.0 ns      65.0 ns    10775171
BM_SeedSeq_Generate/8/8         69.4 ns      69.4 ns    10152010
BM_SeedSeq_Generate/16/8         100 ns     100.0 ns     7041379
BM_SeedSeq_Generate/1/64         488 ns       488 ns     1426010
BM_SeedSeq_Generate/8/64         488 ns       488 ns     1434429
BM_SeedSeq_Generate/16/64        489 ns       488 ns     1322369
BM_SeedSeq_Generate/1/256       1938 ns      1938 ns      362517
BM_SeedSeq_Generate/8/256       1934 ns      1934 ns      361980
BM_SeedSeq_Generate/16/256      1936 ns      1936 ns      361482
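The benchmark source is not included in this message, so the following is only a rough sketch of how a call to `std::seed_seq::generate` could be timed with the standard library alone. It assumes that the two arguments in `BM_SeedSeq_Generate/A/B` are the number of seed values and the number of generated words; the output above does not confirm that reading, although the times scale mainly with the second argument. The helper names `generate_words` and `time_generate_ns` are hypothetical, and a `std::chrono` loop is used in place of the Google Benchmark harness that produced the tables.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: build a seed_seq from num_seeds values (1, 2, 3, ...)
// and return out_size generated 32-bit words.
inline std::vector<std::uint32_t> generate_words(std::size_t num_seeds,
                                                 std::size_t out_size) {
    std::vector<std::uint32_t> seeds(num_seeds);
    std::iota(seeds.begin(), seeds.end(), 1u);
    std::seed_seq seq(seeds.begin(), seeds.end());
    std::vector<std::uint32_t> out(out_size);
    seq.generate(out.begin(), out.end());
    return out;
}

// Hypothetical helper: average wall-clock nanoseconds per generate() call.
// Assumes out_size >= 1 and iters >= 1.
inline double time_generate_ns(std::size_t num_seeds, std::size_t out_size,
                               int iters) {
    std::vector<std::uint32_t> seeds(num_seeds);
    std::iota(seeds.begin(), seeds.end(), 1u);
    std::seed_seq seq(seeds.begin(), seeds.end());
    std::vector<std::uint32_t> out(out_size);
    volatile std::uint32_t sink = 0;  // keeps the result observable

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        seq.generate(out.begin(), out.end());
        sink = sink + out[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}
```

A hand-rolled loop like this is much noisier than a proper benchmark harness, so it is only suitable for a quick sanity check of the trend, not for reproducing the numbers above.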
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125329/new/
https://reviews.llvm.org/D125329