[libc-commits] [libc] [llvm] [libc][CndVar] reimplement conditional variable with FIFO ordering (PR #192748)
Schrodinger ZHU Yifan via libc-commits
libc-commits at lists.llvm.org
Sun Apr 19 17:04:52 PDT 2026
SchrodingerZhu wrote:
# Benchmark
Generated by `scripts/generate_data.py`.
Command used:
```sh
cargo bench --bench compare
```
All numbers below are Criterion `mean.point_estimate` values from
`target/criterion/*/*/*/new/estimates.json`.
Unit: nanoseconds in the raw estimates. The tables below scale each value to `us`, `ms`, or `s`.
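A minimal sketch of how the estimates could be collected and scaled, not the actual contents of `scripts/generate_data.py`. The names `format_ns` and `load_means` are hypothetical, and the three-decimal format is inferred from the tables. The meaning of the three directory components in the glob is not stated here, so the sketch keeps them as an opaque key.

```python
import json
from pathlib import Path


def format_ns(ns: float) -> str:
    """Scale a nanosecond value to s, ms, or us (assumed three decimals)."""
    for factor, unit in ((1e9, "s"), (1e6, "ms"), (1e3, "us")):
        if ns >= factor:
            return f"{ns / factor:.3f} {unit}"
    return f"{ns:.3f} ns"


def load_means(criterion_dir: Path) -> dict:
    """Map each result's three directory components to its mean in ns."""
    means = {}
    for path in sorted(criterion_dir.glob("*/*/*/new/estimates.json")):
        # Key is the tuple of the three wildcard components of the path.
        key = path.relative_to(criterion_dir).parts[:3]
        with path.open() as f:
            means[key] = json.load(f)["mean"]["point_estimate"]
    return means
```

For example, `format_ns(150878.0)` gives `150.878 us`, and `load_means(Path("target/criterion"))` returns one entry per `estimates.json` file found.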
## Benchmark Set
- `turn_ring`: microbenchmark in which threads arranged in a ring pass a turn to one another in strict order. This stresses notify/wait latency and wakeup propagation under heavy contention.
- `bounded_queue`: producer/consumer queue with a fixed capacity. This is the main realistic condvar workload.
- `hash_map_queue`: the bounded queue plus a large `HashMap` touched inside the critical section on both push and pop.
- `btree_map_queue`: the bounded queue plus a large `BTreeMap` touched inside the critical section on both push and pop.
- `signal_stress`: producer/consumer stress test using `notify_one`, with half the threads producing and half consuming.
- `broadcast_stress`: the same stress test using `notify_all`.
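To illustrate the `turn_ring` pattern, here is a rough sketch using Python's `threading.Condition`. It is not the benchmark's Rust code. The function name `run_turn_ring`, the single shared condition, and the use of `notify_all` are all assumptions made for the illustration.

```python
import threading


def run_turn_ring(num_threads: int, rounds: int) -> list:
    """Pass a turn around a ring of threads; return the order they ran in."""
    cond = threading.Condition()
    state = {"turn": 0}
    order = []

    def worker(me: int) -> None:
        for _ in range(rounds):
            with cond:
                # Predicate loop: sleep until it is this thread's turn.
                while state["turn"] != me:
                    cond.wait()
                order.append(me)
                # Hand the turn to the next thread in the ring.
                state["turn"] = (me + 1) % num_threads
                # All waiters share one condition in this sketch, so wake
                # everyone and let the predicate select the next runner.
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Every handoff forces a wait/notify round trip, so total time is dominated by wakeup latency rather than by work done inside the critical section.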
Thread counts:
- `2`, `8`, `32`, `64`, `128`
Implementations compared:
- `musl`
- `glibc`
- `musl_wake`: musl with the requeue optimization disabled, so wakeup uses plain wake behavior instead of the musl requeue path.
- `std`: Rust `std::sync::Condvar`.
- `llvm_old`: LLVM libc's condition variable before this PR.
- `llvm_new`: the FIFO-ordered reimplementation from this PR.
## Full Results
Each cell is `time (delta vs best in row)`, where the delta is the slowdown relative to the fastest implementation at that thread count.
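The delta column can be reproduced with a small helper. This is a sketch under the assumption that the delta is `(value / best - 1) * 100` rounded to one decimal, which matches the tables. The name `delta_cells` is hypothetical.

```python
def delta_cells(row: dict) -> dict:
    """Label each implementation with its slowdown relative to the row minimum."""
    best = min(row.values())
    cells = {}
    for name, value in row.items():
        if value == best:
            cells[name] = "best"
        else:
            cells[name] = f"+{(value / best - 1) * 100:.1f}%"
    return cells


# Three entries from the `turn_ring` row for 2 threads, converted to ns.
row = {"musl": 199705.0, "glibc": 205748.0, "std": 150878.0}
print(delta_cells(row))
# {'musl': '+32.4%', 'glibc': '+36.4%', 'std': 'best'}
```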
### `turn_ring`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 199.705 us (+32.4%) | 205.748 us (+36.4%) | 238.180 us (+57.9%) | 150.878 us (best) | 176.599 us (+17.0%) | 232.869 us (+54.3%) |
| 8 | 5.073 ms (+91.8%) | 2.645 ms (best) | 6.375 ms (+141.0%) | 3.162 ms (+19.5%) | 5.835 ms (+120.6%) | 5.064 ms (+91.4%) |
| 32 | 51.389 ms (+19.0%) | 43.183 ms (best) | 57.375 ms (+32.9%) | 69.980 ms (+62.1%) | 92.559 ms (+114.3%) | 48.288 ms (+11.8%) |
| 64 | 192.041 ms (+10.0%) | 174.572 ms (best) | 232.593 ms (+33.2%) | 269.310 ms (+54.3%) | 367.053 ms (+110.3%) | 201.725 ms (+15.6%) |
| 128 | 1.054 s (+32.3%) | 796.751 ms (best) | 920.699 ms (+15.6%) | 1.080 s (+35.6%) | 1.280 s (+60.6%) | 917.478 ms (+15.2%) |
### `bounded_queue`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 110.781 us (+113.8%) | 51.819 us (best) | 113.254 us (+118.6%) | 143.824 us (+177.6%) | 69.688 us (+34.5%) | 72.064 us (+39.1%) |
| 8 | 1.073 ms (+154.9%) | 421.019 us (best) | 835.230 us (+98.4%) | 953.647 us (+126.5%) | 769.807 us (+82.8%) | 626.777 us (+48.9%) |
| 32 | 7.567 ms (+77.4%) | 4.264 ms (best) | 7.772 ms (+82.3%) | 9.307 ms (+118.2%) | 6.392 ms (+49.9%) | 6.313 ms (+48.0%) |
| 64 | 28.701 ms (+172.9%) | 10.516 ms (best) | 27.066 ms (+157.4%) | 22.024 ms (+109.4%) | 13.830 ms (+31.5%) | 14.357 ms (+36.5%) |
| 128 | 66.660 ms (+157.0%) | 25.935 ms (best) | 65.588 ms (+152.9%) | 42.743 ms (+64.8%) | 29.939 ms (+15.4%) | 33.652 ms (+29.8%) |
### `hash_map_queue`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 173.431 us (+81.7%) | 105.127 us (+10.1%) | 131.521 us (+37.8%) | 95.449 us (best) | 133.142 us (+39.5%) | 132.622 us (+38.9%) |
| 8 | 546.390 us (+12.0%) | 510.670 us (+4.7%) | 708.549 us (+45.3%) | 487.697 us (best) | 546.606 us (+12.1%) | 653.349 us (+34.0%) |
| 32 | 4.492 ms (+118.4%) | 3.051 ms (+48.4%) | 5.017 ms (+143.9%) | 2.057 ms (best) | 4.521 ms (+119.8%) | 4.095 ms (+99.1%) |
| 64 | 11.786 ms (+178.6%) | 8.104 ms (+91.6%) | 11.532 ms (+172.6%) | 4.231 ms (best) | 8.526 ms (+101.5%) | 8.177 ms (+93.3%) |
| 128 | 23.188 ms (+171.1%) | 16.887 ms (+97.4%) | 23.416 ms (+173.7%) | 8.554 ms (best) | 17.553 ms (+105.2%) | 16.361 ms (+91.3%) |
### `btree_map_queue`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 141.858 us (+33.3%) | 128.092 us (+20.4%) | 139.151 us (+30.8%) | 106.387 us (best) | 127.936 us (+20.3%) | 118.026 us (+10.9%) |
| 8 | 478.921 us (+0.8%) | 485.530 us (+2.2%) | 549.908 us (+15.7%) | 475.244 us (best) | 543.709 us (+14.4%) | 550.563 us (+15.8%) |
| 32 | 1.898 ms (best) | 2.035 ms (+7.2%) | 2.165 ms (+14.1%) | 1.981 ms (+4.4%) | 2.094 ms (+10.3%) | 2.001 ms (+5.4%) |
| 64 | 4.334 ms (+8.5%) | 4.416 ms (+10.6%) | 3.993 ms (best) | 4.124 ms (+3.3%) | 4.222 ms (+5.7%) | 4.201 ms (+5.2%) |
| 128 | 8.237 ms (best) | 8.857 ms (+7.5%) | 8.304 ms (+0.8%) | 10.099 ms (+22.6%) | 8.709 ms (+5.7%) | 8.628 ms (+4.8%) |
### `signal_stress`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 298.372 us (+7.1%) | 315.793 us (+13.4%) | 323.865 us (+16.3%) | 278.530 us (best) | 383.176 us (+37.6%) | 297.881 us (+6.9%) |
| 8 | 1.167 ms (+26.7%) | 1.028 ms (+11.6%) | 1.187 ms (+28.9%) | 920.465 us (best) | 1.451 ms (+57.7%) | 1.088 ms (+18.2%) |
| 32 | 7.358 ms (+50.7%) | 6.277 ms (+28.6%) | 8.080 ms (+65.6%) | 6.009 ms (+23.1%) | 7.462 ms (+52.9%) | 4.881 ms (best) |
| 64 | 19.338 ms (+81.2%) | 15.305 ms (+43.4%) | 19.127 ms (+79.2%) | 12.451 ms (+16.7%) | 13.416 ms (+25.7%) | 10.673 ms (best) |
| 128 | 41.099 ms (+80.1%) | 33.200 ms (+45.5%) | 40.980 ms (+79.6%) | 25.315 ms (+11.0%) | 28.089 ms (+23.1%) | 22.814 ms (best) |
### `broadcast_stress`
| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 295.617 us (+12.1%) | 263.780 us (best) | 322.797 us (+22.4%) | 337.968 us (+28.1%) | 398.988 us (+51.3%) | 279.450 us (+5.9%) |
| 8 | 921.990 us (+23.6%) | 774.165 us (+3.8%) | 1.186 ms (+59.0%) | 745.782 us (best) | 1.386 ms (+85.9%) | 908.373 us (+21.8%) |
| 32 | 7.268 ms (+82.5%) | 4.054 ms (+1.8%) | 7.877 ms (+97.8%) | 5.119 ms (+28.5%) | 6.585 ms (+65.3%) | 3.983 ms (best) |
| 64 | 17.360 ms (+86.5%) | 9.310 ms (best) | 18.877 ms (+102.8%) | 11.176 ms (+20.1%) | 13.851 ms (+48.8%) | 11.763 ms (+26.3%) |
| 128 | 39.085 ms (+94.6%) | 20.082 ms (best) | 40.764 ms (+103.0%) | 22.732 ms (+13.2%) | 28.456 ms (+41.7%) | 31.170 ms (+55.2%) |
## `lscpu`
```text
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 9950X 16-Core Processor
CPU family: 26
Model: 68
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
Microcode version: 0xb404035
Frequency boost: enabled
CPU(s) scaling MHz: 53%
CPU max MHz: 5756.4521
CPU min MHz: 624.1940
BogoMIPS: 8584.14
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
Virtualization: AMD-V
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 16 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Old microcode: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; IBPB on VMEXIT only
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Mitigation; IBPB on VMEXIT
```
https://github.com/llvm/llvm-project/pull/192748