[libc-commits] [libc] [llvm] [libc][CndVar] reimplement condition variable with FIFO ordering (PR #192748)

Schrodinger ZHU Yifan via libc-commits libc-commits at lists.llvm.org
Mon Apr 20 07:41:22 PDT 2026


SchrodingerZhu wrote:

# Benchmark

Generated by `scripts/generate_data.py`.

Command used:

```sh
cargo bench --bench compare
```

All numbers below are Criterion `mean.point_estimate` values taken from
`target/criterion/*/*/*/new/estimates.json`.
Criterion stores these estimates in nanoseconds; the tables below print them scaled to `us`, `ms`, or `s`.
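The tables print the nanosecond estimates with three decimals in `us`, `ms`, or `s`. The source of `scripts/generate_data.py` is not shown, so the following is only a minimal Rust sketch of that conversion. It assumes the script picks the largest unit that keeps the value at or above 1, which matches the tables; `format_ns` is a hypothetical name.

```rust
/// Hypothetical helper: render a Criterion estimate given in nanoseconds
/// the way the tables do, e.g. 598086.0 ns -> "598.086 us".
fn format_ns(ns: f64) -> String {
    // Largest unit first; the first one that keeps the value >= 1 wins.
    const UNITS: [(f64, &str); 3] = [(1e9, "s"), (1e6, "ms"), (1e3, "us")];
    for &(scale, unit) in UNITS.iter() {
        if ns >= scale {
            return format!("{:.3} {}", ns / scale, unit);
        }
    }
    // Anything below one microsecond stays in nanoseconds.
    format!("{:.3} ns", ns)
}
```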

## Benchmark Set

- `turn_ring`: microbenchmark-style strict turn handoff around a ring of threads. This stresses notify/wait latency and wakeup propagation under heavy contention.
- `bounded_queue`: producer/consumer queue with a fixed capacity. This is the main realistic condvar workload.
- `hash_map_queue`: the bounded queue plus a large `HashMap` touched inside the critical section on both push and pop.
- `btree_map_queue`: the bounded queue plus a large `BTreeMap` touched inside the critical section on both push and pop.
- `signal_stress`: producer/consumer stress test using `notify_one`, with half the threads producing and half consuming.
- `broadcast_stress`: the same stress test using `notify_all`.
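To make the workload descriptions concrete, here is a hedged Rust sketch of two of them using `std::sync::{Mutex, Condvar}`. It is not the PR's benchmark harness, which is not shown. The function names, their parameters, the single shared condvar in `turn_ring`, and the separate `not_full`/`not_empty` condvars in `bounded_queue` are all assumptions. The `broadcast` flag switches between `notify_one` and `notify_all`, which is the difference between `signal_stress` and `broadcast_stress`.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Sketch of `turn_ring`: `n` threads pass a turn counter around a ring;
/// thread `i` may act only when `turn % n == i`. Returns the final counter.
fn turn_ring(n: usize, rounds: usize) -> usize {
    let shared = Arc::new((Mutex::new(0usize), Condvar::new()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let (lock, cv) = &*shared;
                for _ in 0..rounds {
                    // Sleep until it is this thread's turn.
                    let mut turn = cv
                        .wait_while(lock.lock().unwrap(), |t| *t % n != i)
                        .unwrap();
                    *turn += 1; // hand the turn to thread (i + 1) % n
                    cv.notify_all(); // notify_one could wake the wrong thread
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *shared.0.lock().unwrap();
    total
}

struct Queue {
    items: VecDeque<u64>,
    left: usize, // items not yet consumed (queued or still to be produced)
}

fn wake(cv: &Condvar, broadcast: bool) {
    if broadcast { cv.notify_all() } else { cv.notify_one() }
}

/// Sketch of `bounded_queue`: producers push `0..per_producer` into a queue
/// holding at most `cap` items; consumers pop until every item is consumed.
/// Returns the sum of all consumed values.
fn bounded_queue(producers: usize, consumers: usize, per_producer: usize,
                 cap: usize, broadcast: bool) -> u64 {
    let shared = Arc::new((
        Mutex::new(Queue { items: VecDeque::new(), left: producers * per_producer }),
        Condvar::new(), // not_full: producers wait here
        Condvar::new(), // not_empty: consumers wait here
    ));
    let mut handles: Vec<thread::JoinHandle<u64>> = Vec::new();
    for _ in 0..producers {
        let shared = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            let (lock, not_full, not_empty) = &*shared;
            for v in 0..per_producer as u64 {
                let mut q = not_full
                    .wait_while(lock.lock().unwrap(), |q| q.items.len() >= cap)
                    .unwrap();
                q.items.push_back(v);
                wake(not_empty, broadcast);
            }
            0
        }));
    }
    for _ in 0..consumers {
        let shared = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            let (lock, not_full, not_empty) = &*shared;
            let mut sum = 0u64;
            loop {
                let mut q = not_empty
                    .wait_while(lock.lock().unwrap(), |q| q.items.is_empty() && q.left > 0)
                    .unwrap();
                if q.left == 0 {
                    break; // everything has been consumed
                }
                sum += q.items.pop_front().unwrap();
                q.left -= 1;
                wake(not_full, broadcast);
                if q.left == 0 {
                    not_empty.notify_all(); // let idle consumers observe completion
                }
            }
            sum
        }));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

In this sketch `turn_ring` has to use `notify_all`: with a single shared condvar, `notify_one` could wake a thread whose turn it is not, and the ring would stall.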

Thread counts:

- `2`, `8`, `32`, `64`, `128`

Implementations compared:

- `musl`
- `glibc`
- `musl_wake`: musl with the requeue optimization disabled, so wakeups use a plain futex wake instead of musl's requeue path.
- `std`: Rust `std::sync::Condvar`.
- `llvm_old`: LLVM libc's condition variable before this PR.
- `llvm_new`: the FIFO-ordered reimplementation from this PR.

## Full Results

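Each cell is `time (delta vs best in row)`, where the delta is how much slower that implementation is than the fastest one at the same thread count; the fastest is marked `(best)`. Lower is better.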
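As a worked example of the delta column, take the 2-thread `turn_ring` row: the fastest time is `std` at 518.564 us, so `musl` is 598.086 / 518.564 - 1 ≈ 15.3% slower. A small Rust sketch of that computation follows; the helper names are hypothetical, and since the units cancel, any consistent unit works.

```rust
/// Percentage by which each time exceeds the fastest time in the row:
/// delta = (time / best - 1) * 100. Assumes all times are positive.
fn delta_vs_best(times: &[f64]) -> Vec<f64> {
    let best = times.iter().cloned().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| (t / best - 1.0) * 100.0).collect()
}

/// Render a delta the way the table cells do: "best" or "+x.y%".
fn delta_label(delta: f64) -> String {
    if delta == 0.0 {
        "best".to_string()
    } else {
        format!("+{:.1}%", delta)
    }
}
```

Applied to that row (`musl`, `glibc`, `musl_wake`, `std`, `llvm_old`, `llvm_new`), this reproduces the labels `+15.3%`, `+5.8%`, `+25.4%`, `best`, `+24.4%`, `+17.1%`.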

### `turn_ring`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 598.086 us (+15.3%) | 548.606 us (+5.8%) | 650.277 us (+25.4%) | 518.564 us (best) | 644.916 us (+24.4%) | 607.320 us (+17.1%) |
| 8 | 17.569 ms (+440.2%) | 3.473 ms (+6.8%) | 16.608 ms (+410.7%) | 3.252 ms (best) | 12.739 ms (+291.7%) | 14.645 ms (+350.3%) |
| 32 | 211.778 ms (+730.0%) | 25.516 ms (best) | 152.583 ms (+498.0%) | 29.603 ms (+16.0%) | 236.796 ms (+828.0%) | 204.977 ms (+703.3%) |
| 64 | 899.049 ms (+855.4%) | 94.107 ms (best) | 649.790 ms (+590.5%) | 138.403 ms (+47.1%) | 1.006 s (+969.4%) | 878.408 ms (+833.4%) |
| 128 | 3.957 s (+377.2%) | 829.330 ms (best) | 2.675 s (+222.6%) | 874.943 ms (+5.5%) | 4.316 s (+420.5%) | 3.896 s (+369.8%) |

### `bounded_queue`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 263.472 us (+125.5%) | 116.847 us (best) | 237.582 us (+103.3%) | 822.926 us (+604.3%) | 135.646 us (+16.1%) | 121.566 us (+4.0%) |
| 8 | 2.606 ms (+173.9%) | 951.656 us (best) | 2.475 ms (+160.0%) | 4.811 ms (+405.5%) | 1.091 ms (+14.7%) | 1.050 ms (+10.3%) |
| 32 | 9.985 ms (+138.2%) | 4.191 ms (best) | 8.857 ms (+111.3%) | 19.209 ms (+358.3%) | 4.976 ms (+18.7%) | 4.746 ms (+13.2%) |
| 64 | 16.388 ms (+72.2%) | 9.519 ms (best) | 18.914 ms (+98.7%) | 36.321 ms (+281.6%) | 11.316 ms (+18.9%) | 10.041 ms (+5.5%) |
| 128 | 39.254 ms (+81.1%) | 21.671 ms (best) | 39.420 ms (+81.9%) | 83.022 ms (+283.1%) | 27.265 ms (+25.8%) | 22.916 ms (+5.7%) |

### `hash_map_queue`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 490.975 us (+46.3%) | 357.988 us (+6.7%) | 492.499 us (+46.8%) | 430.836 us (+28.4%) | 341.111 us (+1.7%) | 335.563 us (best) |
| 8 | 1.808 ms (+10.1%) | 1.662 ms (+1.2%) | 1.821 ms (+10.8%) | 2.201 ms (+33.9%) | 1.643 ms (best) | 1.675 ms (+1.9%) |
| 32 | 6.987 ms (+8.2%) | 6.586 ms (+2.0%) | 7.000 ms (+8.4%) | 8.935 ms (+38.4%) | 6.528 ms (+1.1%) | 6.457 ms (best) |
| 64 | 14.397 ms (+3.0%) | 14.121 ms (+1.0%) | 14.459 ms (+3.5%) | 17.863 ms (+27.8%) | 14.376 ms (+2.9%) | 13.975 ms (best) |
| 128 | 36.677 ms (+20.7%) | 31.011 ms (+2.1%) | 36.471 ms (+20.0%) | 36.894 ms (+21.4%) | 30.975 ms (+1.9%) | 30.385 ms (best) |

### `btree_map_queue`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 299.881 us (+16.0%) | 258.506 us (best) | 295.340 us (+14.2%) | 274.304 us (+6.1%) | 263.021 us (+1.7%) | 275.262 us (+6.5%) |
| 8 | 1.390 ms (+9.0%) | 1.422 ms (+11.4%) | 1.450 ms (+13.7%) | 1.276 ms (best) | 1.393 ms (+9.2%) | 1.340 ms (+5.0%) |
| 32 | 5.578 ms (+4.6%) | 5.340 ms (+0.2%) | 5.675 ms (+6.4%) | 5.711 ms (+7.1%) | 5.332 ms (best) | 5.632 ms (+5.6%) |
| 64 | 11.146 ms (+1.7%) | 10.962 ms (best) | 11.235 ms (+2.5%) | 11.534 ms (+5.2%) | 11.360 ms (+3.6%) | 11.326 ms (+3.3%) |
| 128 | 22.711 ms (+3.3%) | 22.419 ms (+2.0%) | 22.564 ms (+2.7%) | 23.178 ms (+5.4%) | 22.984 ms (+4.6%) | 21.981 ms (best) |

### `signal_stress`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 972.542 us (+57.5%) | 617.369 us (best) | 1.029 ms (+66.7%) | 1.051 ms (+70.2%) | 1.253 ms (+103.0%) | 1.071 ms (+73.4%) |
| 8 | 2.933 ms (+35.7%) | 2.290 ms (+6.0%) | 2.948 ms (+36.4%) | 2.161 ms (best) | 3.158 ms (+46.1%) | 3.121 ms (+44.4%) |
| 32 | 11.528 ms (+46.0%) | 8.597 ms (+8.9%) | 11.212 ms (+42.0%) | 7.897 ms (best) | 12.206 ms (+54.6%) | 10.714 ms (+35.7%) |
| 64 | 23.592 ms (+47.8%) | 18.777 ms (+17.7%) | 23.524 ms (+47.4%) | 15.958 ms (best) | 23.654 ms (+48.2%) | 21.704 ms (+36.0%) |
| 128 | 54.810 ms (+54.2%) | 41.247 ms (+16.1%) | 50.753 ms (+42.8%) | 35.534 ms (best) | 49.190 ms (+38.4%) | 45.016 ms (+26.7%) |

### `broadcast_stress`

| threads | musl | glibc | musl_wake | std | llvm_old | llvm_new |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 991.403 us (+53.9%) | 644.302 us (best) | 996.001 us (+54.6%) | 1.072 ms (+66.3%) | 1.282 ms (+99.0%) | 988.270 us (+53.4%) |
| 8 | 3.281 ms (+107.5%) | 1.581 ms (best) | 3.123 ms (+97.5%) | 1.927 ms (+21.9%) | 1.989 ms (+25.8%) | 3.586 ms (+126.9%) |
| 32 | 13.389 ms (+177.0%) | 4.833 ms (best) | 10.449 ms (+116.2%) | 6.560 ms (+35.7%) | 9.872 ms (+104.2%) | 10.479 ms (+116.8%) |
| 64 | 28.984 ms (+211.6%) | 9.301 ms (best) | 22.847 ms (+145.6%) | 13.488 ms (+45.0%) | 32.536 ms (+249.8%) | 25.911 ms (+178.6%) |
| 128 | 64.338 ms (+244.1%) | 18.696 ms (best) | 53.262 ms (+184.9%) | 30.663 ms (+64.0%) | 78.737 ms (+321.1%) | 59.837 ms (+220.0%) |

## `lscpu`

```text
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  128
On-line CPU(s) list:                     0-127
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 7773X 64-Core Processor
CPU family:                              25
Model:                                   1
Thread(s) per core:                      2
Core(s) per socket:                      64
Socket(s):                               1
Stepping:                                2
Microcode version:                       0xa001247
Frequency boost:                         enabled
CPU(s) scaling MHz:                      65%
CPU max MHz:                             3529.9060
CPU min MHz:                             401.4410
BogoMIPS:                                4399.92
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                          AMD-V
L1d cache:                               2 MiB (64 instances)
L1i cache:                               2 MiB (64 instances)
L2 cache:                                32 MiB (64 instances)
L3 cache:                                768 MiB (8 instances)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-127
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Mitigation; Clear CPU buffers
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace
```


https://github.com/llvm/llvm-project/pull/192748

