[libcxx-commits] [libcxx] [libc++] Improve performance of std::atomic_flag on Windows (PR #163524)

Sat Oct 18 09:09:44 PDT 2025

https://github.com/huixie90 requested changes to this pull request.

> we'd still need a loop above WaitOnAddress checking the value, since as per C++20 the wait functions on atomics must not unblock spuriously, while WaitOnAddress can

We already do that. Please check the header atomic_sync.h for the implementation of the wait function used by atomic/latch/barrier. Basically, we check the predicate (not binary equality) in a while loop. If the wait time is < 4us, we spin; otherwise, we call platform-dependent code (e.g. futex wait on Linux). So the loop already handles spurious wake-ups from the platform code.
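To make the loop described above concrete, here is a rough sketch of its shape. It is not the code in atomic_sync.h: `platform_wait_stub`, `wait_until`, and `demo_wait` are hypothetical names, and the stub simply sleeps, standing in for a futex wait or WaitOnAddress call that may return before the value changes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the platform wait (futex wait, WaitOnAddress, ...).
// Like the real primitives, it may return before the value has changed.
inline void platform_wait_stub(const std::atomic<int>&, int) {
    std::this_thread::sleep_for(std::chrono::microseconds(50));
}

// Blocks until pred(current value) is true.
// Returns how many times the (stubbed) platform wait was called.
template <class Pred>
int wait_until(const std::atomic<int>& a, Pred pred) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    int platform_waits = 0;
    for (;;) {
        const int observed = a.load(std::memory_order_acquire);
        if (pred(observed))
            return platform_waits;  // the only exit: the predicate holds
        if (clock::now() - start < std::chrono::microseconds(4)) {
            std::this_thread::yield();  // spin phase for very short waits
        } else {
            // The platform call may wake spuriously; the loop rechecks the
            // predicate afterwards, so the caller never sees that wake-up.
            platform_wait_stub(a, observed);
            ++platform_waits;
        }
    }
}

// Hypothetical demo: a second thread flips the value after a short delay.
inline int demo_wait() {
    std::atomic<int> flag{0};
    std::thread setter([&flag] {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        flag.store(1, std::memory_order_release);
    });
    wait_until(flag, [](int v) { return v != 0; });
    setter.join();
    return flag.load(std::memory_order_acquire);
}
```

Because the predicate check is the only way out of the loop, an early return from the platform call just costs one more iteration.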

> WaitOnAddress can wait on 1, 2, 4, or 8 bytes in both x86-64 and 32-bit x86, so can implement most atomic<T>::wait operations directly.

I would suggest getting familiar with how libc++ dispatches the wait based on the type; see contention_t.h. Unfortunately, the current trunk behaviour is not optimal. Each platform defines a contention_t (e.g. int32_t on Linux). If the atomic's value type happens to be the same, it takes the happy path and calls the platform wait directly. Otherwise, if the type does not match exactly (e.g. uint32_t), it goes through a global contention table and uses a proxy atomic to call the platform code, which is inefficient. Unfortunately, allowing dispatch based on size is an ABI break (in the sense that it changes the side effects and the postcondition of a function). In short, I would highly recommend implementing this efficiently from the very beginning for Windows.
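As a rough illustration of the difference between exact-type and size-based dispatch, consider the sketch below. It is not libc++ code: `wait_path`, `path_by_type`, `path_by_size`, `proxy_for`, and the 256-entry table are hypothetical, and `contention_t` is assumed to be `int32_t` as in the Linux example above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>

// Assumption: the platform contention type is int32_t, as in the Linux example.
using contention_t = std::int32_t;

enum class wait_path { direct, proxy_table };

// Exact-type dispatch: only contention_t itself reaches the platform wait
// directly; every other type (even uint32_t) falls back to the proxy table.
template <class T>
constexpr wait_path path_by_type() {
    return std::is_same_v<T, contention_t> ? wait_path::direct
                                           : wait_path::proxy_table;
}

// Size-based dispatch: any 1-, 2-, 4-, or 8-byte type could wait directly,
// which matches the sizes WaitOnAddress accepts according to the quoted comment.
template <class T>
constexpr wait_path path_by_size() {
    return (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8)
               ? wait_path::direct
               : wait_path::proxy_table;
}

// Hypothetical global contention table: the waited-on address is hashed to a
// shared proxy atomic, and the platform wait happens on that proxy instead.
inline std::atomic<contention_t>& proxy_for(const void* address) {
    static std::atomic<contention_t> table[256];
    return table[std::hash<const void*>{}(address) % 256];
}

// A type too large for a direct wait under either scheme.
struct sixteen_bytes {
    std::uint64_t a;
    std::uint64_t b;
};

static_assert(path_by_type<std::int32_t>() == wait_path::direct);
static_assert(path_by_type<std::uint32_t>() == wait_path::proxy_table);
static_assert(path_by_size<std::uint32_t>() == wait_path::direct);
static_assert(path_by_size<sixteen_bytes>() == wait_path::proxy_table);
```

Under exact-type dispatch, `uint32_t` ends up sharing a proxy slot with unrelated atomics; under size-based dispatch it would wait on its own address.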

I have a refactoring of the whole thing in progress 
https://github.com/llvm/llvm-project/pull/161086

I would suggest waiting until we land that first, so you can get a better wait strategy from the very beginning, without needing to fight with the ABI macros.

https://github.com/llvm/llvm-project/pull/163524