[libcxx-commits] [libcxx] [libc++] Improve performance of std::atomic_flag on Windows (PR #163524)

Sun Nov 2 13:45:02 PST 2025

================
@@ -101,6 +105,46 @@ static void __libcpp_platform_wake_by_address(__cxx_atomic_contention_t const vo
   _umtx_op(const_cast<__cxx_atomic_contention_t*>(__ptr), UMTX_OP_WAKE, __notify_one ? 1 : INT_MAX, nullptr, nullptr);
 }
 
+#elif defined(_WIN32)
+
+static void
+__libcpp_platform_wait_on_address(__cxx_atomic_contention_t const volatile* __ptr, __cxx_contention_t __val) {
+  // WaitOnAddress was added in Windows 8 (build 9200)
+  static auto wait_on_address = reinterpret_cast<BOOL(WINAPI*)(volatile void*, PVOID, SIZE_T, DWORD)>(
+      GetProcAddress(GetModuleHandleW(L"api-ms-win-core-synch-l1-2-0.dll"), "WaitOnAddress"));
+  if (wait_on_address != nullptr) {
----------------
RogerSanders wrote:

Well this is very interesting:

```cpp
[[nodiscard]] __std_atomic_api_level _Init_wait_functions(__std_atomic_api_level _Level) {
    while (!_Wait_functions._Api_level.compare_exchange_weak(
        _Level, __std_atomic_api_level::__detecting, _STD memory_order_acq_rel)) {
        if (_Level > __std_atomic_api_level::__detecting) {
            return _Level;
        }
    }

    _Level = __std_atomic_api_level::__has_srwlock;

    const HMODULE _Sync_module = GetModuleHandleW(L"api-ms-win-core-synch-l1-2-0.dll");
    if (_Sync_module != nullptr) {
        const auto _Wait_on_address =
            reinterpret_cast<decltype(&::WaitOnAddress)>(GetProcAddress(_Sync_module, "WaitOnAddress"));
        const auto _Wake_by_address_single =
            reinterpret_cast<decltype(&::WakeByAddressSingle)>(GetProcAddress(_Sync_module, "WakeByAddressSingle"));
        const auto _Wake_by_address_all =
            reinterpret_cast<decltype(&::WakeByAddressAll)>(GetProcAddress(_Sync_module, "WakeByAddressAll"));

        if (_Wait_on_address != nullptr && _Wake_by_address_single != nullptr && _Wake_by_address_all != nullptr) {
            _Wait_functions._Pfn_WaitOnAddress.store(_Wait_on_address, _STD memory_order_relaxed);
            _Wait_functions._Pfn_WakeByAddressSingle.store(_Wake_by_address_single, _STD memory_order_relaxed);
            _Wait_functions._Pfn_WakeByAddressAll.store(_Wake_by_address_all, _STD memory_order_relaxed);
            _Level = __std_atomic_api_level::__has_wait_on_address;
        }
    }

    // for __has_srwlock, relaxed would have been enough, not distinguishing for consistency
    _Wait_functions._Api_level.store(_Level, _STD memory_order_release);
    return _Level;
}
```

So the Microsoft STL implementation treats this module as always loaded: it calls `GetModuleHandleW` on it without loading it first. That holds in practice today, and it now has to keep holding. If it ever stopped being true, everything built against the current and previous versions of the Microsoft STL would regress on the latest and future versions of Windows. If they accept this assumption, I think we should too.

Very interested in thoughts on this, but I think mirroring the Microsoft STL's assumptions here is the correct move. This means I'd revert to `GetModuleHandle`, with no `LoadLibrary`/`FreeLibrary`.
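
To make the proposal concrete, here is a rough sketch of what mirroring that approach could look like: look the module up only if it is already loaded, resolve all three entry points, and use them only when every one is present. All names here (`wait_api_level`, `wait_functions`, `resolve_wait_functions`, `resolve_from_loaded_synch_module`) are hypothetical and are not libc++ or Microsoft STL identifiers. The lookup is passed in as a callable so the all-or-nothing logic can be exercised without Windows.

```cpp
#include <functional>
#include <string>

#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical names for illustration only.
enum class wait_api_level { fallback, has_wait_on_address };

struct wait_functions {
  void* wait_on_address        = nullptr;
  void* wake_by_address_single = nullptr;
  void* wake_by_address_all    = nullptr;
  wait_api_level level         = wait_api_level::fallback;
};

// Maps an export name to its address, or nullptr when the export is missing.
using symbol_lookup = std::function<void*(const char*)>;

// All-or-nothing resolution: the wait/wake functions are only usable as a
// set, so a single missing export leaves the result at the fallback level.
inline wait_functions resolve_wait_functions(const symbol_lookup& lookup) {
  wait_functions result;
  void* wait     = lookup("WaitOnAddress");
  void* wake_one = lookup("WakeByAddressSingle");
  void* wake_all = lookup("WakeByAddressAll");
  if (wait != nullptr && wake_one != nullptr && wake_all != nullptr) {
    result.wait_on_address        = wait;
    result.wake_by_address_single = wake_one;
    result.wake_by_address_all    = wake_all;
    result.level                  = wait_api_level::has_wait_on_address;
  }
  return result;
}

#ifdef _WIN32
// GetModuleHandleW only finds a module that is already loaded and does not
// change its reference count, so there is nothing to free afterwards.
inline wait_functions resolve_from_loaded_synch_module() {
  HMODULE mod = GetModuleHandleW(L"api-ms-win-core-synch-l1-2-0.dll");
  if (mod == nullptr) {
    return {}; // module not loaded: stay on the fallback path
  }
  return resolve_wait_functions([mod](const char* name) -> void* {
    return reinterpret_cast<void*>(GetProcAddress(mod, name));
  });
}
#endif
```

A caller would cast the stored pointers back to the real function types before use. Thread-safe one-time initialization, which the Microsoft STL snippet handles with its `_Api_level` atomic, is left out of this sketch.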

https://github.com/llvm/llvm-project/pull/163524