[llvm-branch-commits] [libcxx] release/22.x: [libc++] Fix semaphore timed wait hanging on Windows (#180398) (PR #181415)

Fri Feb 13 12:21:23 PST 2026

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: Hui (huixie90)

<details>
<summary>Changes</summary>

Fixes #180334

The semaphore timed wait test is flaky on Windows. It hangs from time to time.

Some examples:

[windows (clang-cl-no-vcruntime, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21737380876/job/62707542836#logs) [windows (mingw-static, true, cc,
c++)](https://github.com/llvm/llvm-project/actions/runs/21636063482/job/62367831823?pr=179483#logs) [windows (clang-cl-static, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21453876753/job/61794464147#logs) [windows (clang-cl-dll, false, clang-cl,
clang-cl)](https://github.com/llvm/llvm-project/actions/runs/21382902941/job/61556154029#logs) [windows (mingw-static, true, cc,
c++)](https://github.com/llvm/llvm-project/actions/runs/21365713577/job/61502377123#logs)

The internal dylib function takes a timeout

```cpp
static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns)
```
We followed the same convention as `__libcpp_thread_poll_with_backoff`, where we used `0ns` to indicate wait indefinitely until being notified.

```cpp
_LIBCPP_HIDE_FROM_ABI __poll_with_backoff_results __libcpp_thread_poll_with_backoff(
    _Poll&& __poll, _Backoff&& __backoff, chrono::nanoseconds __max_elapsed = chrono::nanoseconds::zero())
```

This is problematic, if the caller indeed wants to wait `0ns` and passes `0`, the internal dylib function `__platform_wait_on_address` would wait indefinitely

```cpp
__timeout_ns == 0 ? INFINITE : static_cast<DWORD>(__timeout_ns / 1'000'000)
```

This is what actually happened here. So the fix is to update internal dylib function to use `optional` and use `nullopt` to indicate wait indefinitely

```cpp
static void __platform_wait_on_address(void const* __ptr, void const* __val, optional<uint64_t> __timeout_ns)
```

Edit: after code review, the code is updated to use tag type `NoTimeout` to indicate "wait indefinitely". this is superior because it is coded into the type system instead of runtime check

problem?

`__libcpp_thread_poll_with_backoff` has this " `0ns` means wait indefinitely " semantic for years (it has always been like that), but it never causes issues. This is because, it has

```cpp
chrono::nanoseconds const __elapsed = chrono::high_resolution_clock::now() - __start;
if (__max_elapsed != chrono::nanoseconds::zero() && __max_elapsed < __elapsed)
      return __poll_with_backoff_results::__timeout;
```

`__max_elapsed` is what user passed in, let's assume the user passed in `0ns`, and `__elapsed` is certainly a positive number so it never goes to the backoff function and directly returned. No hanging possible

So in the test, we passed wait some time, say `1ms`, from the `semaphore` public API, which calls `__libcpp_thread_poll_with_backoff` with `1ms` timeout. `__libcpp_thread_poll_with_backoff` will do some polling loops and then calling into the backoff function, platform timed wait in this case, with timeout of `1ms - elapsed` , say `950us`. However, Windows platform wait has millisecond precision, so `950us` is rounded down to `0ms` (`static_cast<DWORD>(__timeout_ns / 1'000'000)`), so the function call almost immediately returns, and `__libcpp_thread_poll_with_backoff` will keep its polling loop like this. As time goes by in the polling loop, the timeout for platform wait will decrease from `950us` to smaller and smaller number.

In the `__libcpp_thread_poll_with_backoff`
```cpp
if (__max_elapsed != chrono::nanoseconds::zero() && __max_elapsed < __elapsed)
      return __poll_with_backoff_results::__timeout;
if (auto __backoff_res = __backoff(__elapsed); __backoff_res == __backoff_results::__continue_poll)
```
`__max_elapsed` is user requested timeout, which is `1ms` in this case, and `__elapsed` gradually increases and eventually, if it becomes greater than `1ms`, we have `__max_elapsed < __elapsed`, it will return and test passes. all Good. But there is a slim chance that on one loop, `__elapsed` is exactly the same number of the user requested `__max_elapsed` `1ms`, so `__max_elapsed == __elapsed`, this will make the code path go to backoff platform wait, with the timeout `__max_elapsed - __elapsed == 0ns`. So now we are requesting `__platform_wait_on_address` to wait exactly `0ns`, and due to our ambiguous API, `0ns` means wait indefinitely

```cpp
__timeout_ns == 0 ? INFINITE : static_cast<DWORD>(__timeout_ns / 1'000'000)
```

The test will just hang forever.

in `__libcpp_thread_poll_with_backoff`, If we check `__max_elapsed <= __elapsed` instead of `__max_elapsed < __elapsed`, we would avoid the call to platform wait with `0ns`. But according to the standard, I think the current `__max_elapsed < __elapsed` is more correct

> The timeout expires
([[thread.req.timing]](https://eel.is/c++draft/thread.req.timing)) when the current time is after abs_time (for try_acquire_until) or when at least rel_time has passed from the start of the function (for try_acquire_for)[.](https://eel.is/c++draft/thread.sema#cnt-18.sentence-2)

https://eel.is/c++draft/thread.sema#cnt-18.2

So this **after** means that it needs strictly greater than i think.

The fix is to update the internal dylib API to use `nullopt` to indicate wait indefinitely. So a call to platform wait with `0ns` will not cause a hang.

Edit: after code review, the code is updated to use tag type `NoTimeout` to indicate "wait indefinitely". this is superior because it is coded into the type system instead of runtime check

I made a small change as well. it is very easy to get into a situation where the requested platform wait timeout is `<1ms`, we will be keep calling Windows platform wait with `0ns` because of the rounding, and this is effectively a spin lock . I made a change such that if the requested timeout is between `100us to 1ms`, just rounded up to `1ms` to wait a bit longer (which is conforming IIUC) . let me know if this change is necessary, happy to take this part out if it is not considered good.

---
Full diff: https://github.com/llvm/llvm-project/pull/181415.diff


2 Files Affected:

- (modified) libcxx/src/atomic.cpp (+67-34) 
- (added) libcxx/test/std/thread/thread.semaphore/timed.hang.pass.cpp (+45) 


``````````diff

diff --git a/libcxx/src/atomic.cpp b/libcxx/src/atomic.cpp
index 637c971ed8896..ab31915daa5a0 100644
--- a/libcxx/src/atomic.cpp
+++ b/libcxx/src/atomic.cpp
@@ -59,22 +59,27 @@
 
 #endif
 
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
 _LIBCPP_BEGIN_NAMESPACE_STD
 
+struct NoTimeout {};
+
 #ifdef __linux__
 
-template <std::size_t _Size>
-static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns) {
+template <std::size_t _Size, class MaybeTimeout>
+static void __platform_wait_on_address(void const* __ptr, void const* __val, MaybeTimeout maybe_timeout_ns) {
   static_assert(_Size == 4, "Can only wait on 4 bytes value");
   char buffer[_Size];
   std::memcpy(&buffer, const_cast<const void*>(__val), _Size);
   static constexpr timespec __default_timeout = {2, 0};
   timespec __timeout;
-  if (__timeout_ns == 0) {
+  if constexpr (is_same_v<MaybeTimeout, NoTimeout>) {
     __timeout = __default_timeout;
   } else {
-    __timeout.tv_sec  = __timeout_ns / 1'000'000'000;
-    __timeout.tv_nsec = __timeout_ns % 1'000'000'000;
+    __timeout.tv_sec  = maybe_timeout_ns / 1'000'000'000;
+    __timeout.tv_nsec = maybe_timeout_ns % 1'000'000'000;
   }
   _LIBCPP_FUTEX(__ptr, FUTEX_WAIT_PRIVATE, *reinterpret_cast<__cxx_contention_t const*>(&buffer), &__timeout, 0, 0);
 }
@@ -96,18 +101,27 @@ extern "C" int __ulock_wake(uint32_t operation, void* addr, uint64_t wake_value)
 #  define UL_COMPARE_AND_WAIT64 5
 #  define ULF_WAKE_ALL 0x00000100
 
-template <std::size_t _Size>
-static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns) {
+template <std::size_t _Size, class MaybeTimeout>
+static void __platform_wait_on_address(void const* __ptr, void const* __val, MaybeTimeout maybe_timeout_ns) {
   static_assert(_Size == 8 || _Size == 4, "Can only wait on 8 bytes or 4 bytes value");
-  char buffer[_Size];
-  std::memcpy(&buffer, const_cast<const void*>(__val), _Size);
-  auto __timeout_us = __timeout_ns == 0 ? 0 : static_cast<uint32_t>(__timeout_ns / 1000);
-  if constexpr (_Size == 4)
+  auto __timeout_us = [&] {
+    if constexpr (is_same_v<MaybeTimeout, NoTimeout>) {
+      return uint32_t(0);
+    } else {
+      return std::max(static_cast<uint32_t>(maybe_timeout_ns / 1000), uint32_t(1));
+    }
+  }();
+  if constexpr (_Size == 4) {
+    alignas(uint32_t) char buffer[_Size];
+    std::memcpy(&buffer, const_cast<const void*>(__val), _Size);
     __ulock_wait(
         UL_COMPARE_AND_WAIT, const_cast<void*>(__ptr), *reinterpret_cast<uint32_t const*>(&buffer), __timeout_us);
-  else
+  } else {
+    alignas(uint64_t) char buffer[_Size];
+    std::memcpy(&buffer, const_cast<const void*>(__val), _Size);
     __ulock_wait(
         UL_COMPARE_AND_WAIT64, const_cast<void*>(__ptr), *reinterpret_cast<uint64_t const*>(&buffer), __timeout_us);
+  }
 }
 
 template <std::size_t _Size>
@@ -127,17 +141,17 @@ static void __platform_wake_by_address(void const* __ptr, bool __notify_one) {
  * limit its use to architectures where long and int64_t are synonyms.
  */
 
-template <std::size_t _Size>
-static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns) {
+template <std::size_t _Size, class MaybeTimeout>
+static void __platform_wait_on_address(void const* __ptr, void const* __val, MaybeTimeout maybe_timeout_ns) {
   static_assert(_Size == 8, "Can only wait on 8 bytes value");
   char buffer[_Size];
   std::memcpy(&buffer, const_cast<const void*>(__val), _Size);
-  if (__timeout_ns == 0) {
+  if constexpr (is_same_v<MaybeTimeout, NoTimeout>) {
     _umtx_op(const_cast<void*>(__ptr), UMTX_OP_WAIT, *reinterpret_cast<__cxx_contention_t*>(&buffer), nullptr, nullptr);
   } else {
     _umtx_time ut;
-    ut._timeout.tv_sec  = __timeout_ns / 1'000'000'000;
-    ut._timeout.tv_nsec = __timeout_ns % 1'000'000'000;
+    ut._timeout.tv_sec  = maybe_timeout_ns / 1'000'000'000;
+    ut._timeout.tv_nsec = maybe_timeout_ns % 1'000'000'000;
     ut._flags           = 0;               // Relative time (not absolute)
     ut._clockid         = CLOCK_MONOTONIC; // Use monotonic clock
 
@@ -181,22 +195,35 @@ static void* win32_get_synch_api_function(const char* function_name) {
   return reinterpret_cast<void*>(GetProcAddress(module_handle, function_name));
 }
 
-template <std::size_t _Size>
-static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns) {
+template <std::size_t _Size, class MaybeTimeout>
+static void __platform_wait_on_address(void const* __ptr, void const* __val, MaybeTimeout maybe_timeout_ns) {
   static_assert(_Size == 8, "Can only wait on 8 bytes value");
   // WaitOnAddress was added in Windows 8 (build 9200)
   static auto wait_on_address =
       reinterpret_cast<BOOL(WINAPI*)(void*, PVOID, SIZE_T, DWORD)>(win32_get_synch_api_function("WaitOnAddress"));
   if (wait_on_address != nullptr) {
-    wait_on_address(const_cast<void*>(__ptr),
-                    const_cast<void*>(__val),
-                    _Size,
-                    __timeout_ns == 0 ? INFINITE : static_cast<DWORD>(__timeout_ns / 1'000'000));
+    auto timeout_ms = [&]() -> DWORD {
+      if constexpr (is_same_v<MaybeTimeout, NoTimeout>) {
+        return INFINITE;
+      } else {
+        uint64_t ms = maybe_timeout_ns / 1'000'000;
+        if (ms == 0 && maybe_timeout_ns > 100'000)
+          // Round up to 1ms if requested between 100us - 1ms
+          return 1;
+
+        return static_cast<DWORD>(std::min(static_cast<uint64_t>(INFINITE), ms));
+      }
+    }();
+    wait_on_address(const_cast<void*>(__ptr), const_cast<void*>(__val), _Size, timeout_ms);
   } else {
+    std::chrono::nanoseconds timeout = std::chrono::nanoseconds(0);
+    if constexpr (!is_same_v<MaybeTimeout, NoTimeout>) {
+      timeout = std::chrono::nanoseconds(maybe_timeout_ns);
+    }
     __libcpp_thread_poll_with_backoff(
         [=]() -> bool { return std::memcmp(const_cast<const void*>(__ptr), __val, _Size) != 0; },
         __libcpp_timed_backoff_policy(),
-        std::chrono::nanoseconds(__timeout_ns));
+        timeout);
   }
 }
 
@@ -230,12 +257,16 @@ static void __platform_wake_by_address(void const* __ptr, bool __notify_one) {
 
 // Baseline is just a timed backoff
 
-template <std::size_t _Size>
-static void __platform_wait_on_address(void const* __ptr, void const* __val, uint64_t __timeout_ns) {
+template <std::size_t _Size, class MaybeTimeout>
+static void __platform_wait_on_address(void const* __ptr, void const* __val, MaybeTimeout maybe_timeout_ns) {
+  std::chrono::nanoseconds timeout = std::chrono::nanoseconds(0);
+  if constexpr (!is_same_v<MaybeTimeout, NoTimeout>) {
+    timeout = std::chrono::nanoseconds(maybe_timeout_ns);
+  }
   __libcpp_thread_poll_with_backoff(
       [=]() -> bool { return std::memcmp(const_cast<const void*>(__ptr), __val, _Size) != 0; },
       __libcpp_timed_backoff_policy(),
-      std::chrono::nanoseconds(__timeout_ns));
+      timeout);
 }
 
 template <std::size_t _Size>
@@ -258,17 +289,17 @@ __contention_notify(__cxx_atomic_contention_t* __waiter_count, void const* __add
     __platform_wake_by_address<_Size>(__address_to_notify, __notify_one);
 }
 
-template <std::size_t _Size>
+template <std::size_t _Size, class MaybeTimeout>
 static void __contention_wait(__cxx_atomic_contention_t* __waiter_count,
                               void const* __address_to_wait,
                               void const* __old_value,
-                              uint64_t __timeout_ns) {
+                              MaybeTimeout maybe_timeout_ns) {
   __cxx_atomic_fetch_add(__waiter_count, __cxx_contention_t(1), memory_order_relaxed);
   // https://llvm.org/PR109290
   // There are no platform guarantees of a memory barrier in the platform wait implementation
   __cxx_atomic_thread_fence(memory_order_seq_cst);
   // We sleep as long as the monitored value hasn't changed.
-  __platform_wait_on_address<_Size>(__address_to_wait, __old_value, __timeout_ns);
+  __platform_wait_on_address<_Size>(__address_to_wait, __old_value, maybe_timeout_ns);
   __cxx_atomic_fetch_sub(__waiter_count, __cxx_contention_t(1), memory_order_release);
 }
 
@@ -331,7 +362,7 @@ _LIBCPP_EXPORTED_FROM_ABI void
 __atomic_wait_global_table(void const* __location, __cxx_contention_t __old_value) noexcept {
   auto const __entry = __get_global_contention_state(__location);
   __contention_wait<sizeof(__cxx_atomic_contention_t)>(
-      &__entry->__waiter_count, &__entry->__platform_state, &__old_value, 0);
+      &__entry->__waiter_count, &__entry->__platform_state, &__old_value, NoTimeout{});
 }
 
 _LIBCPP_EXPORTED_FROM_ABI void __atomic_wait_global_table_with_timeout(
@@ -353,7 +384,7 @@ _LIBCPP_EXPORTED_FROM_ABI void __atomic_notify_all_global_table(void const* __lo
 
 template <std::size_t _Size>
 _LIBCPP_EXPORTED_FROM_ABI void __atomic_wait_native(void const* __address, void const* __old_value) noexcept {
-  __contention_wait<_Size>(__get_native_waiter_count(__address), __address, __old_value, 0);
+  __contention_wait<_Size>(__get_native_waiter_count(__address), __address, __old_value, NoTimeout{});
 }
 
 template <std::size_t _Size>
@@ -428,7 +459,7 @@ _LIBCPP_EXPORTED_FROM_ABI void
 __libcpp_atomic_wait(void const volatile* __location, __cxx_contention_t __old_value) noexcept {
   auto const __entry = __get_global_contention_state(const_cast<void const*>(__location));
   __contention_wait<sizeof(__cxx_atomic_contention_t)>(
-      &__entry->__waiter_count, &__entry->__platform_state, &__old_value, 0);
+      &__entry->__waiter_count, &__entry->__platform_state, &__old_value, NoTimeout{});
 }
 
 _LIBCPP_EXPORTED_FROM_ABI void __cxx_atomic_notify_one(__cxx_atomic_contention_t const volatile* __location) noexcept {
@@ -447,7 +478,7 @@ _LIBCPP_EXPORTED_FROM_ABI void
 __libcpp_atomic_wait(__cxx_atomic_contention_t const volatile* __location, __cxx_contention_t __old_value) noexcept {
   auto __location_cast = const_cast<const void*>(static_cast<const volatile void*>(__location));
   __contention_wait<sizeof(__cxx_atomic_contention_t)>(
-      __get_native_waiter_count(__location_cast), __location_cast, &__old_value, 0);
+      __get_native_waiter_count(__location_cast), __location_cast, &__old_value, NoTimeout{});
 }
 
 // this function is even unused in the old ABI
@@ -459,3 +490,5 @@ __libcpp_atomic_monitor(__cxx_atomic_contention_t const volatile* __location) no
 _LIBCPP_DIAGNOSTIC_POP
 
 _LIBCPP_END_NAMESPACE_STD
+
+_LIBCPP_POP_MACROS
diff --git a/libcxx/test/std/thread/thread.semaphore/timed.hang.pass.cpp b/libcxx/test/std/thread/thread.semaphore/timed.hang.pass.cpp
new file mode 100644
index 0000000000000..75366bd4f5397
--- /dev/null
+++ b/libcxx/test/std/thread/thread.semaphore/timed.hang.pass.cpp
@@ -0,0 +1,45 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// UNSUPPORTED: no-threads
+// UNSUPPORTED: c++03, c++11, c++14, c++17
+
+// <semaphore>
+
+// This is a regression test for a bug in semaphore::try_acquire_for
+// where it can wait indefinitely
+// https://github.com/llvm/llvm-project/issues/180334
+
+#include <semaphore>
+#include <thread>
+#include <chrono>
+#include <cassert>
+
+#include "make_test_thread.h"
+#include "test_macros.h"
+
+void test() {
+  auto const start = std::chrono::steady_clock::now();
+  std::counting_semaphore<> s(0);
+
+  assert(!s.try_acquire_for(std::chrono::nanoseconds(1)));
+  assert(!s.try_acquire_for(std::chrono::microseconds(1)));
+  assert(!s.try_acquire_for(std::chrono::milliseconds(1)));
+  assert(!s.try_acquire_for(std::chrono::milliseconds(100)));
+
+  auto const end = std::chrono::steady_clock::now();
+  assert(end - start < std::chrono::seconds(10));
+}
+
+int main(int, char**) {
+  for (auto i = 0; i < 10; ++i) {
+    test();
+  }
+
+  return 0;
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/181415