[libcxx-commits] [libcxx] [libc++] Refactor the predicate taking variant of `__cxx_atomic_wait` (PR #80596)

Sun Feb 4 05:38:29 PST 2024

Jan =?utf-8?q?Kokemüller?= <jan.kokemueller at gmail.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/80596 at github.com>


llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: Jan Kokemüller (jiixyj)

<details>
<summary>Changes</summary>

This is a follow-up PR to <https://github.com/llvm/llvm-project/pull/79265>. It aims to be a gentle refactoring of the `__cxx_atomic_wait` function that takes a predicate.

The key idea here is that this function's signature is changed to look like this (`std::function` used just for clarity):

```c++
__cxx_atomic_wait_fn(Atp*, std::function<bool(Tp &)> poll, memory_order __order);
```

...where `Tp` is the corresponding `value_type` to the atomic variable type `Atp`. The function's semantics are similar to `atomic`s `.wait()`, but instead of having a hardcoded predicate (is the loaded value unequal to `old`?) the predicate is specified explicitly.

The `poll` function may change its argument, and it is very important that if it returns `false`, it leaves its current understanding of the atomic's value in the argument. Internally, `__cxx_atomic_wait_fn` dispatches to two waiting mechanisms, depending on the type of the atomic variable:

1. If the atomic variable can be waited on directly (for example, Linux's futex mechanism only supports waiting on 32 bit long variables), the value of the atomic variable (which `poll` made its decision on) is then given to the underlying system wait function (e.g. futex).
2. If the atomic variable can not be waited on directly, there is a global pool of atomics that are used for this task. The ["eventcount" pattern](<https://gist.github.com/mratsim/04a29bdd98d6295acda4d0677c4d0041>) is employed to make this possible.

The eventcount pattern needs a "monitor" variable which is read before the condition is checked another time. libcxx has the `__libcpp_atomic_monitor` function for this. However, this function only has to be called in case "2", i.e. when the eventcount is actually used. In case "1", the futex is used directly, so the monitor must be the value of the atomic variable that the `poll` function made its decision on to continue blocking. Previously, `__libcpp_atomic_monitor` was _also_ used in case "1". This was the source of the ABA style bug that PR#79265 fixed.

However, the solution in PR#79265 has some disadvantages:

- It exposes internals such as `cxx_contention_t` or the fact that `__libcpp_thread_poll_with_backoff` needs two functions to higher level constructs such as `semaphore`.
- It doesn't prevent consumers calling `__cxx_atomic_wait` in an error prone way, i.e. by providing to it a predicate that doesn't take an argument. This makes ABA style issues more likely to appear.

Now, `__cxx_atomic_wait_fn` takes just _one_ function, which is then transformed into the `poll` and `backoff` callables needed by `__libcpp_thread_poll_with_backoff`.

Aside from the `__cxx_atomic_wait` changes, the only other change is the weakening of the initial atomic load of `semaphore`'s `try_acquire` into `memory_order_relaxed` and the CAS inside the loop is changed from `strong` to `weak`. Both weakenings should be fine, since the CAS is called in a loop, and the "acquire" semantics of `try_acquire` come from the CAS, not from the initial load.

---
Full diff: https://github.com/llvm/llvm-project/pull/80596.diff


5 Files Affected:

- (modified) libcxx/include/__atomic/atomic_sync.h (+69-22) 
- (modified) libcxx/include/latch (+9-2) 
- (modified) libcxx/include/semaphore (+14-15) 
- (modified) libcxx/src/atomic.cpp (+1) 
- (added) libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp (+64) 


``````````diff

diff --git a/libcxx/include/__atomic/atomic_sync.h b/libcxx/include/__atomic/atomic_sync.h
index 93527958b2e1c..1eca545afb99f 100644
--- a/libcxx/include/__atomic/atomic_sync.h
+++ b/libcxx/include/__atomic/atomic_sync.h
@@ -43,17 +43,55 @@ __libcpp_atomic_monitor(__cxx_atomic_contention_t const volatile*);
 _LIBCPP_AVAILABILITY_SYNC _LIBCPP_EXPORTED_FROM_ABI void
 __libcpp_atomic_wait(__cxx_atomic_contention_t const volatile*, __cxx_contention_t);
 
-template <class _Atp, class _Fn>
+template <class _Atp, class _Poll>
+struct __libcpp_atomic_wait_poll_impl {
+  _Atp* __a_;
+  _Poll __poll_;
+  memory_order __order_;
+
+  _LIBCPP_AVAILABILITY_SYNC
+  _LIBCPP_HIDE_FROM_ABI bool operator()() const {
+    auto __current_val = __cxx_atomic_load(__a_, __order_);
+    return __poll_(__current_val);
+  }
+};
+
+template <class _Atp, class _Poll>
 struct __libcpp_atomic_wait_backoff_impl {
-  _Atp* __a;
-  _Fn __test_fn;
+  _Atp* __a_;
+  _Poll __poll_;
+  memory_order __order_;
+
+  _LIBCPP_AVAILABILITY_SYNC
+  _LIBCPP_HIDE_FROM_ABI bool
+  __poll_or_get_monitor(__cxx_atomic_contention_t const volatile*, __cxx_contention_t& __monitor) const {
+    // In case the atomic can be waited on directly, the monitor value is just
+    // the value of the atomic.
+    // `__poll_` takes the current value of the atomic as an in-out argument
+    // to potentially modify it. After it returns, `__monitor` has a value
+    // which can be safely waited on by `std::__libcpp_atomic_wait` without any
+    // ABA style issues.
+    __monitor = __cxx_atomic_load(__a_, __order_);
+    return __poll_(__monitor);
+  }
+
+  _LIBCPP_AVAILABILITY_SYNC
+  _LIBCPP_HIDE_FROM_ABI bool __poll_or_get_monitor(void const volatile*, __cxx_contention_t& __monitor) const {
+    // In case we must wait on an atomic from the pool, the monitor comes from
+    // `std::__libcpp_atomic_monitor`.
+    // Only then we may read from `__a_`. This is the "event count" pattern.
+    __monitor          = std::__libcpp_atomic_monitor(__a_);
+    auto __current_val = __cxx_atomic_load(__a_, __order_);
+    return __poll_(__current_val);
+  }
+
   _LIBCPP_AVAILABILITY_SYNC
   _LIBCPP_HIDE_FROM_ABI bool operator()(chrono::nanoseconds __elapsed) const {
     if (__elapsed > chrono::microseconds(64)) {
-      auto const __monitor = std::__libcpp_atomic_monitor(__a);
-      if (__test_fn())
+      __cxx_contention_t __monitor;
+      if (__poll_or_get_monitor(__a_, __monitor))
         return true;
-      std::__libcpp_atomic_wait(__a, __monitor);
+      std::__libcpp_atomic_wait(__a_, __monitor);
     } else if (__elapsed > chrono::microseconds(4))
       __libcpp_thread_yield();
     else {
@@ -62,10 +100,20 @@ struct __libcpp_atomic_wait_backoff_impl {
   }
 };
 
-template <class _Atp, class _Fn>
-_LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI bool __cxx_atomic_wait(_Atp* __a, _Fn&& __test_fn) {
-  __libcpp_atomic_wait_backoff_impl<_Atp, __decay_t<_Fn> > __backoff_fn = {__a, __test_fn};
-  return std::__libcpp_thread_poll_with_backoff(__test_fn, __backoff_fn);
+// The semantics of this function are similar to `atomic`'s
+// `.wait(T old, std::memory_order order)`, but instead of having a hardcoded
+// predicate (is the loaded value unequal to `old`?), the predicate function is
+// specified as an argument. The loaded value is given as an in-out argument to
+// the predicate. If the predicate function returns `true`,
+// `_cxx_atomic_wait_fn` will return. If the predicate function returns
+// `false`, it must set the argument to its current understanding of the atomic
+// value. The predicate function must not return `false` spuriously.
+template <class _Atp, class _Poll>
+_LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void
+__cxx_atomic_wait_fn(_Atp* __a, _Poll&& __poll, memory_order __order) {
+  __libcpp_atomic_wait_poll_impl<_Atp, __decay_t<_Poll> > __poll_fn       = {__a, __poll, __order};
+  __libcpp_atomic_wait_backoff_impl<_Atp, __decay_t<_Poll> > __backoff_fn = {__a, __poll, __order};
+  (void)std::__libcpp_thread_poll_with_backoff(__poll_fn, __backoff_fn);
 }
 
 #else // _LIBCPP_HAS_NO_THREADS
@@ -74,9 +122,10 @@ template <class _Tp>
 _LIBCPP_HIDE_FROM_ABI void __cxx_atomic_notify_all(__cxx_atomic_impl<_Tp> const volatile*) {}
 template <class _Tp>
 _LIBCPP_HIDE_FROM_ABI void __cxx_atomic_notify_one(__cxx_atomic_impl<_Tp> const volatile*) {}
-template <class _Atp, class _Fn>
-_LIBCPP_HIDE_FROM_ABI bool __cxx_atomic_wait(_Atp*, _Fn&& __test_fn) {
-  return std::__libcpp_thread_poll_with_backoff(__test_fn, __spinning_backoff_policy());
+template <class _Atp, class _Poll>
+_LIBCPP_HIDE_FROM_ABI void __cxx_atomic_wait_fn(_Atp*, _Poll&& __poll, memory_order __order) {
+  __libcpp_atomic_wait_poll_impl<_Atp, __decay_t<_Poll> > __poll_fn = {__a, __poll, __order};
+  (void)std::__libcpp_thread_poll_with_backoff(__poll_fn, __spinning_backoff_policy());
 }
 
 #endif // _LIBCPP_HAS_NO_THREADS
@@ -86,21 +135,19 @@ _LIBCPP_HIDE_FROM_ABI bool __cxx_nonatomic_compare_equal(_Tp const& __lhs, _Tp c
   return std::memcmp(std::addressof(__lhs), std::addressof(__rhs), sizeof(_Tp)) == 0;
 }
 
-template <class _Atp, class _Tp>
-struct __cxx_atomic_wait_test_fn_impl {
-  _Atp* __a;
+template <class _Tp>
+struct __cxx_atomic_wait_poll_fn_impl {
   _Tp __val;
-  memory_order __order;
-  _LIBCPP_HIDE_FROM_ABI bool operator()() const {
-    return !std::__cxx_nonatomic_compare_equal(std::__cxx_atomic_load(__a, __order), __val);
+  _LIBCPP_HIDE_FROM_ABI bool operator()(_Tp& __current_val) const {
+    return !std::__cxx_nonatomic_compare_equal(__current_val, __val);
   }
 };
 
 template <class _Atp, class _Tp>
-_LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI bool
+_LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void
 __cxx_atomic_wait(_Atp* __a, _Tp const __val, memory_order __order) {
-  __cxx_atomic_wait_test_fn_impl<_Atp, _Tp> __test_fn = {__a, __val, __order};
-  return std::__cxx_atomic_wait(__a, __test_fn);
+  __cxx_atomic_wait_poll_fn_impl<_Tp> __poll_fn = {__val};
+  std::__cxx_atomic_wait_fn(__a, __poll_fn, __order);
 }
 
 _LIBCPP_END_NAMESPACE_STD
diff --git a/libcxx/include/latch b/libcxx/include/latch
index ad7b35579913b..45c1b5c91e6cd 100644
--- a/libcxx/include/latch
+++ b/libcxx/include/latch
@@ -97,9 +97,13 @@ public:
     if (__old == __update)
       __a_.notify_all();
   }
-  inline _LIBCPP_HIDE_FROM_ABI bool try_wait() const noexcept { return 0 == __a_.load(memory_order_acquire); }
+  inline _LIBCPP_HIDE_FROM_ABI bool try_wait() const noexcept {
+    auto __value = __a_.load(memory_order_acquire);
+    return try_wait_impl(__value);
+  }
   inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void wait() const {
-    __cxx_atomic_wait(&__a_.__a_, [this]() -> bool { return try_wait(); });
+    __cxx_atomic_wait_fn(
+        &__a_.__a_, [this](ptrdiff_t& __value) -> bool { return try_wait_impl(__value); }, memory_order_acquire);
   }
   inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void arrive_and_wait(ptrdiff_t __update = 1) {
     _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN(__update >= 0, "latch::arrive_and_wait called with a negative value");
@@ -108,6 +112,9 @@ public:
     count_down(__update);
     wait();
   }
+
+private:
+  inline _LIBCPP_HIDE_FROM_ABI bool try_wait_impl(ptrdiff_t& __value) const noexcept { return __value == 0; }
 };
 
 _LIBCPP_END_NAMESPACE_STD
diff --git a/libcxx/include/semaphore b/libcxx/include/semaphore
index 649705f45b049..27e5951e7d982 100644
--- a/libcxx/include/semaphore
+++ b/libcxx/include/semaphore
@@ -54,6 +54,7 @@ using binary_semaphore = counting_semaphore<1>;
 #include <__assert> // all public C++ headers provide the assertion handler
 #include <__atomic/atomic_base.h>
 #include <__atomic/atomic_sync.h>
+#include <__atomic/contention_t.h>
 #include <__atomic/memory_order.h>
 #include <__availability>
 #include <__chrono/time_point.h>
@@ -94,35 +95,33 @@ public:
     auto __old = __a_.fetch_add(__update, memory_order_release);
     _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN(
         __update <= _LIBCPP_SEMAPHORE_MAX - __old, "update is greater than the expected value");
-
-    if (__old > 0) {
-      // Nothing to do
-    } else if (__update > 1)
+    if (__old == 0) {
       __a_.notify_all();
-    else
-      __a_.notify_one();
+    }
   }
   _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void acquire() {
-    auto const __test_fn = [this]() -> bool {
-      auto __old = __a_.load(memory_order_relaxed);
-      return (__old != 0) && __a_.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
-    };
-    __cxx_atomic_wait(&__a_.__a_, __test_fn);
+    __cxx_atomic_wait_fn(
+        &__a_.__a_, [this](ptrdiff_t& __old) { return __try_acquire_impl(__old); }, memory_order_relaxed);
   }
   template <class _Rep, class _Period>
   _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI bool
   try_acquire_for(chrono::duration<_Rep, _Period> const& __rel_time) {
     if (__rel_time == chrono::duration<_Rep, _Period>::zero())
       return try_acquire();
-    auto const __test_fn = [this]() { return try_acquire(); };
-    return std::__libcpp_thread_poll_with_backoff(__test_fn, __libcpp_timed_backoff_policy(), __rel_time);
+    auto const __poll_fn = [this]() { return try_acquire(); };
+    return std::__libcpp_thread_poll_with_backoff(__poll_fn, __libcpp_timed_backoff_policy(), __rel_time);
   }
   _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI bool try_acquire() {
-    auto __old = __a_.load(memory_order_acquire);
+    auto __old = __a_.load(memory_order_relaxed);
+    return __try_acquire_impl(__old);
+  }
+
+private:
+  _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI bool __try_acquire_impl(ptrdiff_t& __old) {
     while (true) {
       if (__old == 0)
         return false;
-      if (__a_.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed))
+      if (__a_.compare_exchange_weak(__old, __old - 1, memory_order_acquire, memory_order_relaxed))
         return true;
     }
   }
diff --git a/libcxx/src/atomic.cpp b/libcxx/src/atomic.cpp
index 2f0389ae6974a..2b67685c8a0a1 100644
--- a/libcxx/src/atomic.cpp
+++ b/libcxx/src/atomic.cpp
@@ -178,6 +178,7 @@ _LIBCPP_EXPORTED_FROM_ABI void __cxx_atomic_notify_one(__cxx_atomic_contention_t
 _LIBCPP_EXPORTED_FROM_ABI void __cxx_atomic_notify_all(__cxx_atomic_contention_t const volatile* __location) {
   __libcpp_contention_notify(&__libcpp_contention_state(__location)->__contention_state, __location, false);
 }
+// This function is never used, but still exported for ABI compatibility.
 _LIBCPP_EXPORTED_FROM_ABI __cxx_contention_t
 __libcpp_atomic_monitor(__cxx_atomic_contention_t const volatile* __location) {
   return __libcpp_contention_monitor_for_wait(&__libcpp_contention_state(__location)->__contention_state, __location);
diff --git a/libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp b/libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
new file mode 100644
index 0000000000000..dca3c01cf0103
--- /dev/null
+++ b/libcxx/test/std/thread/thread.semaphore/lost_wakeup.pass.cpp
@@ -0,0 +1,64 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: no-threads
+// UNSUPPORTED: c++03, c++11, c++14, c++17
+
+// XFAIL: availability-synchronization_library-missing
+
+// This is a regression test for https://llvm.org/PR47013.
+
+// <semaphore>
+
+#include <barrier>
+#include <semaphore>
+#include <thread>
+#include <vector>
+
+#include "make_test_thread.h"
+
+static std::counting_semaphore<> s(0);
+static std::barrier<> b(8 + 1);
+
+void acquire() {
+  for (int i = 0; i < 10'000; ++i) {
+    s.acquire();
+    b.arrive_and_wait();
+  }
+}
+
+void release() {
+  for (int i = 0; i < 10'000; ++i) {
+    s.release(1);
+    s.release(1);
+    s.release(1);
+    s.release(1);
+
+    s.release(1);
+    s.release(1);
+    s.release(1);
+    s.release(1);
+
+    b.arrive_and_wait();
+  }
+}
+
+int main(int, char**) {
+  for (int run = 0; run < 20; ++run) {
+    std::vector<std::thread> threads;
+    for (int i = 0; i < 8; ++i)
+      threads.push_back(support::make_test_thread(acquire));
+
+    threads.push_back(support::make_test_thread(release));
+
+    for (auto& thread : threads)
+      thread.join();
+  }
+
+  return 0;
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/80596