[libcxx-commits] [libcxx] [libc++] fix `counting_semaphore` lost wakeups (PR #79265)

Wed Jan 31 08:43:50 PST 2024

================
@@ -94,20 +95,25 @@ public:
     auto __old = __a_.fetch_add(__update, memory_order_release);
     _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN(
         __update <= _LIBCPP_SEMAPHORE_MAX - __old, "update is greater than the expected value");
-
-    if (__old > 0) {
-      // Nothing to do
-    } else if (__update > 1)
+    (void)__old;
+    if (__update > 1) {
       __a_.notify_all();
-    else
+    } else {
       __a_.notify_one();
+    }
   }
   _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void acquire() {
     auto const __test_fn = [this]() -> bool {
       auto __old = __a_.load(memory_order_relaxed);
       return (__old != 0) && __a_.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
     };
-    __cxx_atomic_wait(&__a_.__a_, __test_fn);
+    auto const __test_with_old = [this](__cxx_contention_t& __monitor) -> bool {
----------------
jiixyj wrote:

It's a bit unfortunate that with this design, you need two functions (`__test_fn` and `__test_with_old`). This is unneeded duplication and could lead to confusion about which function is used. I think it would be better to _always_ give to `__test_fn` the current value of the atomic. `__cxx_atomic_wait` then would take another `memory_order` parameter which would be used when the atomic is read inside the `__cxx_atomic_wait` machinery. It would look a bit like this:

```c++
-        auto const __test_fn = [this]() -> bool {
-            auto __old = __a.load(memory_order_relaxed);
-            return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
+        auto const __test_fn = [this](ptrdiff_t& __old) -> bool {
+            while (true) {
+                if (__old == 0)
+                    return false;
+                if (__a.compare_exchange_weak(__old, __old - 1, memory_order_acquire, memory_order_relaxed))
+                    return true;
+            }
         };
-        __cxx_atomic_wait(&__a.__a_, __test_fn);
+        __cxx_atomic_wait_fn(&__a.__a_, __test_fn, memory_order_relaxed);
```

...so `__test_fn` always takes the `ptrdiff_t` (current atomic value) as in-out parameter, and the `memory_order_relaxed` parameter previously used to read `__old` is given to `__cxx_atomic_wait`. The advantage is that if the code is refactored like this, it would immediately become apparent that `latch` suffers from the same race, which could be fixed like this:

```c++
     inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY
     void wait() const
     {
-        auto const __test_fn = [=]() -> bool {
-            return try_wait();
+        auto const __test_fn = [=](ptrdiff_t& __val) -> bool {
+            return __val == 0;
         };
-        __cxx_atomic_wait(&__a.__a_, __test_fn);
+        __cxx_atomic_wait_fn(&__a.__a_, __test_fn, memory_order_acquire);
     }
```

I played around a bit with this approach back in the day if you are interested: https://github.com/jiixyj/llvm-project/commit/bc11102a7707b8b8f19307984b732ae1f6933a6a

https://github.com/llvm/llvm-project/pull/79265