[libcxx-commits] [libcxx] [libc++] fix `counting_semaphore` lost wakeups (PR #79265)

Thu Feb 1 14:17:34 PST 2024

================
@@ -94,20 +95,25 @@ public:
     auto __old = __a_.fetch_add(__update, memory_order_release);
     _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN(
         __update <= _LIBCPP_SEMAPHORE_MAX - __old, "update is greater than the expected value");
-
-    if (__old > 0) {
-      // Nothing to do
-    } else if (__update > 1)
+    (void)__old;
+    if (__update > 1) {
       __a_.notify_all();
-    else
+    } else {
       __a_.notify_one();
+    }
   }
   _LIBCPP_AVAILABILITY_SYNC _LIBCPP_HIDE_FROM_ABI void acquire() {
     auto const __test_fn = [this]() -> bool {
       auto __old = __a_.load(memory_order_relaxed);
       return (__old != 0) && __a_.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
     };
-    __cxx_atomic_wait(&__a_.__a_, __test_fn);
+    auto const __test_with_old = [this](__cxx_contention_t& __monitor) -> bool {
----------------
huixie90 wrote:

I am still not convinced. In my opinion, it is simply a choice of on which abstraction we should build our `counting_semaphore` and `latch` on. I think your suggestion is that we should build these things on top of the internal `__cxx_atomic_wait`, `contention_state`, `__libcpp_thread_poll_with_backoff`. In my opinion, we should simply build it on top of `atomic`.

If you look at the current form of the PR, it only happens to work because:
- `counting_semaphore` uses `__atomic_base<ptrdiff_t>` and it contains `__cxx_atomic_impl<ptrdiff_t>`
- `__cxx_atomic_contention_t`  happens to be  `__cxx_atomic_impl<__cxx_contention_t>`
- `__cxx_contention_t` happens to be `int64_t`
- `ptrdiff_t` happens to be `int64_t`
- thus  `__atomic_base<ptrdiff_t>` happens to contain a `__cxx_atomic_contention_t`
- thus  `auto __monitor = std::__libcpp_atomic_monitor(__a_);` happens to call `__libcpp_atomic_monitor` the overload that takes `__cxx_atomic_contention_t` which uses its input (the `atomic` variable inside `semaphore` ) directly. If this type does not match, the fallback overload would use the global contention table and our write to the `monitor` variable would not update the `atomic` variable at all. (https://github.com/llvm/llvm-project/blob/main/libcxx/src/atomic.cpp#L163C46-L163C69)

I think, we should avoid messing around with these internal APIs, especially not to mess around the backing off functions that are dealing with contention state.  Instead, we could just use `atomic_base::{wait, notify_all, compare_exchange_weak, fetch_add, fetch_sub, __whatever_else_that_is_needed}`, so that our abstraction is built on top of something that is working and clear about its behaviour. If it is not working, then we know that we have a bug inside `atomic` and we would just fix the bug there. Instead of now, we are thinking does `A`, `B`, `C` that uses these internal functions have the same problem and we need to fix `A`, `B` and `C`.

then `counting_semaphore` and `latch` could be something like
```
struct semaphore{
   atomic_base<int> a_;

    void acquire() {
        auto old = a_.load(std::memory_order_relaxed);
        while(true) {
            if (old == 0) {
                a_.wait(old, std::memory_order_relaxed);
                old = a_.load(std::memory_order_relaxed);
            } else if (a_.compare_exchange_weak(old , old - 1, std::memory_order_acquire, std::memory_order_relaxed)){
                break;
            }
        }
    }

    void release(int update = 1) {
        auto old = a_.fetch_add(update, std::memory_order_release);
        if (old == 0) {
            a_.notify_all();
        }
    }
};


struct latch {
   atomic_base<int> a_;

    void count_down(int c = 1) {
        auto old = a_.fetch_sub(c, std::memory_order_release);
        if (old == 0) {
            a_.notify_all();
        }
    }

    void wait() {
        a_.wait(0, std::memory_order_acquire);
    }
};
```
And we don't care what pattern it is using, it is an implementation detail of `atomic_base::wait`. yes maybe the interface of `atomic` is insufficient to implement all the functions but we could add them as we need. I think working on higher level abstractions is making the code easier to maintain. 

https://github.com/llvm/llvm-project/pull/79265