[libcxx-commits] [libcxx] [libc++] Replace mutex+condvar with atomics in __call_once (PR #192433)

Shonie Caplan via libcxx-commits libcxx-commits at lists.llvm.org
Thu Apr 16 18:18:34 PDT 2026


================
@@ -42,27 +45,38 @@ void __call_once(volatile once_flag::_State_type& flag, void* arg, void (*func)(
 
 #else // !_LIBCPP_HAS_THREADS
 
-  __libcpp_mutex_lock(&mut);
-  while (flag == once_flag::_Pending)
-    __libcpp_condvar_wait(&cv, &mut);
-  if (flag == once_flag::_Unset) {
-    auto guard = std::__make_exception_guard([&flag] {
-      __libcpp_mutex_lock(&mut);
-      __libcpp_relaxed_store(&flag, once_flag::_Unset);
-      __libcpp_mutex_unlock(&mut);
-      __libcpp_condvar_broadcast(&cv);
-    });
-
-    __libcpp_relaxed_store(&flag, once_flag::_Pending);
-    __libcpp_mutex_unlock(&mut);
-    func(arg);
-    __libcpp_mutex_lock(&mut);
-    __libcpp_atomic_store(&flag, once_flag::_Complete, _AO_Release);
-    __libcpp_mutex_unlock(&mut);
-    __libcpp_condvar_broadcast(&cv);
-    guard.__complete();
-  } else {
-    __libcpp_mutex_unlock(&mut);
+  auto flag_read = __atomic_load_n(&flag, __ATOMIC_ACQUIRE);
+
+WAIT:
+  while (flag_read == once_flag::_Pending) {
----------------
shoniecaplan wrote:

Unfortunately, no. I can explain a bit more:

I made the first atomic operation (`auto flag_read = __atomic_load_n(&flag, __ATOMIC_ACQUIRE);`) a load rather than an `__atomic_compare_exchange_n()` to avoid the write in most cases. The compare-exchange (instead of a plain store) is there for the corner case where more than one thread calls `call_once` on the same flag, for the first time, at the same time. In that case they all load `_Unset` into `flag_read`, and the compare-exchange ensures that only one of these simultaneous callers actually calls the function. The goto is only there to send the losing thread(s) back to the while-loop.

From here, this corner case is handled the same way as the more common case: the 1st thread sets the flag to `_Pending` and runs `func(arg)`; meanwhile, one or more other threads call `call_once` with the same flag, see `_Pending`, and fall into the while-loop. There they wait until either the 1st thread returns from the function call successfully (the waiting threads just return), or the 1st thread throws and resets the flag to `_Unset`, in which case the waiting threads get a chance to call the function. This is the same behavior as the original function.
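
To make the flow described above concrete, here is a minimal standalone sketch. It is not the code from the PR: it uses `std::atomic` instead of the `__atomic_*` builtins, the names `State`, `call_once_sketch` and `count_calls` are hypothetical, the `yield()` in the wait loop is only a placeholder for whatever waiting strategy the real loop uses, and every memory order other than the initial acquire load is an assumption.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-ins for once_flag::_Unset, _Pending and _Complete.
enum State : unsigned { Unset, Pending, Complete };

template <class F>
void call_once_sketch(std::atomic<unsigned>& flag, F func) {
  // Fast path: a plain load, so no write happens once the flag is Complete.
  unsigned seen = flag.load(std::memory_order_acquire);

retry_wait:
  while (seen == Pending) {      // another thread is currently running func
    std::this_thread::yield();   // placeholder wait (assumption)
    seen = flag.load(std::memory_order_acquire);
  }
  if (seen == Complete)
    return;

  // seen == Unset: several first-time callers can arrive here together.
  // Only one compare-exchange succeeds; on failure `seen` is refreshed.
  if (!flag.compare_exchange_strong(seen, Pending,
                                    std::memory_order_acquire,
                                    std::memory_order_acquire))
    goto retry_wait;             // losers go back to the wait loop

  try {
    func();
  } catch (...) {
    // The call failed: reset so that a waiting thread can try instead.
    flag.store(Unset, std::memory_order_release);
    throw;
  }
  flag.store(Complete, std::memory_order_release);
}

// Usage: many threads race on one flag; returns how often func actually ran.
int count_calls(int num_threads) {
  std::atomic<unsigned> flag(Unset);
  std::atomic<int> calls(0);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i)
    threads.emplace_back([&] { call_once_sketch(flag, [&] { ++calls; }); });
  for (auto& t : threads)
    t.join();
  return calls.load();
}
```

However many threads race in `count_calls`, only the compare-exchange winner runs the callable; the others either wait in the loop or return immediately once they observe `Complete`.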

Summarizing:
If you are wondering why my update requires the goto and the original didn't: I wanted to use a load instead of a compare-exchange in the majority of cases, so the serialization is much less common but has to occur within an extra level of nesting. Originally, trailing callers saw `_Pending` when it was their turn with the mutex, so they went straight to the while-loop. Since the mutex no longer funnels callers, we have to "reset" the losing threads to the while-loop. If anyone finds a nice simplification that removes the goto, I am happy to use it; I have no desire to use goto.
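
For illustration only, one goto-free shape would fold the wait and the compare-exchange into a single loop that re-dispatches on the freshly observed value. This is a hypothetical sketch rather than code from the PR: `call_once_loop_sketch` and the `k*` state names are made up, `yield()` again stands in for the real waiting strategy, and the memory orders are assumptions.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for once_flag::_Unset, _Pending and _Complete.
enum LoopState : unsigned { kUnset, kPending, kComplete };

template <class F>
void call_once_loop_sketch(std::atomic<unsigned>& flag, F func) {
  unsigned seen = flag.load(std::memory_order_acquire);
  for (;;) {
    if (seen == kComplete)
      return;                            // fast path: nothing to do
    if (seen == kPending) {              // someone else is running func
      std::this_thread::yield();         // placeholder wait (assumption)
      seen = flag.load(std::memory_order_acquire);
      continue;
    }
    // seen == kUnset: try to claim the call. On failure (including a
    // spurious one) `seen` holds the current value and the loop re-dispatches.
    if (flag.compare_exchange_weak(seen, kPending,
                                   std::memory_order_acquire,
                                   std::memory_order_acquire))
      break;
  }

  try {
    func();
  } catch (...) {
    flag.store(kUnset, std::memory_order_release);  // allow a retry
    throw;
  }
  flag.store(kComplete, std::memory_order_release);
}
```

The trade-off is that the common `kComplete` case still costs only the initial load, while the losing threads reach the wait branch through `continue` instead of a label.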

https://github.com/llvm/llvm-project/pull/192433

