[libc-commits] [libc] [libc][rwlock] fix timeout writer signal stealing problem (PR #201937)

Schrodinger ZHU Yifan via libc-commits libc-commits at lists.llvm.org
Tue Jun 9 10:55:14 PDT 2026


================
@@ -405,22 +405,55 @@ class RawRwLock {
       }
 
       // Phase 7: unregister ourselves as a pending reader/writer.
+      bool writer_serial_changed = false;
       {
         // Similarly, the unregister operation should also be an atomic
         // transaction.
         WaitingQueue::Guard guard = queue.acquire(is_pshared);
         guard.pending_count<role>()--;
-        // Clear the flag if we are the last reader. The flag must be
+        // Clear the flag if we are the last one. The flag must be
         // cleared otherwise operations like trylock may fail even though
         // there is no competitors.
         if (guard.pending_count<role>() == 0)
           RwState::fetch_clear_pending_bit<role>(state,
                                                  cpp::MemoryOrder::RELAXED);
+        if constexpr (role == Role::Writer) {
+          int new_serial =
+              guard.serialization<role>().load(cpp::MemoryOrder::RELAXED);
+          writer_serial_changed = new_serial != serial_number;
+        }
       }
 
       // Phase 8: exit the loop is timeout is reached.
-      if (timeout_flag)
+      if (timeout_flag) {
+        // When a timeout triggers, the waiting thread wakes up, unregisters
+        // itself from the waiting queue, and exits. However, if the timing-out
+        // thread is preempted after waking up but before it can unregister,
+        // and a concurrent unlock occurs during this window, the timing-out
+        // thread may consume the wake-up signal.
+        //
+        // For example, assume the lock is in writer-preference mode and a
+        // writer (W0) holds the lock. A reader (R) and another writer (W1,
+        // with a short timeout) arrive and join the queue. W1's timeout
+        // expires, so it wakes up and attempts to acquire the queue lock,
+        // but is preempted before succeeding. W0 then releases the lock and,
+        // preferring writers, sends a wake-up signal to W1. When W1 resumes,
+        // it acquires the queue lock, unregisters, and exits due to the
+        // timeout, ignoring the wake-up signal. As a result, the reader (R)
+        // is left waiting indefinitely, leading to a deadlock.
+        //
+        // To fix this, we track whether the serialization number changed
+        // specifically for writers, and propagate the wake signal if it did.
+        // If the timing-out thread is a reader, signal consumption is safe
+        // because:
+        // 1. If there are pending writers, they will be woken up first.
+        // 2. Otherwise, if there are pending readers, they are all woken up
+        //    via broadcasting (notify_all), so one reader timing out does not
+        //    steal others' signal.
+        if (writer_serial_changed)
+          notify_pending_threads();
         return LockResult::TimedOut;
----------------
SchrodingerZhu wrote:

this race is exactly captured by existing integration tests.

https://github.com/llvm/llvm-project/pull/201937


More information about the libc-commits mailing list