[Lldb-commits] [lldb] df4ad36 - [lldb/linux] Fix a race in handling of simultaneous thread exits

Pavel Labath via lldb-commits lldb-commits at lists.llvm.org
Wed Jan 5 04:21:47 PST 2022


Author: Pavel Labath
Date: 2022-01-05T13:21:35+01:00
New Revision: df4ad3625fad54d91b688564d5b2408aa943ebe4

URL: https://github.com/llvm/llvm-project/commit/df4ad3625fad54d91b688564d5b2408aa943ebe4
DIFF: https://github.com/llvm/llvm-project/commit/df4ad3625fad54d91b688564d5b2408aa943ebe4.diff

LOG: [lldb/linux] Fix a race in handling of simultaneous thread exits

D116372, while fixing one kind of a race, ended up creating a new one.
The new issue could occur when one inferior thread exits while another
thread initiates termination of the entire process (exit_group(2)).

With some bad luck, we could start processing the exit notification
(PTRACE_EVENT_EXIT) only to have the become unresponsive (ESRCH) in the
middle of the MonitorCallback function. This function would then delete
the thread from our list even though it wasn't completely dead (it stays
zombified until we read the WIFEXITED event). The linux kernel will not
deliver the exited event for the entire process until we process
individual thread exits.

In a pre-D116372 world, this wouldn't be a problem because we would read
this event (even though we would not know what to do with it) with
waitpid(-1). Now, when we issue invididual waitpids, this event will
never be picked up, and we end up hanging.

The fix for this is actually quite simple -- don't delete the thread in
this situation. The thread will be deleted when the WIFEXITED event
comes.

This situation was kind of already tested by
TestCreateDuringInstructionStep (which is how I found this problem), but
it was mostly accidental, so I am also creating a dedicated test which
reproduces this situation.

Added: 
    lldb/test/API/functionalities/thread/concurrent_events/exit/Makefile
    lldb/test/API/functionalities/thread/concurrent_events/exit/TestConcurrentThreadExit.py
    lldb/test/API/functionalities/thread/concurrent_events/exit/main.cpp

Modified: 
    lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp

Removed: 
    


################################################################################
diff  --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 4a77e791343ca..02d2b33c4273c 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -485,34 +485,16 @@ void NativeProcessLinux::MonitorCallback(NativeThreadLinux &thread,
     } else {
       // ptrace(GETSIGINFO) failed (but not due to group-stop).
 
-      // A return value of ESRCH means the thread/process is no longer on the
-      // system, so it was killed somehow outside of our control.  Either way,
-      // we can't do anything with it anymore.
-
-      // Stop tracking the metadata for the thread since it's entirely off the
-      // system now.
-      StopTrackingThread(thread);
+      // A return value of ESRCH means the thread/process has died in the mean
+      // time. This can (e.g.) happen when another thread does an exit_group(2)
+      // or the entire process get SIGKILLed.
+      // We can't do anything with this thread anymore, but we keep it around
+      // until we get the WIFEXITED event.
 
       LLDB_LOG(log,
-               "GetSignalInfo failed: {0}, tid = {1}, status = {2}, "
-               "status = {3}, main_thread = {4}",
-               info_err, thread.GetID(), status, status, is_main_thread);
-
-      if (is_main_thread) {
-        // Notify the delegate - our process is not available but appears to
-        // have been killed outside our control.  Is eStateExited the right
-        // exit state in this case?
-        SetExitStatus(status, true);
-        SetState(StateType::eStateExited, true);
-      } else {
-        // This thread was pulled out from underneath us.  Anything to do here?
-        // Do we want to do an all stop?
-        LLDB_LOG(log,
-                 "pid {0} tid {1} non-main thread exit occurred, didn't "
-                 "tell delegate anything since thread disappeared out "
-                 "from underneath us",
-                 GetID(), thread.GetID());
-      }
+               "GetSignalInfo({0}) failed: {1}, status = {2}, main_thread = "
+               "{3}. Expecting WIFEXITED soon.",
+               thread.GetID(), info_err, status, is_main_thread);
     }
   }
 }

diff  --git a/lldb/test/API/functionalities/thread/concurrent_events/exit/Makefile b/lldb/test/API/functionalities/thread/concurrent_events/exit/Makefile
new file mode 100644
index 0000000000000..ebecfbf92416a
--- /dev/null
+++ b/lldb/test/API/functionalities/thread/concurrent_events/exit/Makefile
@@ -0,0 +1,3 @@
+ENABLE_THREADS := YES
+CXX_SOURCES := main.cpp
+include Makefile.rules

diff  --git a/lldb/test/API/functionalities/thread/concurrent_events/exit/TestConcurrentThreadExit.py b/lldb/test/API/functionalities/thread/concurrent_events/exit/TestConcurrentThreadExit.py
new file mode 100644
index 0000000000000..63ff28026e4c2
--- /dev/null
+++ b/lldb/test/API/functionalities/thread/concurrent_events/exit/TestConcurrentThreadExit.py
@@ -0,0 +1,22 @@
+"""
+This test verifies the correct handling of the situation when a thread exits
+while another thread triggers the termination (exit) of the entire process.
+"""
+
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+import lldbsuite.test.lldbutil as lldbutil
+
+
+class ConcurrentThreadExitTestCase(TestBase):
+
+    mydir = TestBase.compute_mydir(__file__)
+    NO_DEBUG_INFO_TESTCASE = True
+
+    @skipIf(oslist=no_match(["linux"]))
+    def test(self):
+        self.build()
+        exe = self.getBuildArtifact("a.out")
+        self.runCmd("file " + exe, CURRENT_EXECUTABLE_SET)
+        self.expect("run", substrs=["exited with status = 47"])

diff  --git a/lldb/test/API/functionalities/thread/concurrent_events/exit/main.cpp b/lldb/test/API/functionalities/thread/concurrent_events/exit/main.cpp
new file mode 100644
index 0000000000000..9a51b21dff635
--- /dev/null
+++ b/lldb/test/API/functionalities/thread/concurrent_events/exit/main.cpp
@@ -0,0 +1,30 @@
+#include "pseudo_barrier.h"
+#include <cstdlib>
+#include <thread>
+
+// Use low-level exit functions to achieve predictable timing.
+#if defined(__linux__)
+#include <sys/syscall.h>
+#include <unistd.h>
+
+void exit_thread(int status) { syscall(SYS_exit, status); }
+void exit_process(int status) { syscall(SYS_exit_group, status); }
+#else
+#error Unimplemented
+#endif
+
+pseudo_barrier_t g_barrier;
+
+void thread_func() {
+  pseudo_barrier_wait(g_barrier);
+  exit_thread(42);
+}
+
+int main() {
+  pseudo_barrier_init(g_barrier, 2);
+  std::thread(thread_func).detach();
+
+  pseudo_barrier_wait(g_barrier);
+
+  exit_process(47);
+}


        


More information about the lldb-commits mailing list