[Lldb-commits] [lldb] 88558d5 - Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop (#164905)
via lldb-commits
lldb-commits at lists.llvm.org
Mon Oct 27 11:38:14 PDT 2025
Author: jimingham
Date: 2025-10-27T11:38:10-07:00
New Revision: 88558d52c71081d5c6c372f87fb454a89747c5dd
URL: https://github.com/llvm/llvm-project/commit/88558d52c71081d5c6c372f87fb454a89747c5dd
DIFF: https://github.com/llvm/llvm-project/commit/88558d52c71081d5c6c372f87fb454a89747c5dd.diff
LOG: Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop (#164905)
Turns out there's a bug in the current lldb sources that if you fork,
set the stdio file handles to close on exec and then exec lldb with some
commands and the `--batch` flag, lldb will stall on exit. The first
cause of the bug is that the Python session handler - and probably other
places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles,
and open and close and dup them as needed. NB: I am NOT trying to fix
that bug. I'm not convinced running the lldb driver headless is worth a
lot of effort, it's just as easy to redirect them to /dev/null, which
does work.
But I would like to keep lldb from stalling on the way out when this
happens. The reason we stall is that we have a MainLoop waiting for
signals, and we try to Interrupt it, but because stdio was closed, the
interrupt pipe for the MainLoop gets the file descriptor 0, which gets
closed by the Python session handler if you run some script command. So
the Interrupt fails.
We were running the Write to the interrupt pipe wrapped in
`llvm::cantFail`, but in a no asserts build that just drops the error on
the floor. So then lldb went on to call std::thread::join on the still
active MainLoop, and that stalls
I made Interrupt (and AddCallback & AddPendingCallback) return a bool
for "interrupt success" instead. All the places where code was
requesting termination, I added checks for that failure, and skip the
std::thread::join call on the MainLoop thread, since that is almost
certainly going to stall at this point.
I didn't do the same for the Windows MainLoop, as I don't know if/when
the WSASetEvent call can fail, so I always return true here. I also
didn't turn the test off for Windows. According to the Python docs all
the API's I used should work on Windows... If that turns out not to be
true I'll make the test Darwin/Unix only.
Added:
lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
Modified:
lldb/include/lldb/Host/MainLoopBase.h
lldb/include/lldb/Host/posix/MainLoopPosix.h
lldb/include/lldb/Host/windows/MainLoopWindows.h
lldb/source/Host/common/MainLoopBase.cpp
lldb/source/Host/posix/MainLoopPosix.cpp
lldb/source/Host/windows/MainLoopWindows.cpp
lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
lldb/tools/driver/Driver.cpp
lldb/tools/lldb-dap/DAP.cpp
lldb/unittests/DAP/TestBase.cpp
lldb/unittests/Host/JSONTransportTest.cpp
lldb/unittests/Host/MainLoopTest.cpp
lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
Removed:
################################################################################
diff --git a/lldb/include/lldb/Host/MainLoopBase.h b/lldb/include/lldb/Host/MainLoopBase.h
index be9a2676e7443..9529f2c214784 100644
--- a/lldb/include/lldb/Host/MainLoopBase.h
+++ b/lldb/include/lldb/Host/MainLoopBase.h
@@ -57,18 +57,23 @@ class MainLoopBase {
// Add a pending callback that will be executed once after all the pending
// events are processed. The callback will be executed even if termination
// was requested.
- void AddPendingCallback(const Callback &callback) {
- AddCallback(callback, std::chrono::steady_clock::time_point());
+ // Returns false if an interrupt was needed to get the loop to act on the new
+ // callback, but the interrupt failed, true otherwise. Mostly used when the
+ // pending callback is a RequestTermination, since if the interrupt fails for
+ // that callback, waiting for the MainLoop thread to terminate could stall.
+ bool AddPendingCallback(const Callback &callback) {
+ return AddCallback(callback, std::chrono::steady_clock::time_point());
}
// Add a callback that will be executed after a certain amount of time has
- // passed.
- void AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
- AddCallback(callback, std::chrono::steady_clock::now() + delay);
+ // passed. See AddPendingCallback comment for the return value.
+ bool AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
+ return AddCallback(callback, std::chrono::steady_clock::now() + delay);
}
// Add a callback that will be executed after a given point in time.
- void AddCallback(const Callback &callback, TimePoint point);
+ // See AddPendingCallback comment for the return value.
+ bool AddCallback(const Callback &callback, TimePoint point);
// Waits for registered events and invoke the proper callbacks. Returns when
// all callbacks deregister themselves or when someone requests termination.
@@ -85,8 +90,9 @@ class MainLoopBase {
virtual void UnregisterReadObject(IOObject::WaitableHandle handle) = 0;
- // Interrupt the loop that is currently waiting for events.
- virtual void Interrupt() = 0;
+ /// Interrupt the loop that is currently waiting for events. Return true if
+ /// the interrupt succeeded, false if it failed.
+ virtual bool Interrupt() = 0;
void ProcessCallbacks();
diff --git a/lldb/include/lldb/Host/posix/MainLoopPosix.h b/lldb/include/lldb/Host/posix/MainLoopPosix.h
index e9ac798b948df..92cdbe9d87ec3 100644
--- a/lldb/include/lldb/Host/posix/MainLoopPosix.h
+++ b/lldb/include/lldb/Host/posix/MainLoopPosix.h
@@ -54,7 +54,7 @@ class MainLoopPosix : public MainLoopBase {
void UnregisterReadObject(IOObject::WaitableHandle handle) override;
void UnregisterSignal(int signo, std::list<Callback>::iterator callback_it);
- void Interrupt() override;
+ bool Interrupt() override;
private:
void ProcessReadObject(IOObject::WaitableHandle handle);
diff --git a/lldb/include/lldb/Host/windows/MainLoopWindows.h b/lldb/include/lldb/Host/windows/MainLoopWindows.h
index 705e7e78ba48a..65b44aa1582c3 100644
--- a/lldb/include/lldb/Host/windows/MainLoopWindows.h
+++ b/lldb/include/lldb/Host/windows/MainLoopWindows.h
@@ -50,7 +50,7 @@ class MainLoopWindows : public MainLoopBase {
protected:
void UnregisterReadObject(IOObject::WaitableHandle handle) override;
- void Interrupt() override;
+ bool Interrupt() override;
private:
llvm::Expected<size_t> Poll();
diff --git a/lldb/source/Host/common/MainLoopBase.cpp b/lldb/source/Host/common/MainLoopBase.cpp
index 64a57e65849e9..232b9bc0aa354 100644
--- a/lldb/source/Host/common/MainLoopBase.cpp
+++ b/lldb/source/Host/common/MainLoopBase.cpp
@@ -12,8 +12,9 @@
using namespace lldb;
using namespace lldb_private;
-void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
+bool MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
bool interrupt_needed;
+ bool interrupt_succeeded = true;
{
std::lock_guard<std::mutex> lock{m_callback_mutex};
// We need to interrupt the main thread if this callback is scheduled to
@@ -22,7 +23,8 @@ void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
m_callbacks.emplace(point, callback);
}
if (interrupt_needed)
- Interrupt();
+ interrupt_succeeded = Interrupt();
+ return interrupt_succeeded;
}
void MainLoopBase::ProcessCallbacks() {
diff --git a/lldb/source/Host/posix/MainLoopPosix.cpp b/lldb/source/Host/posix/MainLoopPosix.cpp
index 19a7128fbe407..c6fe7814bd22e 100644
--- a/lldb/source/Host/posix/MainLoopPosix.cpp
+++ b/lldb/source/Host/posix/MainLoopPosix.cpp
@@ -387,10 +387,11 @@ void MainLoopPosix::ProcessSignal(int signo) {
}
}
-void MainLoopPosix::Interrupt() {
+bool MainLoopPosix::Interrupt() {
if (m_interrupting.exchange(true))
- return;
+ return true;
char c = '.';
- cantFail(m_interrupt_pipe.Write(&c, 1));
+ llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
+ return result && *result != 0;
}
diff --git a/lldb/source/Host/windows/MainLoopWindows.cpp b/lldb/source/Host/windows/MainLoopWindows.cpp
index 9b7df10258bcd..5e5888aee2181 100644
--- a/lldb/source/Host/windows/MainLoopWindows.cpp
+++ b/lldb/source/Host/windows/MainLoopWindows.cpp
@@ -272,4 +272,6 @@ Status MainLoopWindows::Run() {
return Status();
}
-void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
+bool MainLoopWindows::Interrupt() {
+ return WSASetEvent(m_interrupt_event);
+}
diff --git a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
index 390cf3eeb16a5..77a3ba6574cde 100644
--- a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
+++ b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
@@ -133,11 +133,12 @@ llvm::Error ProtocolServerMCP::Stop() {
}
// Stop the main loop.
- m_loop.AddPendingCallback(
+ bool addition_succeeded = m_loop.AddPendingCallback(
[](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
- // Wait for the main loop to exit.
- if (m_loop_thread.joinable())
+ // Wait for the main loop to exit, but not if we didn't succeed in inserting
+ // our pending callback or we'll wait forever.
+ if (addition_succeeded && m_loop_thread.joinable())
m_loop_thread.join();
m_accept_handles.clear();
diff --git a/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
new file mode 100644
index 0000000000000..cff97b822db81
--- /dev/null
+++ b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
@@ -0,0 +1,51 @@
+"""
+Test that if you exec lldb with the stdio file handles
+closed, it is able to exit without hanging.
+"""
+
+
+import lldb
+import os
+import sys
+import socket
+import fcntl
+
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.lldbtest import *
+
+
+class TestDriverWithClosedSTDIO(TestBase):
+ # If your test case doesn't stress debug info, then
+ # set this to true. That way it won't be run once for
+ # each debug info format.
+ NO_DEBUG_INFO_TESTCASE = True
+
+ def test_run_lldb_and_wait(self):
+ """This test forks, closes the stdio channels and exec's lldb.
+ Then it waits for it to exit and asserts it did that successfully"""
+ pid = os.fork()
+ if pid == 0:
+ fcntl.fcntl(sys.stdin, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+ fcntl.fcntl(sys.stdout, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+ fcntl.fcntl(sys.stderr, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+ lldb = lldbtest_config.lldbExec
+ print(f"About to run: {lldb}")
+ os.execlp(
+ lldb,
+ lldb,
+ "-x",
+ "-o",
+ "script print(lldb.debugger.GetNumTargets())",
+ "--batch",
+ )
+ else:
+ if pid == -1:
+ print("Couldn't fork a process.")
+ return
+ ret_pid, status = os.waitpid(pid, 0)
+ # We're really just checking that lldb doesn't stall.
+ # At the time this test was written, if you close stdin
+ # in an asserts build, lldb aborts. So handle both
+ # of those cases. The failure will just be that the
+ # waitpid doesn't return, and the test times out.
+ self.assertFalse(os.WIFSTOPPED(status), "We either exited or crashed.")
diff --git a/lldb/tools/driver/Driver.cpp b/lldb/tools/driver/Driver.cpp
index ba0041111045b..733331f4ddac0 100644
--- a/lldb/tools/driver/Driver.cpp
+++ b/lldb/tools/driver/Driver.cpp
@@ -902,9 +902,10 @@ int main(int argc, char const *argv[]) {
}
#if !defined(_WIN32)
- signal_loop.AddPendingCallback(
- [](MainLoopBase &loop) { loop.RequestTermination(); });
- signal_thread.join();
+ // Try to interrupt the signal thread. If that succeeds, wait for it to exit.
+ if (signal_loop.AddPendingCallback(
+ [](MainLoopBase &loop) { loop.RequestTermination(); }))
+ signal_thread.join();
#endif
return exit_code;
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index 3c4f2253d1ad5..f009a902f79e7 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -1104,9 +1104,11 @@ llvm::Error DAP::Loop() {
"unhandled packet");
}
- m_loop.AddPendingCallback(
- [](MainLoopBase &loop) { loop.RequestTermination(); });
- thread.join();
+ // Don't wait to join the mainloop thread if our callback wasn't added
+ // successfully, or we'll wait forever.
+ if (m_loop.AddPendingCallback(
+ [](MainLoopBase &loop) { loop.RequestTermination(); }))
+ thread.join();
if (m_error_occurred)
return llvm::createStringError(llvm::inconvertibleErrorCode(),
diff --git a/lldb/unittests/DAP/TestBase.cpp b/lldb/unittests/DAP/TestBase.cpp
index 83a303554ad6b..8cb459964f7d8 100644
--- a/lldb/unittests/DAP/TestBase.cpp
+++ b/lldb/unittests/DAP/TestBase.cpp
@@ -55,8 +55,9 @@ void TransportBase::SetUp() {
}
void TransportBase::Run() {
- loop.AddPendingCallback(
+ bool addition_succeeded = loop.AddPendingCallback(
[](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
EXPECT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
}
diff --git a/lldb/unittests/Host/JSONTransportTest.cpp b/lldb/unittests/Host/JSONTransportTest.cpp
index 54f1372ca0fff..e90ab8e85a105 100644
--- a/lldb/unittests/Host/JSONTransportTest.cpp
+++ b/lldb/unittests/Host/JSONTransportTest.cpp
@@ -269,12 +269,13 @@ template <typename T> class JSONTransportTest : public PipePairTest {
loop.RequestTermination();
});
}
- loop.AddCallback(
+ bool addition_succeeded = loop.AddCallback(
[](MainLoopBase &loop) {
loop.RequestTermination();
FAIL() << "timeout";
},
timeout);
+ EXPECT_TRUE(addition_succeeded);
auto handle = transport->RegisterMessageHandler(loop, message_handler);
if (!handle)
return handle.takeError();
@@ -367,7 +368,9 @@ class TransportBinderTest : public testing::Test {
}
void Run() {
- loop.AddPendingCallback([](auto &loop) { loop.RequestTermination(); });
+ bool addition_succeeded =
+ loop.AddPendingCallback([](auto &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
}
};
@@ -435,8 +438,9 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadPartialMessage) {
EXPECT_CALL(message_handler, Received(Request{5, "foo", std::nullopt}));
ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
- loop.AddPendingCallback(
+ bool addition_succeeded = loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
input.CloseWriteFileDescriptor();
@@ -454,15 +458,17 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadWithZeroByteWrites) {
ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
// Run the main loop once for the initial read.
- loop.AddPendingCallback(
+ bool addition_succeeded = loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
// zero-byte write.
ASSERT_THAT_EXPECTED(input.Write(part1.data(), 0),
Succeeded()); // zero-byte write.
- loop.AddPendingCallback(
+ addition_succeeded = loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
// Write the remaining part of the message.
@@ -569,8 +575,9 @@ TEST_F(JSONRPCTransportTest, ReadPartialMessage) {
EXPECT_CALL(message_handler, Received(Request{42, "foo", std::nullopt}));
ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
- loop.AddPendingCallback(
+ bool addition_succeeded = loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(Run(/*close_input=*/false), Succeeded());
ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
diff --git a/lldb/unittests/Host/MainLoopTest.cpp b/lldb/unittests/Host/MainLoopTest.cpp
index ae16d02101819..8a248100c936a 100644
--- a/lldb/unittests/Host/MainLoopTest.cpp
+++ b/lldb/unittests/Host/MainLoopTest.cpp
@@ -179,9 +179,13 @@ TEST_F(MainLoopTest, PipeDelayBetweenRegisterAndRun) {
ASSERT_THAT_EXPECTED(pipe.Write(&X, len), llvm::HasValue(1));
};
// Add a write that triggers a read events.
- loop.AddCallback(cb, std::chrono::milliseconds(500));
- loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
- std::chrono::milliseconds(1000));
+ bool addition_succeeded =
+ loop.AddCallback(cb, std::chrono::milliseconds(500));
+ ASSERT_TRUE(addition_succeeded);
+ addition_succeeded =
+ loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+ std::chrono::milliseconds(1000));
+ ASSERT_TRUE(addition_succeeded);
ASSERT_TRUE(error.Success());
ASSERT_TRUE(handle);
@@ -310,8 +314,10 @@ TEST_F(MainLoopTest, NoSpuriousSocketReads) {
error);
ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
// Terminate the loop after one second.
- loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
- std::chrono::seconds(1));
+ bool addition_succeeded =
+ loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+ std::chrono::seconds(1));
+ ASSERT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(loop.Run().ToError(), llvm::Succeeded());
// Make sure the callback was called only once.
@@ -388,10 +394,11 @@ TEST_F(MainLoopTest, PendingCallbackTrigger) {
MainLoop loop;
std::promise<void> add_callback2;
bool callback1_called = false;
- loop.AddPendingCallback([&](MainLoopBase &loop) {
+ bool addition_succeeded = loop.AddPendingCallback([&](MainLoopBase &loop) {
callback1_called = true;
add_callback2.set_value();
});
+ EXPECT_TRUE(addition_succeeded);
Status error;
ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
bool callback2_called = false;
@@ -416,9 +423,11 @@ TEST_F(MainLoopTest, ManyPendingCallbacks) {
// caused a deadlock when the pipe filled up (either because the main loop was
// not running, because it was slow, or because it was busy/blocked doing
// something else).
- for (int i = 0; i < 65536; ++i)
- loop.AddPendingCallback(
+ for (int i = 0; i < 65536; ++i) {
+ bool addition_succeeded = loop.AddPendingCallback(
[&](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
+ }
ASSERT_TRUE(loop.Run().Success());
}
@@ -444,8 +453,10 @@ TEST_F(MainLoopTest, TimedCallbacksRunInOrder) {
add_cb(2);
add_cb(4);
add_cb(1);
- loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
- start + 5 * epsilon);
+ bool addition_succeeded =
+ loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+ start + 5 * epsilon);
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
EXPECT_GE(std::chrono::steady_clock::now() - start, 5 * epsilon);
ASSERT_THAT(order, testing::ElementsAre(1, 2, 3, 4));
@@ -455,22 +466,24 @@ TEST_F(MainLoopTest, TimedCallbackShortensSleep) {
MainLoop loop;
auto start = std::chrono::steady_clock::now();
bool long_callback_called = false;
- loop.AddCallback(
+ bool addition_succeeded = loop.AddCallback(
[&](MainLoopBase &loop) {
long_callback_called = true;
loop.RequestTermination();
},
std::chrono::seconds(30));
+ EXPECT_TRUE(addition_succeeded);
std::future<Status> async_run =
std::async(std::launch::async, &MainLoop::Run, std::ref(loop));
std::this_thread::sleep_for(std::chrono::milliseconds(100));
bool short_callback_called = false;
- loop.AddCallback(
+ addition_succeeded = loop.AddCallback(
[&](MainLoopBase &loop) {
short_callback_called = true;
loop.RequestTermination();
},
std::chrono::seconds(1));
+ EXPECT_TRUE(addition_succeeded);
ASSERT_THAT_ERROR(async_run.get().takeError(), llvm::Succeeded());
EXPECT_LT(std::chrono::steady_clock::now() - start, std::chrono::seconds(10));
EXPECT_TRUE(short_callback_called);
diff --git a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
index 45464db958e04..97f32e2fbb1bf 100644
--- a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
+++ b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
@@ -150,8 +150,9 @@ class ProtocolServerMCPTest : public testing::Test {
/// Runs the MainLoop a single time, executing any pending callbacks.
void Run() {
- loop.AddPendingCallback(
+ bool addition_succeeded = loop.AddPendingCallback(
[](MainLoopBase &loop) { loop.RequestTermination(); });
+ EXPECT_TRUE(addition_succeeded);
EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
}
More information about the lldb-commits
mailing list