[Lldb-commits] [lldb] [lldb] Fix deadlock when scripted frame providers load on private state thread (PR #191913)
Med Ismail Bennani via lldb-commits
lldb-commits at lists.llvm.org
Mon Apr 13 21:55:57 PDT 2026
https://github.com/medismailben updated https://github.com/llvm/llvm-project/pull/191913
>From ea94875cda57dee20b72d98ae2d4daa799c1b9d7 Mon Sep 17 00:00:00 2001
From: Med Ismail Bennani <ismail at bennani.ma>
Date: Mon, 13 Apr 2026 21:29:13 -0700
Subject: [PATCH] [lldb] Fix deadlock when scripted frame providers load on
private state thread
When a scripted breakpoint's `was_hit` callback calls `EvaluateExpression`,
`RunThreadPlan` spawns an override private state thread (Thread B) and
reassigns `m_current_private_state_thread_sp` to it. The original private state
thread (Thread A) continues processing events inside `RunThreadPlan` via
`FindNextEventInternal` -> `DoOnRemoval` -> `GetStackFrameList`.
Since Thread A is no longer recognized as the private state thread, it proceeds
to load a scripted frame provider whose Python code tries to acquire a mutex
already held further up Thread A's own call stack, causing a deadlock.
The existing re-entrancy guard in `GetStackFrameList` (added in e1cd558) only
handles the case where a provider is already active
(`m_active_frame_providers_by_thread` is non-empty). In this scenario,
Thread A initiates provider loading before any provider is active, so
the guard doesn't trigger.
`CurrentThreadIsPrivateStateThread` cannot catch this because Thread A
is no longer `m_current_private_state_thread_sp` after the override is
created.
This patch fixes the issue by introducing `PrivateStateThreadGuard`,
an RAII guard that sets a `thread_local` flag on entry to
`RunPrivateStateThread` and clears it on exit. Both the original and
override private state threads enter this function, so both are
correctly identified. `GetStackFrameList` checks this flag and returns
plain unwinder frames when called from any private state thread.
rdar://174679105
Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
---
lldb/include/lldb/Target/Process.h | 16 +++
lldb/source/Target/Process.cpp | 3 +
lldb/source/Target/Thread.cpp | 63 +++++++---
.../was_hit_deadlock/Makefile | 4 +
.../TestWasHitWithFrameProviderDeadlock.py | 119 ++++++++++++++++++
.../was_hit_deadlock/bkpt_resolver.py | 47 +++++++
.../was_hit_deadlock/frame_provider.py | 41 ++++++
.../was_hit_deadlock/main.c | 18 +++
8 files changed, 294 insertions(+), 17 deletions(-)
create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index e32a484198668..f1d91040b339e 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -3657,6 +3657,22 @@ class UtilityFunctionScope {
}
};
+/// RAII guard that marks the current thread as a private state thread.
+///
+/// Both the original and override private state threads enter
+/// RunPrivateStateThread(), which creates this guard on the stack.
+/// Unlike CurrentThreadIsPrivateStateThread(), this correctly identifies the
+/// original private state thread even after RunThreadPlan has reassigned
+/// m_current_private_state_thread_sp to an override thread.
+struct PrivateStateThreadGuard {
+ PrivateStateThreadGuard() { g_is_private_state_thread = true; }
+ ~PrivateStateThreadGuard() { g_is_private_state_thread = false; }
+ static bool IsPrivateStateThread() { return g_is_private_state_thread; }
+
+private:
+ static thread_local bool g_is_private_state_thread;
+};
+
} // namespace lldb_private
#endif // LLDB_TARGET_PROCESS_H
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index e725f9eac7d5f..891f7c7c54a2f 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -4198,7 +4198,10 @@ Status Process::HaltPrivate() {
return error;
}
+thread_local bool PrivateStateThreadGuard::g_is_private_state_thread = false;
+
thread_result_t Process::RunPrivateStateThread() {
+ PrivateStateThreadGuard guard;
bool control_only = true;
Log *log = GetLog(LLDBLog::Process);
diff --git a/lldb/source/Target/Thread.cpp b/lldb/source/Target/Thread.cpp
index c199fd236f5cd..da297e5fe5be0 100644
--- a/lldb/source/Target/Thread.cpp
+++ b/lldb/source/Target/Thread.cpp
@@ -1491,30 +1491,46 @@ bool Thread::IsAnyProviderActive() {
StackFrameListSP Thread::GetStackFrameList() {
std::lock_guard<std::recursive_mutex> guard(m_frame_mutex);
- // If a provider is currently fetching frames, return the provider's input
- // frames instead of m_curr_frames_sp. m_curr_frames_sp IS the
- // SyntheticStackFrameList, and accessing it would trigger provider code on
- // THIS thread too. That is dangerous because:
- // - On the provider's own host thread: circular dependency / deadlock.
- // - On the private state thread: the provider may call EvaluateExpression
- // which needs the private state thread to process events -> deadlock.
- // - On any other thread: would run the provider concurrently.
- // Returning the input (parent) frames is always safe.
+ // Determine if we must avoid running provider code on this call.
+ // Providers execute arbitrary Python that can call back into the SB API or
+ // evaluate expressions. That is unsafe in two situations:
+ //
+ // 1. Re-entrancy: a provider is already active on some host thread.
+ // - Same thread: the provider's get_frame_at_index() calls
+ // HandleCommand("bt") or accesses input_frames, which re-enters
+ // GetStackFrameList() -> infinite recursion.
+ // - Private state thread: the provider called EvaluateExpression()
+ // which resumed the process via RunThreadPlan; the private state
+ // thread must process the resulting stop event, but if it tries to
+ // build the synthetic frame list it will re-enter the provider ->
+ // deadlock.
+ // - Any other thread: would run the provider concurrently with the
+ // thread that is already mid-construction.
+ //
+ // 2. Current thread is a private state thread.
+ // Even when no provider is active yet, loading or running one from
+ // inside RunThreadPlan / DoOnRemoval / ShouldStop would execute
+ // Python on a thread that is mid-process-control. The provider code
+ // can call SB API methods that acquire locks held further up the
+ // same call stack (e.g. SBThread.__bool__() -> GetStoppedExecution-
+ // Context -> recursive_mutex already held by RunThreadPlan).
+ //
+ // In both cases we return the original unwinder frames. For case (1), if a
+ // provider is active we return its input (parent) frames. For case (2),
+ // we return/create the unwinder frame list without caching it in
+ // m_curr_frames_sp so that non-private-state callers still get providers
+ // once the process settles.
+ ProcessSP process_sp = GetProcess();
{
std::lock_guard<std::mutex> pguard(m_provider_frames_mutex);
if (!m_active_frame_providers_by_thread.empty()) {
- // Check if the current host thread is inside a provider call.
+ // Case 1a: current host thread is inside a provider call.
HostThread current(Host::GetCurrentThread());
auto it = m_active_frame_providers_by_thread.find(current);
if (it != m_active_frame_providers_by_thread.end() && !it->second.empty())
return it->second.back();
- // If the private state thread calls GetStackFrameList while a provider
- // is active on another thread, return parent frames too. The provider
- // may call EvaluateExpression which needs the private state thread to
- // process events — touching m_curr_frames_sp (the synthetic list) would
- // trigger the provider and deadlock.
- ProcessSP process_sp = GetProcess();
+ // Case 1b: private state thread while a provider is active elsewhere.
if (process_sp && process_sp->CurrentThreadIsPrivateStateThread())
return m_active_frame_providers_by_thread.begin()->second.back();
}
@@ -1523,9 +1539,22 @@ StackFrameListSP Thread::GetStackFrameList() {
if (m_curr_frames_sp)
return m_curr_frames_sp;
+ // Case 2: current thread is a private state thread -- no provider loading.
+ // We use IsCurrentThreadPrivateState() (a thread_local flag set by
+ // RunPrivateStateThread) rather than CurrentThreadIsPrivateStateThread()
+ // because RunThreadPlan reassigns m_current_private_state_thread_sp to
+ // an override thread. The original private state thread -- which continues
+ // processing events inside RunThreadPlan -- would no longer be recognized
+ // by CurrentThreadIsPrivateStateThread().
+ if (PrivateStateThreadGuard::IsPrivateStateThread()) {
+ if (!m_unwinder_frames_sp)
+ m_unwinder_frames_sp = std::make_shared<StackFrameList>(
+ *this, m_prev_frames_sp, true, /*provider_id=*/0);
+ return m_unwinder_frames_sp;
+ }
+
// First, try to load frame providers if we don't have any yet.
if (m_frame_providers.empty()) {
- ProcessSP process_sp = GetProcess();
if (process_sp) {
Target &target = process_sp->GetTarget();
const auto &descriptors = target.GetScriptedFrameProviderDescriptors();
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
new file mode 100644
index 0000000000000..695335e068c0c
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
@@ -0,0 +1,4 @@
+C_SOURCES := main.c
+CFLAGS_EXTRAS := -std=c99
+
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
new file mode 100644
index 0000000000000..febbef49de9e7
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
@@ -0,0 +1,119 @@
+"""
+Test that a scripted breakpoint's was_hit callback calling EvaluateExpression
+does not deadlock when a scripted frame provider is also registered.
+
+This reproduces the deadlock from:
+ lldb-rpc-server sample showing two private state threads in mutual wait:
+
+ Thread A (lldb.process.internal-state):
+ BreakpointResolverScripted::WasHit -> Python -> SBFrame.EvaluateExpression
+ -> RunThreadPlan -> Halt -> WaitForProcessToStop
+ (holds mutex, waits for state change event)
+
+ Thread B (lldb.process.internal-state-override):
+ HandlePrivateEvent -> ShouldStop -> GetStackFrameList
+ -> LoadScriptedFrameProvider -> ScriptedFrameProvider.__init__
+ -> Python -> SBThread.__bool__ -> GetStoppedExecutionContext
+ -> recursive_mutex::lock (BLOCKED - mutex held by Thread A)
+
+ Thread A waits for the event that Thread B would deliver, but Thread B
+ is blocked on the mutex Thread A holds.
+"""
+
+import os
+import lldb
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+
+
+class TestWasHitWithFrameProviderDeadlock(TestBase):
+ NO_DEBUG_INFO_TESTCASE = True
+
+ @expectedFailureAll(oslist=["windows"], bugnumber="llvm.org/pr24528")
+ def test_was_hit_with_frame_provider_no_deadlock(self):
+ """
+ Test that a scripted breakpoint doing EvaluateExpression in was_hit
+ does not deadlock when a scripted frame provider is registered.
+ """
+ self.build()
+
+ target, process, thread, _ = lldbutil.run_to_name_breakpoint(self, "main")
+
+ # Import both the breakpoint resolver and the frame provider scripts.
+ resolver_path = os.path.join(self.getSourceDir(), "bkpt_resolver.py")
+ provider_path = os.path.join(self.getSourceDir(), "frame_provider.py")
+ self.runCmd("command script import " + resolver_path)
+ self.runCmd("command script import " + provider_path)
+
+ # Register the scripted frame provider FIRST.
+ # This means that when any stop event is processed and the thread's
+ # stack frames are accessed, the provider will be loaded, calling
+ # its __init__ which accesses SBThread.
+ error = lldb.SBError()
+ provider_id = target.RegisterScriptedFrameProvider(
+ "frame_provider.SBAPIAccessInInitProvider",
+ lldb.SBStructuredData(),
+ error,
+ )
+ self.assertTrue(
+ error.Success(),
+ f"Should register frame provider: {error}",
+ )
+ self.assertNotEqual(provider_id, 0, "Provider ID should be non-zero")
+
+ # Create the scripted breakpoint that calls EvaluateExpression in was_hit.
+ extra_args = lldb.SBStructuredData()
+ stream = lldb.SBStream()
+ stream.Print('{"symbol": "target_func"}')
+ extra_args.SetFromJSON(stream)
+
+ bkpt = target.BreakpointCreateFromScript(
+ "bkpt_resolver.ExprEvalResolver",
+ extra_args,
+ lldb.SBFileSpecList(),
+ lldb.SBFileSpecList(),
+ )
+ self.assertTrue(bkpt.IsValid(), "Scripted breakpoint should be valid")
+ self.assertGreater(
+ bkpt.GetNumLocations(), 0, "Breakpoint should have locations"
+ )
+
+ # Continue the process. When the breakpoint is hit:
+ # 1. was_hit runs on private state thread, calls EvaluateExpression
+ # 2. EvaluateExpression -> RunThreadPlan -> Halt -> WaitForProcessToStop
+ # (holds mutex, waits for state event)
+ # 3. Override state thread processes the stop, loads frame provider
+ # 4. Provider __init__ calls SBThread.__bool__ -> GetStoppedExecutionContext
+ # -> tries to acquire mutex held by step 2 -> DEADLOCK
+ #
+ # If this test completes without hanging, the deadlock is fixed.
+ process.Continue()
+ self.assertState(process.GetState(), lldb.eStateStopped)
+
+ thread = process.GetSelectedThread()
+ self.assertTrue(thread.IsValid(), "Thread should be valid")
+ self.assertEqual(
+ thread.GetStopReason(),
+ lldb.eStopReasonBreakpoint,
+ "Should stop at breakpoint",
+ )
+
+ # Verify that EvaluateExpression in was_hit actually ran successfully
+ # by checking that g_value was incremented.
+ g_value = target.FindFirstGlobalVariable("g_value")
+ self.assertTrue(g_value.IsValid(), "Should find g_value")
+ self.assertGreater(
+ g_value.GetValueAsUnsigned(),
+ 0,
+ "g_value should have been incremented by the was_hit callback",
+ )
+
+ # Continue to hit the breakpoint again to ensure it's not a one-off.
+ process.Continue()
+ self.assertState(process.GetState(), lldb.eStateStopped)
+ self.assertEqual(
+ thread.GetStopReason(),
+ lldb.eStopReasonBreakpoint,
+ "Should stop at breakpoint again",
+ )
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
new file mode 100644
index 0000000000000..7fb7f8675f2a6
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
@@ -0,0 +1,47 @@
+"""
+Scripted breakpoint resolver whose was_hit callback calls EvaluateExpression.
+
+This reproduces the deadlock seen in the sample where:
+1. BreakpointResolverScripted::WasHit runs on the private state thread
+2. was_hit calls SBFrame.EvaluateExpression -> RunThreadPlan -> Halt ->
+ WaitForProcessToStop (holds a mutex, waits for state event)
+3. The override private state thread handles the stop and loads a scripted
+ frame provider
+4. The provider's __init__ calls SBThread.__bool__ -> GetStoppedExecutionContext
+ -> tries to acquire the same mutex -> DEADLOCK
+"""
+
+import lldb
+
+
+class ExprEvalResolver:
+ """Scripted breakpoint resolver that evaluates an expression in was_hit."""
+
+ def __init__(self, bkpt, extra_args, dict):
+ self.bkpt = bkpt
+ sym_name = extra_args.GetValueForKey("symbol").GetStringValue(100)
+ self.sym_name = sym_name
+ self.facade_loc = None
+
+ def __callback__(self, sym_ctx):
+ sym = sym_ctx.module.FindSymbol(self.sym_name, lldb.eSymbolTypeCode)
+ if sym.IsValid():
+ self.bkpt.AddLocation(sym.GetStartAddress())
+ self.facade_loc = self.bkpt.AddFacadeLocation()
+
+ def get_short_help(self):
+ return f"ExprEvalResolver for {self.sym_name}"
+
+ def was_hit(self, frame, bp_loc):
+ # This runs on the private state thread. Calling EvaluateExpression
+ # here triggers RunThreadPlan -> Halt -> WaitForProcessToStop, which
+ # holds a mutex and waits for a state change event.
+ options = lldb.SBExpressionOptions()
+ options.SetStopOthers(True)
+ options.SetTryAllThreads(False)
+
+ result = frame.EvaluateExpression("increment()", options)
+ if not result.error.success:
+ return lldb.LLDB_INVALID_BREAK_ID
+
+ return self.facade_loc
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
new file mode 100644
index 0000000000000..10d4fe3ab0fce
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
@@ -0,0 +1,41 @@
+"""
+Scripted frame provider that accesses the SB API in __init__.
+
+This provider checks SBThread validity during construction, which calls
+GetStoppedExecutionContext and tries to acquire a recursive_mutex.
+When combined with the scripted breakpoint's was_hit callback doing
+EvaluateExpression on the private state thread, this causes a deadlock:
+
+- Thread A (private state): holds the mutex in RunThreadPlan/WaitForProcessToStop
+- Thread B (override state): loads this provider, calls SBThread.__bool__ ->
+ GetStoppedExecutionContext -> tries to lock the same mutex -> DEADLOCK
+"""
+
+import lldb
+from lldb.plugins.scripted_frame_provider import ScriptedFrameProvider
+
+
+class SBAPIAccessInInitProvider(ScriptedFrameProvider):
+ """Provider that accesses SBThread (triggering mutex acquisition) in __init__."""
+
+ def __init__(self, input_frames, args):
+ super().__init__(input_frames, args)
+ # This is the key line that triggers the deadlock: checking if the
+ # thread is valid calls SBThread::operator bool() ->
+ # GetStoppedExecutionContext -> recursive_mutex::lock().
+ # When the private state thread holds that mutex (during
+ # RunThreadPlan/WaitForProcessToStop from a was_hit callback),
+ # this blocks forever.
+ if self.thread:
+ self.thread_is_valid = bool(self.thread)
+ else:
+ self.thread_is_valid = False
+
+ @staticmethod
+ def get_description():
+ return "Provider that accesses SB API in __init__ (deadlock reproducer)"
+
+ def get_frame_at_index(self, idx):
+ if idx < len(self.input_frames):
+ return idx
+ return None
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c
new file mode 100644
index 0000000000000..9ebeaf3380ed9
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c
@@ -0,0 +1,18 @@
+#include <stdio.h>
+
+int g_value = 0;
+
+int increment() { return ++g_value; }
+
+int target_func() {
+ printf("target_func: %d\n", g_value);
+ return g_value;
+}
+
+int main() {
+ for (int i = 0; i < 10; i++) {
+ target_func();
+ }
+ increment();
+ return 0;
+}
More information about the lldb-commits
mailing list