[Lldb-commits] [lldb] [lldb] Fix deadlock when scripted frame providers load on private state thread (PR #191913)

Med Ismail Bennani via lldb-commits lldb-commits at lists.llvm.org
Mon Apr 13 18:52:09 PDT 2026


https://github.com/medismailben updated https://github.com/llvm/llvm-project/pull/191913

>From 62e69decf820b726c5f571eaaff2c7f8015d76fb Mon Sep 17 00:00:00 2001
From: Med Ismail Bennani <ismail at bennani.ma>
Date: Mon, 13 Apr 2026 18:45:08 -0700
Subject: [PATCH] [lldb] Fix deadlock when scripted frame providers load on
 private state thread

When a scripted breakpoint's `was_hit` callback calls `EvaluateExpression`,
`RunThreadPlan` spawns an override private state thread (Thread B) and
reassigns `m_current_private_state_thread_sp` to it. The original private state
thread (Thread A) continues processing events inside `RunThreadPlan` via
`FindNextEventInternal` -> `DoOnRemoval` -> `GetStackFrameList`.

Since Thread A is no longer recognized as the private state thread, it proceeds
to load a scripted frame provider whose Python code tries to acquire a mutex
already held further up Thread A's own call stack, causing a deadlock.

The existing re-entrancy guard in `GetStackFrameList` (added in e1cd558) only
handles the case where a provider is already active
(`m_active_frame_providers_by_thread` is non-empty). In this scenario,
Thread A initiates provider loading before any provider is active, so the guard
doesn't trigger.

`CurrentThreadIsPrivateStateThread` cannot catch this because Thread A
is no longer `m_current_private_state_thread_sp` after the override is
created.

This patch fixes the issue by adding a `thread_local` flag that is set
on entry to `RunPrivateStateThread` and cleared on exit. Both the
original and override private state threads enter this function, so
both are correctly identified. `GetStackFrameList` checks this flag
and returns plain unwinder frames when called from any private state thread.

rdar://174679105

Signed-off-by: Med Ismail Bennani <ismail at bennani.ma>
---
 lldb/include/lldb/Target/Process.h            |   6 +
 lldb/source/Target/Process.cpp                |   8 ++
 lldb/source/Target/Thread.cpp                 |  63 +++++++---
 .../was_hit_deadlock/Makefile                 |   4 +
 .../TestWasHitWithFrameProviderDeadlock.py    | 119 ++++++++++++++++++
 .../was_hit_deadlock/bkpt_resolver.py         |  47 +++++++
 .../was_hit_deadlock/frame_provider.py        |  41 ++++++
 .../was_hit_deadlock/main.c                   |  18 +++
 8 files changed, 289 insertions(+), 17 deletions(-)
 create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
 create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
 create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
 create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
 create mode 100644 lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c

diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index 307b4e932d396..02f4e11502324 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -2618,6 +2618,12 @@ void PruneThreadPlans();
 
   bool CurrentThreadPosesAsPrivateStateThread();
 
+  /// Returns true if the calling thread is running RunPrivateStateThread().
+  /// Unlike CurrentThreadIsPrivateStateThread(), this correctly identifies the
+  /// original private state thread even after RunThreadPlan has reassigned
+  /// m_current_private_state_thread_sp to an override thread.
+  static bool IsCurrentThreadPrivateState();
+
   virtual Status SendEventData(const char *data) {
     return Status::FromErrorString(
         "Sending an event is not supported for this process.");
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index e725f9eac7d5f..8a76e9c15e3c8 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -4198,7 +4198,10 @@ Status Process::HaltPrivate() {
   return error;
 }
 
+static thread_local bool g_is_private_state_thread = false;
+
 thread_result_t Process::RunPrivateStateThread() {
+  g_is_private_state_thread = true;
   bool control_only = true;
 
   Log *log = GetLog(LLDBLog::Process);
@@ -4328,6 +4331,7 @@ thread_result_t Process::RunPrivateStateThread() {
             __FUNCTION__, static_cast<void *>(this), GetID());
 
   SetPublicRunLockToStopped();
+  g_is_private_state_thread = false;
   return {};
 }
 
@@ -6053,6 +6057,10 @@ bool Process::CurrentThreadIsPrivateStateThread()
       Host::GetCurrentThread());
 }
 
+bool Process::IsCurrentThreadPrivateState() {
+  return g_is_private_state_thread;
+}
+
 bool Process::CurrentThreadPosesAsPrivateStateThread() {
   // If we haven't started up the private state thread yet, then whatever thread
   // is fetching this event should be temporarily the private state thread.
diff --git a/lldb/source/Target/Thread.cpp b/lldb/source/Target/Thread.cpp
index c199fd236f5cd..d8e66438f5df3 100644
--- a/lldb/source/Target/Thread.cpp
+++ b/lldb/source/Target/Thread.cpp
@@ -1491,30 +1491,46 @@ bool Thread::IsAnyProviderActive() {
 StackFrameListSP Thread::GetStackFrameList() {
   std::lock_guard<std::recursive_mutex> guard(m_frame_mutex);
 
-  // If a provider is currently fetching frames, return the provider's input
-  // frames instead of m_curr_frames_sp. m_curr_frames_sp IS the
-  // SyntheticStackFrameList, and accessing it would trigger provider code on
-  // THIS thread too. That is dangerous because:
-  //  - On the provider's own host thread: circular dependency / deadlock.
-  //  - On the private state thread: the provider may call EvaluateExpression
-  //    which needs the private state thread to process events -> deadlock.
-  //  - On any other thread: would run the provider concurrently.
-  // Returning the input (parent) frames is always safe.
+  // Determine if we must avoid running provider code on this call.
+  // Providers execute arbitrary Python that can call back into the SB API or
+  // evaluate expressions. That is unsafe in two situations:
+  //
+  //  1. Re-entrancy: a provider is already active on some host thread.
+  //     - Same thread: the provider's get_frame_at_index() calls
+  //       HandleCommand("bt") or accesses input_frames, which re-enters
+  //       GetStackFrameList() -> infinite recursion.
+  //     - Private state thread: the provider called EvaluateExpression()
+  //       which resumed the process via RunThreadPlan; the private state
+  //       thread must process the resulting stop event, but if it tries to
+  //       build the synthetic frame list it will re-enter the provider ->
+  //       deadlock.
+  //     - Any other thread: would run the provider concurrently with the
+  //       thread that is already mid-construction.
+  //
+  //  2. Current thread is a private state thread.
+  //     Even when no provider is active yet, loading or running one from
+  //     inside RunThreadPlan / DoOnRemoval / ShouldStop would execute
+  //     Python on a thread that is mid-process-control. The provider code
+  //     can call SB API methods that acquire locks held further up the
+  //     same call stack (e.g. SBThread.__bool__() -> GetStoppedExecution-
+  //     Context -> recursive_mutex already held by RunThreadPlan).
+  //
+  // In both cases we return the original unwinder frames. For case (1), if a
+  // provider is active we return its input (parent) frames. For case (2),
+  // we return/create the unwinder frame list without caching it in
+  // m_curr_frames_sp so that non-private-state callers still get providers
+  // once the process settles.
+  ProcessSP process_sp = GetProcess();
   {
     std::lock_guard<std::mutex> pguard(m_provider_frames_mutex);
     if (!m_active_frame_providers_by_thread.empty()) {
-      // Check if the current host thread is inside a provider call.
+      // Case 1a: current host thread is inside a provider call.
       HostThread current(Host::GetCurrentThread());
       auto it = m_active_frame_providers_by_thread.find(current);
       if (it != m_active_frame_providers_by_thread.end() && !it->second.empty())
         return it->second.back();
 
-      // If the private state thread calls GetStackFrameList while a provider
-      // is active on another thread, return parent frames too. The provider
-      // may call EvaluateExpression which needs the private state thread to
-      // process events — touching m_curr_frames_sp (the synthetic list) would
-      // trigger the provider and deadlock.
-      ProcessSP process_sp = GetProcess();
+      // Case 1b: private state thread while a provider is active elsewhere.
       if (process_sp && process_sp->CurrentThreadIsPrivateStateThread())
         return m_active_frame_providers_by_thread.begin()->second.back();
     }
@@ -1523,9 +1539,22 @@ StackFrameListSP Thread::GetStackFrameList() {
   if (m_curr_frames_sp)
     return m_curr_frames_sp;
 
+  // Case 2: current thread is a private state thread -- no provider loading.
+  // We use IsCurrentThreadPrivateState() (a thread_local flag set by
+  // RunPrivateStateThread) rather than CurrentThreadIsPrivateStateThread()
+  // because RunThreadPlan reassigns m_current_private_state_thread_sp to
+  // an override thread. The original private state thread -- which continues
+  // processing events inside RunThreadPlan -- would no longer be recognized
+  // by CurrentThreadIsPrivateStateThread().
+  if (Process::IsCurrentThreadPrivateState()) {
+    if (!m_unwinder_frames_sp)
+      m_unwinder_frames_sp = std::make_shared<StackFrameList>(
+          *this, m_prev_frames_sp, true, /*provider_id=*/0);
+    return m_unwinder_frames_sp;
+  }
+
   // First, try to load frame providers if we don't have any yet.
   if (m_frame_providers.empty()) {
-    ProcessSP process_sp = GetProcess();
     if (process_sp) {
       Target &target = process_sp->GetTarget();
       const auto &descriptors = target.GetScriptedFrameProviderDescriptors();
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
new file mode 100644
index 0000000000000..695335e068c0c
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/Makefile
@@ -0,0 +1,4 @@
+C_SOURCES := main.c
+CFLAGS_EXTRAS := -std=c99
+
+include Makefile.rules
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
new file mode 100644
index 0000000000000..febbef49de9e7
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/TestWasHitWithFrameProviderDeadlock.py
@@ -0,0 +1,119 @@
+"""
+Test that a scripted breakpoint's was_hit callback calling EvaluateExpression
+does not deadlock when a scripted frame provider is also registered.
+
+This reproduces the deadlock from:
+    lldb-rpc-server sample showing two private state threads in mutual wait:
+
+    Thread A (lldb.process.internal-state):
+      BreakpointResolverScripted::WasHit -> Python -> SBFrame.EvaluateExpression
+        -> RunThreadPlan -> Halt -> WaitForProcessToStop
+        (holds mutex, waits for state change event)
+
+    Thread B (lldb.process.internal-state-override):
+      HandlePrivateEvent -> ShouldStop -> GetStackFrameList
+        -> LoadScriptedFrameProvider -> ScriptedFrameProvider.__init__
+        -> Python -> SBThread.__bool__ -> GetStoppedExecutionContext
+        -> recursive_mutex::lock (BLOCKED - mutex held by Thread A)
+
+    Thread A waits for the event that Thread B would deliver, but Thread B
+    is blocked on the mutex Thread A holds.
+"""
+
+import os
+import lldb
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+
+
+class TestWasHitWithFrameProviderDeadlock(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    @expectedFailureAll(oslist=["windows"], bugnumber="llvm.org/pr24528")
+    def test_was_hit_with_frame_provider_no_deadlock(self):
+        """
+        Test that a scripted breakpoint doing EvaluateExpression in was_hit
+        does not deadlock when a scripted frame provider is registered.
+        """
+        self.build()
+
+        target, process, thread, _ = lldbutil.run_to_name_breakpoint(self, "main")
+
+        # Import both the breakpoint resolver and the frame provider scripts.
+        resolver_path = os.path.join(self.getSourceDir(), "bkpt_resolver.py")
+        provider_path = os.path.join(self.getSourceDir(), "frame_provider.py")
+        self.runCmd("command script import " + resolver_path)
+        self.runCmd("command script import " + provider_path)
+
+        # Register the scripted frame provider FIRST.
+        # This means that when any stop event is processed and the thread's
+        # stack frames are accessed, the provider will be loaded, calling
+        # its __init__ which accesses SBThread.
+        error = lldb.SBError()
+        provider_id = target.RegisterScriptedFrameProvider(
+            "frame_provider.SBAPIAccessInInitProvider",
+            lldb.SBStructuredData(),
+            error,
+        )
+        self.assertTrue(
+            error.Success(),
+            f"Should register frame provider: {error}",
+        )
+        self.assertNotEqual(provider_id, 0, "Provider ID should be non-zero")
+
+        # Create the scripted breakpoint that calls EvaluateExpression in was_hit.
+        extra_args = lldb.SBStructuredData()
+        stream = lldb.SBStream()
+        stream.Print('{"symbol": "target_func"}')
+        extra_args.SetFromJSON(stream)
+
+        bkpt = target.BreakpointCreateFromScript(
+            "bkpt_resolver.ExprEvalResolver",
+            extra_args,
+            lldb.SBFileSpecList(),
+            lldb.SBFileSpecList(),
+        )
+        self.assertTrue(bkpt.IsValid(), "Scripted breakpoint should be valid")
+        self.assertGreater(
+            bkpt.GetNumLocations(), 0, "Breakpoint should have locations"
+        )
+
+        # Continue the process. When the breakpoint is hit:
+        # 1. was_hit runs on private state thread, calls EvaluateExpression
+        # 2. EvaluateExpression -> RunThreadPlan -> Halt -> WaitForProcessToStop
+        #    (holds mutex, waits for state event)
+        # 3. Override state thread processes the stop, loads frame provider
+        # 4. Provider __init__ calls SBThread.__bool__ -> GetStoppedExecutionContext
+        #    -> tries to acquire mutex held by step 2 -> DEADLOCK
+        #
+        # If this test completes without hanging, the deadlock is fixed.
+        process.Continue()
+        self.assertState(process.GetState(), lldb.eStateStopped)
+
+        thread = process.GetSelectedThread()
+        self.assertTrue(thread.IsValid(), "Thread should be valid")
+        self.assertEqual(
+            thread.GetStopReason(),
+            lldb.eStopReasonBreakpoint,
+            "Should stop at breakpoint",
+        )
+
+        # Verify that EvaluateExpression in was_hit actually ran successfully
+        # by checking that g_value was incremented.
+        g_value = target.FindFirstGlobalVariable("g_value")
+        self.assertTrue(g_value.IsValid(), "Should find g_value")
+        self.assertGreater(
+            g_value.GetValueAsUnsigned(),
+            0,
+            "g_value should have been incremented by the was_hit callback",
+        )
+
+        # Continue to hit the breakpoint again to ensure it's not a one-off.
+        process.Continue()
+        self.assertState(process.GetState(), lldb.eStateStopped)
+        self.assertEqual(
+            thread.GetStopReason(),
+            lldb.eStopReasonBreakpoint,
+            "Should stop at breakpoint again",
+        )
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
new file mode 100644
index 0000000000000..7fb7f8675f2a6
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/bkpt_resolver.py
@@ -0,0 +1,47 @@
+"""
+Scripted breakpoint resolver whose was_hit callback calls EvaluateExpression.
+
+This reproduces the deadlock seen in the sample where:
+1. BreakpointResolverScripted::WasHit runs on the private state thread
+2. was_hit calls SBFrame.EvaluateExpression -> RunThreadPlan -> Halt ->
+   WaitForProcessToStop (holds a mutex, waits for state event)
+3. The override private state thread handles the stop and loads a scripted
+   frame provider
+4. The provider's __init__ calls SBThread.__bool__ -> GetStoppedExecutionContext
+   -> tries to acquire the same mutex -> DEADLOCK
+"""
+
+import lldb
+
+
+class ExprEvalResolver:
+    """Scripted breakpoint resolver that evaluates an expression in was_hit."""
+
+    def __init__(self, bkpt, extra_args, dict):
+        self.bkpt = bkpt
+        sym_name = extra_args.GetValueForKey("symbol").GetStringValue(100)
+        self.sym_name = sym_name
+        self.facade_loc = None
+
+    def __callback__(self, sym_ctx):
+        sym = sym_ctx.module.FindSymbol(self.sym_name, lldb.eSymbolTypeCode)
+        if sym.IsValid():
+            self.bkpt.AddLocation(sym.GetStartAddress())
+            self.facade_loc = self.bkpt.AddFacadeLocation()
+
+    def get_short_help(self):
+        return f"ExprEvalResolver for {self.sym_name}"
+
+    def was_hit(self, frame, bp_loc):
+        # This runs on the private state thread. Calling EvaluateExpression
+        # here triggers RunThreadPlan -> Halt -> WaitForProcessToStop, which
+        # holds a mutex and waits for a state change event.
+        options = lldb.SBExpressionOptions()
+        options.SetStopOthers(True)
+        options.SetTryAllThreads(False)
+
+        result = frame.EvaluateExpression("increment()", options)
+        if not result.error.success:
+            return lldb.LLDB_INVALID_BREAK_ID
+
+        return self.facade_loc
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
new file mode 100644
index 0000000000000..10d4fe3ab0fce
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/frame_provider.py
@@ -0,0 +1,41 @@
+"""
+Scripted frame provider that accesses the SB API in __init__.
+
+This provider checks SBThread validity during construction, which calls
+GetStoppedExecutionContext and tries to acquire a recursive_mutex.
+When combined with the scripted breakpoint's was_hit callback doing
+EvaluateExpression on the private state thread, this causes a deadlock:
+
+- Thread A (private state): holds the mutex in RunThreadPlan/WaitForProcessToStop
+- Thread B (override state): loads this provider, calls SBThread.__bool__ ->
+  GetStoppedExecutionContext -> tries to lock the same mutex -> DEADLOCK
+"""
+
+import lldb
+from lldb.plugins.scripted_frame_provider import ScriptedFrameProvider
+
+
+class SBAPIAccessInInitProvider(ScriptedFrameProvider):
+    """Provider that accesses SBThread (triggering mutex acquisition) in __init__."""
+
+    def __init__(self, input_frames, args):
+        super().__init__(input_frames, args)
+        # This is the key line that triggers the deadlock: checking if the
+        # thread is valid calls SBThread::operator bool() ->
+        # GetStoppedExecutionContext -> recursive_mutex::lock().
+        # When the private state thread holds that mutex (during
+        # RunThreadPlan/WaitForProcessToStop from a was_hit callback),
+        # this blocks forever.
+        if self.thread:
+            self.thread_is_valid = bool(self.thread)
+        else:
+            self.thread_is_valid = False
+
+    @staticmethod
+    def get_description():
+        return "Provider that accesses SB API in __init__ (deadlock reproducer)"
+
+    def get_frame_at_index(self, idx):
+        if idx < len(self.input_frames):
+            return idx
+        return None
diff --git a/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c
new file mode 100644
index 0000000000000..9ebeaf3380ed9
--- /dev/null
+++ b/lldb/test/API/functionalities/scripted_frame_provider/was_hit_deadlock/main.c
@@ -0,0 +1,18 @@
+#include <stdio.h>
+
+int g_value = 0;
+
+int increment() { return ++g_value; }
+
+int target_func() {
+  printf("target_func: %d\n", g_value);
+  return g_value;
+}
+
+int main() {
+  for (int i = 0; i < 10; i++) {
+    target_func();
+  }
+  increment();
+  return 0;
+}



More information about the lldb-commits mailing list