[Lldb-commits] [lldb] [lldb] Correctly annotate threads at a bp site as hitting it (PR #82709)

Jason Molenda via lldb-commits lldb-commits at lists.llvm.org
Thu Feb 22 15:50:42 PST 2024


https://github.com/jasonmolenda created https://github.com/llvm/llvm-project/pull/82709

This is next in my series of "fix the racey tests that fail on greendragon" addressing the failure of TestConcurrentManyBreakpoints.py where we set a breakpoint in a function that 100 threads execute, and we check that we hit the breakpoint 100 times.  But sometimes it is only hit 99 times, and the test fails.

When we hit a software breakpoint, the pc value for the thread is the address of the breakpoint instruction - as if it had not been hit yet.  And because a user might ADD a breakpoint for the current pc from the commandline, when we go to resume execution, any thread that is sitting at a breakpoint site will be silently advanced past the breakpoint instruction (disable bp, instruction step that thread, re-enable bp) before resuming.

What this test is exposing is that there is another corner case, a thread that is sitting at a breakpoint site but has not yet executed the breakpoint instruction.  The thread will have no stop reason, no mach exception, so it will not be recorded as having hit the breakpoint (because it hasn't yet).  But when we resume execution, because it is sitting at a breakpoint site, we advance past it and miss the breakpoint hit.

In 2016 Abhishek Aggarwal handled a similar issue with a patch in `ProcessGDBRemote::SetThreadStopInfo()`, adding a breakpoint StopInfo for a thread sitting at a breakpoint site that has no stop reason. debugserver's `jThreadsInfo` would not correctly execute Abhishek's code though because it would respond with `"reason":"none"` for a thread with no stop reason, and `SetThreadStopInfo()` expected an empty reason here.  The first part of my patch is to clear the `reason` if it is `"none"` so we flow through the code correctly.

On Darwin, though, our stop reply packet (Txx...) includes the `threads`, `thread-pcs`, and `jstopinfo` keys, which give us the tids for all current threads, the pc values for those threads, and `jstopinfo` has a JSON dictionary with the mach exceptions for all threads that have a mach exception.  In
`ProcessGDBRemote::CalculateThreadStopInfo()` we set the StopInfo for each thread for a private stop and if we have `jstopinfo` it is the source of all the StopInfos.  I have to add the same logic here, to give the thread a breakpoint StopInfo even though it hasn't executed the breakpoint yet.  In this case we are very early in thread construction and I only have the information in the Txx stop reply packet -- tids, pcs, and jstopinfo, so I can't use the normal general mechanisms of going through the RegisterContext to get the pc, it's a bit different.

If I hack debugserver to not issue `jstopinfo`, `CalculateThreadStopInfo` will fall back to sending `qThreadStopInfo` for each thread and going through `ProcessGDBRemote::SetThreadStopInfo()` to set the stop infos (and with the `reason:none` fix, use Abhishek's code).

rdar://110549165

>From 171613d26338919e40584656a7f88899ba6cc5ca Mon Sep 17 00:00:00 2001
From: Jason Molenda <jmolenda at apple.com>
Date: Thu, 22 Feb 2024 15:35:31 -0800
Subject: [PATCH] [lldb] Correctly annotate threads at a bp site as hitting it

This is next in my series of "fix the racey tests that fail on
greendragon" addressing the failure of TestConcurrentManyBreakpoints.py
where we set a breakpoint in a function that 100 threads execute,
and we check that we hit the breakpoint 100 times.  But sometimes
it is only hit 99 times, and the test fails.

When we hit a software breakpoint, the pc value for the thread is
the address of the breakpoint instruction - as if it had not been
hit yet.  And because a user might ADD a breakpoint for the current
pc from the commandline, when we go to resume execution, any thread
that is sitting at a breakpoint site will be silently advanced past
the breakpoint instruction (disable bp, instruction step that thread,
re-enable bp) before resuming.

What this test is exposing is that there is another corner case, a
thread that is sitting at a breakpoint site but has not yet executed
the breakpoint instruction.  The thread will have no stop reason,
no mach exception, so it will not be recorded as having hit the
breakpoint (because it hasn't yet).  But when we resume execution,
because it is sitting at a breakpoint site, we advance past it and
miss the breakpoint hit.

In 2016 Abhishek Aggarwal handled a similar issue with a patch in
`ProcessGDBRemote::SetThreadStopInfo()`, adding a breakpoint StopInfo
for a thread sitting at a breakpoint site that has no stop reason.
debugserver's `jThreadsInfo` would not correctly execute Abhishek's
code though because it would respond with `"reason":"none"` for a
thread with no stop reason, and `SetThreadStopInfo()` expected an
empty reason here.  The first part of my patch is to clear the
`reason` if it is `"none"` so we flow through the code correctly.

On Darwin, though, our stop reply packet (Txx...) includes the
`threads`, `thread-pcs`, and `jstopinfo` keys, which give us the
tids for all current threads, the pc values for those threads, and
`jstopinfo` has a JSON dictionary with the mach exceptions for all
threads that have a mach exception.  In
`ProcessGDBRemote::CalculateThreadStopInfo()` we set the StopInfo
for each thread for a private stop and if we have `jstopinfo` it
is the source of all the StopInfos.  I have to add the same logic
here, to give the thread a breakpoint StopInfo even though it hasn't
executed the breakpoint yet.  In this case we are very early in
thread construction and I only have the information in the Txx stop
reply packet -- tids, pcs, and jstopinfo, so I can't use the normal
general mechanisms of going through the RegisterContext to get the
pc, it's a bit different.

If I hack debugserver to not issue `jstopinfo`, `CalculateThreadStopInfo`
will fall back to sending `qThreadStopInfo` for each thread and
going through `ProcessGDBRemote::SetThreadStopInfo()` to set the
stop infos (and with the `reason:none` fix, use Abhishek's code).

rdar://110549165
---
 .../Process/gdb-remote/ProcessGDBRemote.cpp   | 33 +++++++++++++++----
 .../Process/gdb-remote/ProcessGDBRemote.h     |  2 +-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 629b191f3117aa..eabbc8ad433212 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1600,6 +1600,26 @@ bool ProcessGDBRemote::CalculateThreadStopInfo(ThreadGDBRemote *thread) {
     // has no stop reason.
     thread->GetRegisterContext()->InvalidateIfNeeded(true);
     if (!GetThreadStopInfoFromJSON(thread, m_jstopinfo_sp)) {
+      // If a thread is stopped at a breakpoint site, set that as the stop
+      // reason even if it hasn't executed the breakpoint instruction yet.
+      // We will silently step over the breakpoint when we resume execution
+      // and miss the fact that this thread hit the breakpoint.
+      const size_t num_thread_ids = m_thread_ids.size();
+      for (size_t i = 0; i < num_thread_ids; i++) {
+        if (m_thread_ids[i] == thread->GetID() && m_thread_pcs.size() > i) {
+          addr_t pc = m_thread_pcs[i];
+          lldb::BreakpointSiteSP bp_site_sp =
+              thread->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
+          if (bp_site_sp) {
+            if (bp_site_sp->ValidForThisThread(*thread)) {
+              thread->SetStopInfo(
+                  StopInfo::CreateStopReasonWithBreakpointSiteID(
+                      *thread, bp_site_sp->GetID()));
+              return true;
+            }
+          }
+        }
+      }
       thread->SetStopInfo(StopInfoSP());
     }
     return true;
@@ -1630,7 +1650,7 @@ void ProcessGDBRemote::ParseExpeditedRegisters(
 
 ThreadSP ProcessGDBRemote::SetThreadStopInfo(
     lldb::tid_t tid, ExpeditedRegisterMap &expedited_register_map,
-    uint8_t signo, const std::string &thread_name, const std::string &reason,
+    uint8_t signo, const std::string &thread_name, std::string reason,
     const std::string &description, uint32_t exc_type,
     const std::vector<addr_t> &exc_data, addr_t thread_dispatch_qaddr,
     bool queue_vars_valid, // Set to true if queue_name, queue_kind and
@@ -1722,6 +1742,8 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
     } else {
       bool handled = false;
       bool did_exec = false;
+      if (reason == "none")
+        reason.clear();
       if (!reason.empty()) {
         if (reason == "trace") {
           addr_t pc = thread_sp->GetRegisterContext()->GetPC();
@@ -1864,11 +1886,10 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
         lldb::BreakpointSiteSP bp_site_sp =
             thread_sp->GetProcess()->GetBreakpointSiteList().FindByAddress(pc);
 
-        // If the current pc is a breakpoint site then the StopInfo should be
-        // set to Breakpoint even though the remote stub did not set it as such.
-        // This can happen when the thread is involuntarily interrupted (e.g.
-        // due to stops on other threads) just as it is about to execute the
-        // breakpoint instruction.
+        // If a thread is stopped at a breakpoint site, set that as the stop
+        // reason even if it hasn't executed the breakpoint instruction yet.
+        // We will silently step over the breakpoint when we resume execution
+        // and miss the fact that this thread hit the breakpoint.
         if (bp_site_sp && bp_site_sp->ValidForThisThread(*thread_sp)) {
           thread_sp->SetStopInfo(StopInfo::CreateStopReasonWithBreakpointSiteID(
               *thread_sp, bp_site_sp->GetID()));
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
index c1ea1cc7905587..f8b49c318180e3 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
@@ -366,7 +366,7 @@ class ProcessGDBRemote : public Process,
   lldb::ThreadSP
   SetThreadStopInfo(lldb::tid_t tid,
                     ExpeditedRegisterMap &expedited_register_map, uint8_t signo,
-                    const std::string &thread_name, const std::string &reason,
+                    const std::string &thread_name, std::string reason,
                     const std::string &description, uint32_t exc_type,
                     const std::vector<lldb::addr_t> &exc_data,
                     lldb::addr_t thread_dispatch_qaddr, bool queue_vars_valid,



More information about the lldb-commits mailing list