[PATCH] D144392: [lldb] Skip signal handlers for ignored signals on lldb-server's side when stepping

Mon Mar 13 06:05:18 PDT 2023

kpdev42 added a comment.

In D144392#4139001 <https://reviews.llvm.org/D144392#4139001>, @labath wrote:

> I'm afraid you're fixing this at the wrong end. This kind of complex thread control does not belong inside the debug stub.
>
> IIUC, the problematic sequence of events is:
>
> 1. lldb sends a `vCont:s` packet
> 2. lldb-server resumes (PTRACE_SINGLESTEP)
> 3. process immediatelly stops again (without stepping) due to a pending signal
> 4. lldb-server decides to inject the signal and resumes (steps) again
> 5. process stops inside the signal handler
> 6. confusion ensues
>
> If that's the case, then I think the bug is at step 4, specifically in this code:
>
>   // Check if debugger should stop at this signal or just ignore it and resume
>   // the inferior.
>   if (m_signals_to_ignore.contains(signo)) {
>      ResumeThread(thread, thread.GetState(), signo);
>      return;
>   }
>
> I believe we should not be attempting to inject the signal if the thread was stepping. I think we should change this to report the signal to the client and let it figure out what to do. I.e., change this code into something like:
>
>   if (m_signals_to_ignore.contains(signo) && thread.GetState() == eStateRunning) {
>      ResumeThread(thread, eStateRunning, signo);
>      return;
>   }
>   // else report the signal as usual
>
> If that is not enough to fix the bug, then we can figure out what needs to be done on the client. @jingham might have something to say about that.

The full problematic sequence is this:

- current pc is at a breakpoint, user hits continue (or step, etc.)
- `ThreadPlanStepOverBreakpoint` is pushed
- the plan wants `eStateStepping`, so lldb sens a `vCont;s` packet to the lldb-server
- lldb-server resumes with `PTRACE_SINGLESTEP`
- process stops due to a (pass=true, stop=false) signal

Then there are two possibilities, depending on signal filtering settings:

1. The signal is ignored (notify=false)
  - lldb-server injects the signal back to the process and resumes (steps again)
2. The signal is not ignored (notify=true)
  - lldb-server sends the stop reason with signal to the lldb client
  - lldb does not care, because it is configured to not stop, so wants to step again, sends the packet

After 1. and 2., the events are the same:

- process stops due to stepping into the signal handler, lldb client sees a successful step
- `StepOverBreakpoint` plan sees a pc != breakpoint address, thinks its job is done, pops off successfuly with an autocontinue
- process resumes, gets out of the handler right back to the breakpoint address
- the breakpoint hits, so the user sees a second breakpoint hit, but the instruction under that address was still not executed

Technically, this is correct, because the pc was at a breakpoint, the process did execute some instructions and went back to the breakpoint, and the program state could have changed. But this is very annoying when a lot of signals are received in the background (and the signal is not interesting to the user, as it, for example, comes from a low level library, the same reason real-time signals are `stop=false` `notify=false` by default right now: https://reviews.llvm.org/D12795)

So it does not depend on the signal filtering (it can be configured anyway), but the problem I would think is that client does not handle the situation with signals while stepping (at least stepping off a breakpoint) properly, and lldb-server does not help.

Gdb does this for example:

> GDB optimizes for stepping the mainline code. If a signal that has handle nostop and handle pass set arrives while a stepping command (e.g., stepi, step, next) is in progress, GDB lets the signal handler run and then resumes stepping the mainline code once the signal handler returns. In other words, GDB steps over the signal handler. This prevents signals that you’ve specified as not interesting (with handle nostop) from changing the focus of debugging unexpectedly.

(https://sourceware.org/gdb/onlinedocs/gdb/Signals.html), which seems reasonable.

If this logic does not belong in lldb-server, then it could be fixed right inside the `StepOverBreakpoint` plan, by enabling the breakpoint being stepped over when a signal is received, waiting for it to hit when the potential handler has returned, and then trying to step off again. But then doing a `step-into` from a line with a breakpoint will step over the signal handler, and from a line without a breakpoint will stop inside the handler, if a signal is received. Then probably creating a new `ThreadPlan` instead with the same logic, executing on top of the plan stack, is the way to go?

Anyway, in my attempts at fixing it on the client side, additionally, for some reason the following situation is often triggered:

- process steps onto a breakpoint, but right before executing the instruction and actually hitting (the `SIGTRAP` signal) another signal (`stop=false`) is delivered
- the process wants to automatically continue
- the following code (https://github.com/llvm/llvm-project/blob/959216f9b1f1fe0c8817a4e9104a38929247f987/lldb/source/Target/Thread.cpp#L639):

  BreakpointSiteSP bp_site_sp =
      GetProcess()->GetBreakpointSiteList().FindByAddress(thread_pc);
  if (bp_site_sp) {
  ...
      ThreadPlanSP step_bp_plan_sp(new ThreadPlanStepOverBreakpoint(*this));
      ...
          QueueThreadPlan(step_bp_plan_sp, false);

pushes a step over breakpoint plan, and the breakpoint is skipped. This should not happen, but I couldn't reproduce this without the patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144392/new/

https://reviews.llvm.org/D144392