[lldb-dev] Interrupting process while process is being traced

Mario Zechner badlogicgames at gmail.com
Tue Dec 16 09:44:29 PST 2014


Hi again,

I ported our "architecture" to a self-contained Python script and an
accompanying C file plus shell script to run the whole thing. Let me start
by explaining the architecture with links to the code.

The script is composed of a couple of classes. The most important one is
EventProcessor (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L157). The
EventProcessor consists of a simple event loop that is run in a separate
thread (https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L190).
Here's what this event loop does.

* Poll events from LLDB (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L233). We
need to use SBListener::PeekAtNextEvent so the thread is non-blocking and
can react to messages from other threads.
* Broadcast a received event to all registered EventListeners (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L293).
Other threads can listen in on events by registering an EventListener via
EventProcessor::addListener.
* Execute a submitted task that needs to run on the event loop thread and
requires the inferior to be stopped (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L244). If
the inferior is running, we need to stop it before the task can be
executed. A task may need to inspect program state, hence the need for
stopping the inferior. If the process isn't stopped, the
EventProcessor::executeTask method will vote for a suspension of the
process. It then waits until the process has been stopped (for whatever
reason), executes the task, and votes for resuming the process again.
* Evaluate suspension/resume votes from tasks and listeners and control the
process state accordingly (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L266). In
each iteration of the event loop, listeners, the current task and the task
execution method may cast votes for suspending and resuming the process.
The votes are counted and depending on the outcome the process is either
resumed or interrupted.
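
As a rough illustration of the vote counting in the last step, here is a
minimal sketch. It is not the code from issue219.py: the Vote enum, the
decide function and the rule that a single suspend vote outweighs any number
of resume votes are assumptions made for this example.

```python
from enum import Enum


class Vote(Enum):
    SUSPEND = "suspend"
    RESUME = "resume"


def decide(is_stopped, votes):
    """Return "interrupt", "resume", or None (leave the process alone).

    Assumed rule: a single SUSPEND vote outweighs any number of RESUME votes.
    """
    if Vote.SUSPEND in votes:
        # Someone needs the inferior stopped; interrupt only if it is running.
        return None if is_stopped else "interrupt"
    if Vote.RESUME in votes and is_stopped:
        return "resume"
    return None


# One loop iteration: a task wants a stop while a listener votes to resume.
print(decide(is_stopped=False, votes=[Vote.SUSPEND, Vote.RESUME]))
# A later iteration: the process is stopped and only resume votes remain.
print(decide(is_stopped=True, votes=[Vote.RESUME]))
```

In the real event loop the two outcomes would map to interrupting and
resuming the process through the LLDB API.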

The main function (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L428)
launches a process, attaches an SBDebugger and creates an EventProcessor
that will henceforth poll and process events from the debugger.

We also set up a ThreadListener (
https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L125). This
is an EventListener that registers breakpoints for two functions that signal
that a thread was started/exited. This is the mechanism we use in our real
code to get to know new threads and get notified about threads being
destroyed. If the listener is invoked by the EventProcessor with a stop
event, it checks whether the stop was caused by one of the two breakpoints
it set up. If that's the case, it will vote for resuming the process as the
"user" is not interested in these breakpoints.
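
The listener's decision can be sketched roughly as follows. This is a
simplified stand-in, not the ThreadListener from issue219.py: the on_stop
method, the integer breakpoint IDs and the seen list are hypothetical, and a
real listener would read the breakpoint ID from the stopped thread.

```python
class ThreadListener:
    """Votes to resume when a stop was caused by one of its own hook breakpoints."""

    def __init__(self, hook_breakpoint_ids):
        self.hook_breakpoint_ids = set(hook_breakpoint_ids)
        self.seen = []

    def on_stop(self, breakpoint_id):
        """Return "resume" for hook breakpoints, None for anything else."""
        if breakpoint_id in self.hook_breakpoint_ids:
            # A real listener would report the thread start/exit here.
            self.seen.append(breakpoint_id)
            return "resume"
        # User breakpoint, signal, etc.: cast no vote.
        return None


listener = ThreadListener(hook_breakpoint_ids={1, 2})
print(listener.on_stop(1))  # stop at a hook breakpoint
print(listener.on_stop(7))  # stop at some other breakpoint
```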

The final piece of the puzzle is the infinite loop in the main function.
This simulates what our debugger client is doing. At any point during the
inferior's life, the debugger client may query information about the
inferior. The debugger client does not care if the process is stopped or
running; it may ask at any point in time. We thus submit tasks (in this
example only dummy tasks that do nothing). These tasks will force the
debugger to stop the inferior if necessary, perform their duty, and then
resume the inferior again if no other vote says otherwise.
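
The interplay between the client thread and the event loop thread can be
sketched like this. FakeProcess is a stand-in that only records state
changes, and the queue-based hand-off is an assumption made for this
example; the real script talks to LLDB and goes through the voting
mechanism described above.

```python
import queue
import threading


class FakeProcess:
    """Stand-in for the inferior; records state transitions instead of debugging."""

    def __init__(self):
        self.running = True
        self.log = []

    def stop(self):
        self.running = False
        self.log.append("stop")

    def resume(self):
        self.running = True
        self.log.append("resume")


def event_loop(process, tasks, shutdown):
    """Run submitted tasks on this thread, stopping the process around each one."""
    while not shutdown.is_set() or not tasks.empty():
        try:
            task = tasks.get(timeout=0.01)  # short wait so shutdown is noticed
        except queue.Empty:
            continue
        was_running = process.running
        if was_running:
            process.stop()
        task(process)
        if was_running:
            process.resume()
        tasks.task_done()


process = FakeProcess()
tasks = queue.Queue()
shutdown = threading.Event()
worker = threading.Thread(target=event_loop, args=(process, tasks, shutdown))
worker.start()

# The "debugger client": submit five dummy tasks in a row.
for _ in range(5):
    tasks.put(lambda p: p.log.append("task"))
tasks.join()      # wait until every task has run
shutdown.set()
worker.join()
print(process.log)
```

Each task here produces one stop/resume pair, which is the high-frequency
start/stop pattern this mail is about.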

With these mechanisms, we implement the Java Debugging Wire Protocol. The
ThreadListener is required so we can report the start/end of threads to the
JDWP client. Only reporting threads when a user breakpoint or signal is hit
is sadly not enough to implement JDWP sufficiently well. Hence the mess
with hook breakpoints and automatic resuming if such a breakpoint is hit.
The task mechanism is used to implement JDWP requests that query
information irrespective of whether the inferior is running or not. Since
LLDB can only read program state when the program is stopped, we need to
suspend the process, perform the task for the JDWP request, and resume it
again.

This leads me to the crux of this approach. It appears that LLDB doesn't
like this start/stop behaviour, especially if it's performed at a high
frequency.

To reproduce the issue, execute the run.sh bash script. This will compile
the test.c app, then run the Python script. test.c (
https://github.com/badlogic/lldb-issues/blob/master/test.c) spawns and
joins threads in a loop. Each thread lives for about 20 milliseconds. This
will trigger the ThreadListener when the thread is started/exiting. The
Python script's main function will send 5 tasks in a row to the event loop
thread. Each task forces the inferior to stop, and that stop may or may not
race with a breakpoint event triggered by thread creation/destruction.
After a while, the inferior may do one of 3 things:

1) Exit. We receive a stop event, with a thread stop reason
eStopReasonTrace. The task gets executed, the process gets resumed and the
next event will be an exit event. The output will look like this:
https://gist.github.com/badlogic/82115fcfedd36792f1d3
2) Hang. We try to suspend the inferior due to a task, but we never get
another event. The output will look like this:
https://gist.github.com/badlogic/2c1f29c5e8c8a6bdbee3
3) No more threads are created. In this state, the inferior will run
indefinitely, tasks can be executed, but no more threads are created.

I'm frankly at a loss as to why this "architecture" behaves as above. I'd
be very grateful if someone on this list finds the time to read through
this wall of text and can give me any pointers about what I'm doing wrong.
I understand that LLDB wasn't designed for our use case, but I feel that
frequent process suspends/resumes should work. I must be doing something
wrong.

Thanks,
Mario

On Fri, Dec 12, 2014 at 1:13 PM, Mario Zechner <badlogicgames at gmail.com>
wrote:
>
> Hi,
>
> when we interrupt a process (SBProcess::Stop()) while a thread is
> currently being traced due to a breakpoint, the process will be halted and
> the thread's stop reason will be eStopReasonTrace. Upon resuming the
> process, we get an eStateExited event and LLDB disconnects from the
> inferior.
>
> This is unlikely to ever pop up when using the LLDB CLI client or Xcode.
> We have to interrupt and resume the process at a high frequency, which
> increases the likelihood of a thread being traced due to a breakpoint or
> other reason.
>
> My question boils down to:
>
> What could be the reason for receiving an eStateExited event after
> resuming on an eStopReasonTrace stop event?
>
> I'll try to set up a simple C app and a Python script calling into the LLDB
> APIs that reflects what we do in our code to illustrate and reproduce the
> issue.
>
> Thanks,
> Mario
>