<div dir="ltr">Hi again,<div><br></div><div>I ported our "architecture" to a self-contained Python script, an accompanying C file, and a shell script that runs the whole thing. Let me start by explaining the architecture, with links to the code.</div><div><br></div><div>The script is composed of a couple of classes. The most important one is EventProcessor (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L157">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L157</a>). The EventProcessor runs a simple event loop in a separate thread (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L190">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L190</a>). Here's what this event loop does:</div><div><br></div><div>* Poll events from LLDB (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L233">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L233</a>). We need to use SBListener::PeekAtNextEvent so that the thread doesn't block and can react to messages from other threads.</div><div>* Broadcast each received event to all registered EventListeners (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L293">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L293</a>). Other threads can listen in on events by registering an EventListener via EventProcessor::addListener.</div><div>* Execute a submitted task that needs to run on the event loop thread and requires the inferior to be stopped (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L244">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L244</a>). A task may need to inspect program state, so if the inferior is running, we need to stop it before the task can be executed. If the process isn't stopped, the EventProcessor::executeTask method votes for suspending the process. 
It then waits until the process has stopped (for whatever reason), executes the task, and votes for resuming the process.</div><div>* Evaluate suspend/resume votes from tasks and listeners and control the process state accordingly (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L266">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L266</a>). In each iteration of the event loop, the listeners, the current task, and the task execution method may cast votes for suspending or resuming the process. The votes are counted, and depending on the outcome, the process is either resumed or interrupted.</div><div><br></div><div>The main function (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L428">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L428</a>) launches a process, attaches an SBDebugger, and creates an EventProcessor that from then on polls and processes events from the debugger.</div><div><br></div><div>We also set up a ThreadListener (<a href="https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L125">https://github.com/badlogic/lldb-issues/blob/master/issue219.py#L125</a>). This is an EventListener that sets breakpoints on two functions that signal that a thread has started or exited. This is the mechanism we use in our real code to learn about new threads and to get notified when threads are destroyed. When the EventProcessor invokes the listener with a stop event, the listener checks whether the event belongs to one of the two breakpoints it set up. If it does, the listener votes for resuming the process, since the "user" is not interested in these breakpoints.</div><div><br></div><div>The final piece of the puzzle is the infinite loop in the main function, which simulates what our debugger client does. At any point during the inferior's lifetime, the debugger client may query information about the inferior. 
The debugger client does not care whether the process is stopped or running; it may ask at any point in time. We thus submit tasks (in this example, only dummy tasks that do nothing). These tasks force the debugger to stop the inferior if necessary, perform their duty, and then resume the inferior if no other vote says otherwise.</div><div><br></div><div>With these mechanisms, we implement the Java Debug Wire Protocol (JDWP). The ThreadListener is required so we can report the start and end of threads to the JDWP client. Sadly, reporting threads only when a user breakpoint or signal is hit is not enough to implement JDWP sufficiently well. Hence the mess with hook breakpoints and automatic resuming when such a breakpoint is hit. The task mechanism is used to implement JDWP requests that query information regardless of whether the inferior is running or not. Since LLDB can only read program state while the program is stopped, we need to suspend the process, perform the task for the JDWP request, and resume the process again.</div><div><br></div><div>This leads me to the crux of this approach: LLDB doesn't seem to cope well with this start/stop behaviour, especially when it happens at a high frequency.</div><div><br></div><div>To reproduce the issue, execute the run.sh bash script. It compiles the test.c app, then runs the Python script. The test.c app (<a href="https://github.com/badlogic/lldb-issues/blob/master/test.c">https://github.com/badlogic/lldb-issues/blob/master/test.c</a>) spawns and joins threads in a loop. Each thread lives for about 20 milliseconds, and its start and exit each trigger the ThreadListener. The Python script's main function sends 5 tasks in a row to the event loop thread. This causes the inferior to be stopped, and that stop may or may not coincide with a breakpoint event triggered by thread creation or destruction. After a while, the inferior does one of three things:</div><div><br></div><div>1) Exit. 
We receive a stop event in which a thread has the stop reason eStopReasonTrace. The task gets executed, the process gets resumed, and the next event is an exit event. The output looks like this: <a href="https://gist.github.com/badlogic/82115fcfedd36792f1d3">https://gist.github.com/badlogic/82115fcfedd36792f1d3</a></div><div>2) Hang. We try to suspend the inferior for a task, but never receive another event. The output looks like this: <a href="https://gist.github.com/badlogic/2c1f29c5e8c8a6bdbee3">https://gist.github.com/badlogic/2c1f29c5e8c8a6bdbee3</a></div><div>3) Stop creating threads. The inferior keeps running indefinitely and tasks can still be executed, but no new threads are created.</div><div><br></div><div>I'm frankly at a loss as to why this "architecture" behaves this way. I'd be very grateful if someone on this list could find the time to read through this wall of text and give me some pointers about what I'm doing wrong. I understand that LLDB wasn't designed for our use case, but I feel that frequently suspending and resuming a process should work. I must be doing something wrong.</div><div><br></div><div>Thanks,</div><div>Mario</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 12, 2014 at 1:13 PM, Mario Zechner <span dir="ltr"><<a href="mailto:badlogicgames@gmail.com" target="_blank">badlogicgames@gmail.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>When we interrupt a process (SBProcess::Stop()) while a thread is being traced due to a breakpoint, the process is halted and the thread's stop reason is eStopReasonTrace. Upon resuming the process, we get an eStateExited event and LLDB disconnects from the inferior.</div><div><br></div><div>This is unlikely to ever come up when using the LLDB CLI or Xcode. 
We, however, have to interrupt and resume the process at a high frequency, which increases the likelihood of a thread being traced due to a breakpoint or for some other reason.</div><div><br></div><div>My question boils down to:</div><div><br></div><div>What could be the reason for receiving an eStateExited event after resuming from an eStopReasonTrace stop event?</div><div><br></div><div>I'll try to set up a simple C app and a Python script calling into the LLDB APIs that reflect what we do in our code, to illustrate and reproduce the issue.</div><div><br></div><div>Thanks,</div><div>Mario</div></div>
</blockquote></div></div>
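<div><br></div><div>To make the voting step of the event loop concrete, here is a minimal, LLDB-free sketch in Python. It is not the code from issue219.py: Vote, Action and tally_votes are hypothetical names, and the rule that a single suspend vote overrules any number of resume votes is an assumption based on the description of tasks resuming the inferior "if no other vote says otherwise". The real script may count votes differently.</div>

```python
from enum import Enum, auto


class Vote(Enum):
    """A vote cast by a listener or task during one loop iteration (hypothetical)."""
    SUSPEND = auto()
    RESUME = auto()


class Action(Enum):
    """What the event loop should do with the inferior afterwards (hypothetical)."""
    NONE = auto()
    INTERRUPT = auto()
    RESUME = auto()


def tally_votes(votes, process_stopped):
    """Decide the next process action from the votes of one iteration.

    Assumed rule: any SUSPEND vote wins. The process is resumed only if it
    is currently stopped and at least one RESUME vote, but no SUSPEND vote,
    was cast.
    """
    suspend = sum(1 for v in votes if v is Vote.SUSPEND)
    resume = sum(1 for v in votes if v is Vote.RESUME)
    if suspend > 0:
        # Someone needs the process stopped; interrupt only if it is running.
        return Action.NONE if process_stopped else Action.INTERRUPT
    if resume > 0 and process_stopped:
        return Action.RESUME
    return Action.NONE


if __name__ == "__main__":
    # A task wants to inspect state while the inferior is running.
    print(tally_votes([Vote.SUSPEND], process_stopped=False))
    # The task is done and the ThreadListener also voted to resume.
    print(tally_votes([Vote.RESUME, Vote.RESUME], process_stopped=True))
    # Another task still holds a suspend vote, so the resume vote is overruled.
    print(tally_votes([Vote.SUSPEND, Vote.RESUME], process_stopped=True))
```

<div>In a real event loop, an INTERRUPT outcome would presumably map to SBProcess::Stop() and a RESUME outcome to SBProcess::Continue().</div>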
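<div><br></div><div>The task path described above (submit from another thread, stop the inferior if it is running, run the task on the event loop thread, then resume) can be modeled without LLDB as well. This is a toy sketch, not the EventProcessor from issue219.py: MiniEventLoop, submit and shutdown are hypothetical names, the stopped flag stands in for the real process state, vote counting is left out, and it assumes the loop resumes the inferior only if it was the one that stopped it.</div>

```python
import queue
import threading


class MiniEventLoop:
    """Toy model of the task mechanism (hypothetical, no LLDB involved).

    Tasks submitted from any thread run on the loop thread, and only while
    the simulated inferior is stopped.
    """

    def __init__(self):
        self.tasks = queue.Queue()
        self.stopped = False   # simulated inferior state
        self.log = []          # state transitions performed by the loop
        self._quit = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def shutdown(self):
        self._quit.set()
        self._thread.join()

    def submit(self, fn):
        """Run fn on the loop thread and block the caller until it is done."""
        done = threading.Event()
        result = {}

        def wrapper():
            try:
                result["value"] = fn()
            except Exception as exc:   # hand errors to the caller, keep the loop alive
                result["error"] = exc
            finally:
                done.set()

        self.tasks.put(wrapper)
        done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]

    def _run(self):
        while not self._quit.is_set():
            try:
                # Short timeout so the loop keeps checking the quit flag instead
                # of blocking forever (loosely like polling with PeekAtNextEvent).
                task = self.tasks.get(timeout=0.01)
            except queue.Empty:
                continue
            we_stopped_it = not self.stopped
            if we_stopped_it:
                self.stopped = True   # stands in for SBProcess::Stop() plus waiting for the stop event
                self.log.append("interrupt")
            task()
            if we_stopped_it:
                self.stopped = False  # stands in for SBProcess::Continue()
                self.log.append("resume")


if __name__ == "__main__":
    loop = MiniEventLoop()
    loop.start()
    # The "client" queries state while the simulated inferior is running.
    print(loop.submit(lambda: loop.stopped))
    print(loop.submit(lambda: 21 * 2))
    loop.shutdown()
    print(loop.log)
```

<div>Because every task is serialized through one thread, stop and resume requests can never interleave with each other in this model; whether that holds for the real EventProcessor depends on how its votes are combined.</div>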