[lldb-dev] strange behaviour at lldb cmd line

Mon Jun 30 11:39:45 PDT 2014

As soon as we attach with any GDB server, we expect one of two things:
1 - a process and all its threads are stopped. the $? packet should respond with any thread that has a stop reason _or_ if no threads have a stop reason, then report a $T packet for the first thread. You can probably respond with $T00 if there is no real signal or reason the thread stopped. Some debuggers like to respond with $T05 (SIGTRAP), but I would rather we don't lie and report a bogus SIGTRAP signal and tell the truth ($T00 or no signal).
2 - there is no process which means qProcessInfo respond with an invalid value or error, or for backward compatibility qC responds with an invalid value. This lets us know we don't have a process and no $? should be issued.

If you use the packet disassembler, you should be able to tell where this is stopping and see the packets that lead up to it to see what it is doing. If you have a trace of this issue happening, please send me a copy offline and I might be able to help.

Greg

> On Jun 29, 2014, at 12:18 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
> 
> Er...
> 
> Ok - so that's not quite right.
> 
> In the llgs case, there is an initial probe with a $? command that asks where things are at after the inferior is launched.  One of the expedited registers (well, the PC) has the little-endian value for the address I see listed in the T response to the $? on startup.  I'm going to look and see if the $? is really supposed to respond with a T on startup.  I've noticed (and adhere to) not responding with a T on launch for the initial stop.  It might be that $? should likewise not respond with anything there.
> 
> So this still might be a different thing on llgs than on local Linux.
> 
> -Todd
> 
> 
> On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
> Here's another interesting bit.  I have been combing through the NativeProcessLinux code in the llgs branch as I tighten everything up and work through known issues.
> 
> So for this part Shawn called out above:
> 
> (lldb) run
> 
> Process 4417 launching
> Process 4417 stopped
> * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
>     frame #0: 0x00007f3b99b9c2d0
> -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>    0x7f3b99b9c2d8:  movq   %rax, %r12
>    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
> 
> Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
> 
> I have been seeing this in lldb connected to llgs as well as on local debugging.  It happens sometimes.  However, on lldb <=> llgs, I have complete view of the protocol (the gdb-remote packets), and I can see that the disassembled stop point is not ever coming up in the gdb-remote log as a stop notification (a T).  I am starting to suspect that there might be something in the stack unwinding, symbolication or something else that is perhaps racey or maybe sensitive to taking too long to perform some step.  The address listed in the llgs case is close to some addresses that are getting probed on the gdb-remote protocol.
> 
> I'm still tracking this down in the llgs case and I'll double check to make sure I'm not somehow misreading the gdb-remote packet.  More on this later if I find anything useful.
> 
> -Todd
> 
> 
> 
> On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfiala at google.com> wrote:
> > This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
> 
> (Just doing some correlation with the llgs NativeProcessLinux code, which is from the same lineage as the Linux ProcessMonitor but has diverged somewhat).
> 
> Some of these I have hit as well in the NativeProcessLinux code.  I happen to still see this one (the "sometimes dump stack info on startup, then go forward"), when creating a process on lldb over to llgs.  No doubt the same source since NativeProcessLinux started from the same code.
> 
> > This is bad, please use native types (::pid_t) for these locations so that this works correctly.
> 
> I've fixed most of those (I think all but I'm going to double chekc) as I got warnings on them in the NativeProcessLinux side, but I'll double check on that.
> 
> > 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
> 
> We have complete control of the launch - we fork a process, do an execve(), and the monitor gets a trap on the execve(), so shouldn't be a race.  This is the case for both local Linux debugging and NativeProcessLinux/llgs.
> 
> I've started diverging on signal handling fairly significantly in NativeProcessLinux, so it's possible some of the behavior will be different in some spots.  I have llgs working in cases where local Linux debugging is failing on my end now.
> 
> Shawn - make sure you look at these type of bugs in the llgs branch with an llgs - I'd rather you put effort into NativeProcessLinux rather than Linux ProcessMonitor.
> 
> For llgs, I do have the GDBRemoteCommunicationServer setup to be a delegate to get messages like process stops.  It gets installed as the delegate before llgs actually does the launch or attach call so that there's no chance of the delegate missing a startup event.  We must be doing something else funky (probably the same but possibly different root cause) on both the local Linux and NativeProcessLinux startup sequence to cause that race.
> 
> > 2 - The messages coming in out of order seem to be related to sending the eStateLaunching and eStateStopped not being delivered in the correct order. Your first example, they came through OK, and in the second cased we got a eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix this out of order events, it might fix your random stopping at an wrong location?
> 
> As far as llgs goes, I do ensure the delegate (GDBRemoteCommunicationServer) gets those two in order, but at the moment I don't have llgs doing anything when it gets an eStateLaunching method.  Not sure if that's relevant there.  What does debugserver do when it gets an eStateLaunching internally?
> 
> 
> On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclayton at apple.com> wrote:
> 
> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
> >
> > In addition to the (lldb) prompt out of order, I am also investigating some other strange messages when I run a simple application with no breakpoints.  It seems related to thread synchronization surrounding the startup/management of the inferior process.
> >
> > (lldb) run
> >
> > Process 4417 launching
> > Process 4417 stopped
> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
> >     frame #0: 0x00007f3b99b9c2d0
> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
> >    0x7f3b99b9c2d8:  movq   %rax, %r12
> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
> >
> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
> > Hello world!
> > The string is  Test String :  5
> > Process 4417 exited with status = 0 (0x00000000)
> > (lldb)
> >
> > ------------- or ----------------
> >
> > (lldb) run
> >
> > Process 4454 launching
> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
> > Process 4454 stopped
> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace
> >     frame #0: 0x00007ffdec16c2d0
> > error: No such process
> >
> > Hello world!
> > The string is  Test String :  5
> > Process 4454 exited with status = 0 (0x00000000)
> > (lldb)
> >
> >
> > As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue exectuting. When it momentarily stops, I see it pop/push an IOHandler.
> 
> Yes the Process IO Handler is pushed and popped on every _public_ stop. There are notions of public stops that the user finds out about, and private stops where the Process might be in the process of trying to single step over a source line and might start/stop the process many many times.
> 
> This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.
> >
> > I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?
> 
> When you start a process, you will run/stop many times as the shared libraries get loaded. Normally a breakpoint is set in the dynamic loader that allows us to intercept when shared libraries are loaded/unloaded so that may explain a few stops you are seeing.
> 
> Other run/stop flurries can result when single stepping over a source line, stepping past a software breakpoint (disable bp, single instruction step, re-enable breakpoint, resume).
> 
> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.
> 
> We have a lot of threading in LLDB so some of this will be normal, but other times in can indicate a bug much like you are seeing when the process stops at a random location 0x00007ffdec16c2d0. This could also be an uninitialized variable in ProcessLinux that gets a random value when ProcessLinux (or many other classes like ThreadLinux, etc) when a class instance is initialized. Please do try and track that down. To get a handle on process controls you can enable process and step logging:
> 
> (lldb) log enable -T -f /tmp/process.txt lldb process step
> 
> Then compare a good and bad run and see what differs.
> 
> >
> >
> > One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,
> >
> >     lldb::pid_t wpid;
> >     if ((wpid = waitpid(pid, &status, 0)) < 0)
> >     {
> >         args->m_error.SetErrorToErrno();
> >         goto FINISH;
> >     }
> >     else  ...
> >
> > lldb::pid_t is a uint64, while waitpid returns an int32, with negative numbers used for error codes.
> > This bug is repeated in a few places
> 
> This is bad, please use native types (::pid_t) for these locations so that this works correctly.
> 
> So a few things regarding your race conditions:
> 1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.
> 2 - The messages coming in out of order seem to be related to sending the eStateLaunching and eStateStopped not being delivered in the correct order. Your first example, they came through OK, and in the second cased we got a eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix this out of order events, it might fix your random stopping at an wrong location?
> 
> Greg
> 
> >
> > Shawn.
> >
> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclayton at apple.com> wrote:
> >
> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <emaste at freebsd.org> wrote:
> > >
> > > Hi Greg,
> > >
> > > As far as I can tell what's happening here is just that
> > > Process::Resume() completes and the next prompt is emitted (from the
> > > main-thread?) before the IOHandler gets pushed in another thread.
> > >
> > > Output from "log enable -n lldb process" with an added log printf
> > > where ::Resume returns:
> > >
> > > step
> > > main-thread Process::Resume -- locking run lock
> > > main-thread Process::PrivateResume() m_stop_id = 4, public state:
> > > stopped private state: stopped
> > > main-thread Process::SetPrivateState (running)
> > > main-thread Process thinks the process has resumed.
> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
> > > state: running, last broadcast state: running - YES
> > > main-thread Process::PrivateResume() returning
> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
> > > broadcasting new state running (old state stopped) to public
> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
> > > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f
> > > (STOPPED), signal = 5, exit_state = 0
> > > internal-state(p PushIOHandler
> > > wait4(pid=15646) Process::SetPrivateState (stopped)
> > >
> > > As before, I don't see how we intend to enforce synchronization
> > > between those two threads.  It looks like my tiny usleep in
> > > ::PrivateResume delays the next prompt just long enough for the other
> > > IOHandler to be pushed.
> >
> > That will do it. It is tough because Process::Resume() might not succeed so we can't always push the ProcessIOHandler.
> >
> > I need to find a better way to coordinate the pushing of the ProcessIOHandler so it happens from the same thread that initiates the resume. Then we won't have this issue, but I need to carefully do this so it doesn't push it when the process won't be resumed (since it might already be resumed) or in other edge cases.
> >
> > Other ideas would be to have the Process::Resume() do some synchronization between the current thread and the internal-state thread so it waits for the internal-state thread to get to the running state before it returns from Process::Resume()...
> >
> > Greg
> >
> >
> 
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> 
> 
> 
> -- 
> Todd Fiala |	 Software Engineer |	 tfiala at google.com |	 650-943-3180
> 
> 
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> 
> 
> 
> 
> -- 
> -Todd
> 
> 
> 
> -- 
> -Todd