[lldb-dev] strange behaviour at lldb cmd line

Todd Fiala tfiala at google.com
Mon Jun 30 13:18:57 PDT 2014


Thanks, Greg.  I'll get back to you offline on this on the llgs side after
I finish the upstream.  Since we're seeing it with both local and
llgs-based linux debugging, I'm not going to hold the upstreaming for it.

-Todd


On Mon, Jun 30, 2014 at 11:39 AM, Greg Clayton <gclayton at apple.com> wrote:

> As soon as we attach with any GDB server, we expect one of two things:
> 1 - a process and all its threads are stopped. the $? packet should
> respond with any thread that has a stop reason _or_ if no threads have a
> stop reason, then report a $T packet for the first thread. You can probably
> respond with $T00 if there is no real signal or reason the thread stopped.
> Some debuggers like to respond with $T05 (SIGTRAP), but I would rather we
> don't lie and report a bogus SIGTRAP signal and tell the truth ($T00 or no
> signal).
> 2 - there is no process which means qProcessInfo respond with an invalid
> value or error, or for backward compatibility qC responds with an invalid
> value. This lets us know we don't have a process and no $? should be issued.
>
> If you use the packet disassembler, you should be able to tell where this
> is stopping and see the packets that lead up to it to see what it is doing.
> If you have a trace of this issue happening, please send me a copy offline
> and I might be able to help.
>
> Greg
>
> > On Jun 29, 2014, at 12:18 AM, Todd Fiala <todd.fiala at gmail.com> wrote:
> >
> > Er...
> >
> > Ok - so that's not quite right.
> >
> > In the llgs case, there is an initial probe with a $? command that asks
> where things are at after the inferior is launched.  One of the expedited
> registers (well, the PC) has the little-endian value for the address I see
> listed in the T response to the $? on startup.  I'm going to look and see
> if the $? is really supposed to respond with a T on startup.  I've noticed
> (and adhere to) not responding with a T on launch for the initial stop.  It
> might be that $? should likewise not respond with anything there.
> >
> > So this still might be a different thing on llgs than on local Linux.
> >
> > -Todd
> >
> >
> > On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <todd.fiala at gmail.com>
> wrote:
> > Here's another interesting bit.  I have been combing through the
> NativeProcessLinux code in the llgs branch as I tighten everything up and
> work through known issues.
> >
> > So for this part Shawn called out above:
> >
> > (lldb) run
> >
> > Process 4417 launching
> > Process 4417 stopped
> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason
> = trace
> >     frame #0: 0x00007f3b99b9c2d0
> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
> >    0x7f3b99b9c2d8:  movq   %rax, %r12
> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
> >
> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
> >
> > I have been seeing this in lldb connected to llgs as well as on local
> debugging.  It happens sometimes.  However, on lldb <=> llgs, I have
> complete view of the protocol (the gdb-remote packets), and I can see that
> the disassembled stop point is not ever coming up in the gdb-remote log as
> a stop notification (a T).  I am starting to suspect that there might be
> something in the stack unwinding, symbolication or something else that is
> perhaps racey or maybe sensitive to taking too long to perform some step.
>  The address listed in the llgs case is close to some addresses that are
> getting probed on the gdb-remote protocol.
> >
> > I'm still tracking this down in the llgs case and I'll double check to
> make sure I'm not somehow misreading the gdb-remote packet.  More on this
> later if I find anything useful.
> >
> > -Todd
> >
> >
> >
> > On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfiala at google.com> wrote:
> > > This stopping at random locations seems like a racy bug in the
> ProcessLinux that we should really look into fixing.
> >
> > (Just doing some correlation with the llgs NativeProcessLinux code,
> which is from the same lineage as the Linux ProcessMonitor but has diverged
> somewhat).
> >
> > Some of these I have hit as well in the NativeProcessLinux code.  I
> happen to still see this one (the "sometimes dump stack info on startup,
> then go forward"), when creating a process on lldb over to llgs.  No doubt
> the same source since NativeProcessLinux started from the same code.
> >
> > > This is bad, please use native types (::pid_t) for these locations so
> that this works correctly.
> >
> > I've fixed most of those (I think all but I'm going to double chekc) as
> I got warnings on them in the NativeProcessLinux side, but I'll double
> check on that.
> >
> > > 1 - on linux does a process start running first, then you quickly try
> to attach to it? If so, this could explain the difference you might be
> seeing when connecting to a process? On Darwin, our posix_spawn() has a non
> portable flag that stops the process at the entry point with a SIGSTOP so
> we are guaranteed to not have a race condition when we launch a process for
> debugging.
> >
> > We have complete control of the launch - we fork a process, do an
> execve(), and the monitor gets a trap on the execve(), so shouldn't be a
> race.  This is the case for both local Linux debugging and
> NativeProcessLinux/llgs.
> >
> > I've started diverging on signal handling fairly significantly in
> NativeProcessLinux, so it's possible some of the behavior will be different
> in some spots.  I have llgs working in cases where local Linux debugging is
> failing on my end now.
> >
> > Shawn - make sure you look at these type of bugs in the llgs branch with
> an llgs - I'd rather you put effort into NativeProcessLinux rather than
> Linux ProcessMonitor.
> >
> > For llgs, I do have the GDBRemoteCommunicationServer setup to be a
> delegate to get messages like process stops.  It gets installed as the
> delegate before llgs actually does the launch or attach call so that
> there's no chance of the delegate missing a startup event.  We must be
> doing something else funky (probably the same but possibly different root
> cause) on both the local Linux and NativeProcessLinux startup sequence to
> cause that race.
> >
> > > 2 - The messages coming in out of order seem to be related to sending
> the eStateLaunching and eStateStopped not being delivered in the correct
> order. Your first example, they came through OK, and in the second cased we
> got a eStateStopped first followed by the eStateLaunching. I would take a
> look at who is sending these out of order. If you fix this out of order
> events, it might fix your random stopping at an wrong location?
> >
> > As far as llgs goes, I do ensure the delegate
> (GDBRemoteCommunicationServer) gets those two in order, but at the moment I
> don't have llgs doing anything when it gets an eStateLaunching method.  Not
> sure if that's relevant there.  What does debugserver do when it gets an
> eStateLaunching internally?
> >
> >
> > On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclayton at apple.com>
> wrote:
> >
> > > On Jun 26, 2014, at 4:51 PM, Shawn Best <sbest at blueshiftinc.com>
> wrote:
> > >
> > > In addition to the (lldb) prompt out of order, I am also investigating
> some other strange messages when I run a simple application with no
> breakpoints.  It seems related to thread synchronization surrounding the
> startup/management of the inferior process.
> > >
> > > (lldb) run
> > >
> > > Process 4417 launching
> > > Process 4417 stopped
> > > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop
> reason = trace
> > >     frame #0: 0x00007f3b99b9c2d0
> > > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
> > >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
> > >    0x7f3b99b9c2d8:  movq   %rax, %r12
> > >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
> > >
> > > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
> > > Hello world!
> > > The string is  Test String :  5
> > > Process 4417 exited with status = 0 (0x00000000)
> > > (lldb)
> > >
> > > ------------- or ----------------
> > >
> > > (lldb) run
> > >
> > > Process 4454 launching
> > > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
> > > Process 4454 stopped
> > > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop
> reason = trace
> > >     frame #0: 0x00007ffdec16c2d0
> > > error: No such process
> > >
> > > Hello world!
> > > The string is  Test String :  5
> > > Process 4454 exited with status = 0 (0x00000000)
> > > (lldb)
> > >
> > >
> > > As it is launching the target application, it appears to stop in a
> random place (stop reason = trace), and then continue exectuting. When it
> momentarily stops, I see it pop/push an IOHandler.
> >
> > Yes the Process IO Handler is pushed and popped on every _public_ stop.
> There are notions of public stops that the user finds out about, and
> private stops where the Process might be in the process of trying to single
> step over a source line and might start/stop the process many many times.
> >
> > This stopping at random locations seems like a racy bug in the
> ProcessLinux that we should really look into fixing.
> > >
> > > I added some logging to ProcessPOSIX, and see it hitting
> RefreshAfterStop() and DoResume() many times. Is this normal/expected?
> >
> > When you start a process, you will run/stop many times as the shared
> libraries get loaded. Normally a breakpoint is set in the dynamic loader
> that allows us to intercept when shared libraries are loaded/unloaded so
> that may explain a few stops you are seeing.
> >
> > Other run/stop flurries can result when single stepping over a source
> line, stepping past a software breakpoint (disable bp, single instruction
> step, re-enable breakpoint, resume).
> >
> > > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate,
> HandleProcessEvent and see big differences in the order of events changing
> from run to run.
> >
> > We have a lot of threading in LLDB so some of this will be normal, but
> other times in can indicate a bug much like you are seeing when the process
> stops at a random location 0x00007ffdec16c2d0. This could also be an
> uninitialized variable in ProcessLinux that gets a random value when
> ProcessLinux (or many other classes like ThreadLinux, etc) when a class
> instance is initialized. Please do try and track that down. To get a handle
> on process controls you can enable process and step logging:
> >
> > (lldb) log enable -T -f /tmp/process.txt lldb process step
> >
> > Then compare a good and bad run and see what differs.
> >
> > >
> > >
> > > One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and
> checks the return code,
> > >
> > >     lldb::pid_t wpid;
> > >     if ((wpid = waitpid(pid, &status, 0)) < 0)
> > >     {
> > >         args->m_error.SetErrorToErrno();
> > >         goto FINISH;
> > >     }
> > >     else  ...
> > >
> > > lldb::pid_t is a uint64, while waitpid returns an int32, with negative
> numbers used for error codes.
> > > This bug is repeated in a few places
> >
> > This is bad, please use native types (::pid_t) for these locations so
> that this works correctly.
> >
> > So a few things regarding your race conditions:
> > 1 - on linux does a process start running first, then you quickly try to
> attach to it? If so, this could explain the difference you might be seeing
> when connecting to a process? On Darwin, our posix_spawn() has a non
> portable flag that stops the process at the entry point with a SIGSTOP so
> we are guaranteed to not have a race condition when we launch a process for
> debugging.
> > 2 - The messages coming in out of order seem to be related to sending
> the eStateLaunching and eStateStopped not being delivered in the correct
> order. Your first example, they came through OK, and in the second cased we
> got a eStateStopped first followed by the eStateLaunching. I would take a
> look at who is sending these out of order. If you fix this out of order
> events, it might fix your random stopping at an wrong location?
> >
> > Greg
> >
> > >
> > > Shawn.
> > >
> > > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclayton at apple.com>
> wrote:
> > >
> > > > On Jun 19, 2014, at 7:27 PM, Ed Maste <emaste at freebsd.org> wrote:
> > > >
> > > > Hi Greg,
> > > >
> > > > As far as I can tell what's happening here is just that
> > > > Process::Resume() completes and the next prompt is emitted (from the
> > > > main-thread?) before the IOHandler gets pushed in another thread.
> > > >
> > > > Output from "log enable -n lldb process" with an added log printf
> > > > where ::Resume returns:
> > > >
> > > > step
> > > > main-thread Process::Resume -- locking run lock
> > > > main-thread Process::PrivateResume() m_stop_id = 4, public state:
> > > > stopped private state: stopped
> > > > main-thread Process::SetPrivateState (running)
> > > > main-thread Process thinks the process has resumed.
> > > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
> > > > state: running, last broadcast state: running - YES
> > > > main-thread Process::PrivateResume() returning
> > > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
> > > > broadcasting new state running (old state stopped) to public
> > > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
> > > > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f
> > > > (STOPPED), signal = 5, exit_state = 0
> > > > internal-state(p PushIOHandler
> > > > wait4(pid=15646) Process::SetPrivateState (stopped)
> > > >
> > > > As before, I don't see how we intend to enforce synchronization
> > > > between those two threads.  It looks like my tiny usleep in
> > > > ::PrivateResume delays the next prompt just long enough for the other
> > > > IOHandler to be pushed.
> > >
> > > That will do it. It is tough because Process::Resume() might not
> succeed so we can't always push the ProcessIOHandler.
> > >
> > > I need to find a better way to coordinate the pushing of the
> ProcessIOHandler so it happens from the same thread that initiates the
> resume. Then we won't have this issue, but I need to carefully do this so
> it doesn't push it when the process won't be resumed (since it might
> already be resumed) or in other edge cases.
> > >
> > > Other ideas would be to have the Process::Resume() do some
> synchronization between the current thread and the internal-state thread so
> it waits for the internal-state thread to get to the running state before
> it returns from Process::Resume()...
> > >
> > > Greg
> > >
> > >
> >
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> >
> >
> >
> > --
> > Todd Fiala |   Software Engineer |     tfiala at google.com |
> 650-943-3180
> >
> >
> > _______________________________________________
> > lldb-dev mailing list
> > lldb-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> >
> >
> >
> >
> > --
> > -Todd
> >
> >
> >
> > --
> > -Todd
>
>


-- 
Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140630/2e8236f8/attachment.html>


More information about the lldb-dev mailing list