[lldb-dev] strange behaviour at lldb cmd line

Todd Fiala tfiala at google.com
Thu Jun 26 18:09:42 PDT 2014


> This stopping at random locations seems like a racy bug in the
ProcessLinux that we should really look into fixing.

(Just doing some correlation with the llgs NativeProcessLinux code, which
is from the same lineage as the Linux ProcessMonitor but has diverged
somewhat).

Some of these I have hit as well in the NativeProcessLinux code.  I happen
to still see this one (the "sometimes dump stack info on startup, then go
forward"), when creating a process on lldb over to llgs.  No doubt the same
source since NativeProcessLinux started from the same code.

> This is bad, please use native types (::pid_t) for these locations so
that this works correctly.

I've fixed most of those (I think all but I'm going to double chekc) as I
got warnings on them in the NativeProcessLinux side, but I'll double check
on that.

> 1 - on linux does a process start running first, then you quickly try to
attach to it? If so, this could explain the difference you might be seeing
when connecting to a process? On Darwin, our posix_spawn() has a non
portable flag that stops the process at the entry point with a SIGSTOP so
we are guaranteed to not have a race condition when we launch a process for
debugging.

We have complete control of the launch - we fork a process, do an execve(),
and the monitor gets a trap on the execve(), so shouldn't be a race.  This
is the case for both local Linux debugging and NativeProcessLinux/llgs.

I've started diverging on signal handling fairly significantly in
NativeProcessLinux, so it's possible some of the behavior will be different
in some spots.  I have llgs working in cases where local Linux debugging is
failing on my end now.

Shawn - make sure you look at these type of bugs in the llgs branch with an
llgs - I'd rather you put effort into NativeProcessLinux rather than Linux
ProcessMonitor.

For llgs, I do have the GDBRemoteCommunicationServer setup to be a delegate
to get messages like process stops.  It gets installed as the delegate
before llgs actually does the launch or attach call so that there's no
chance of the delegate missing a startup event.  We must be doing something
else funky (probably the same but possibly different root cause) on both
the local Linux and NativeProcessLinux startup sequence to cause that race.

> 2 - The messages coming in out of order seem to be related to sending the
eStateLaunching and eStateStopped not being delivered in the correct order.
Your first example, they came through OK, and in the second cased we got a
eStateStopped first followed by the eStateLaunching. I would take a look at
who is sending these out of order. If you fix this out of order events, it
might fix your random stopping at an wrong location?

As far as llgs goes, I do ensure the delegate
(GDBRemoteCommunicationServer) gets those two in order, but at the moment I
don't have llgs doing anything when it gets an eStateLaunching method.  Not
sure if that's relevant there.  What does debugserver do when it gets an
eStateLaunching internally?

On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclayton at apple.com> wrote:

>
> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
> >
> > In addition to the (lldb) prompt out of order, I am also investigating
> some other strange messages when I run a simple application with no
> breakpoints.  It seems related to thread synchronization surrounding the
> startup/management of the inferior process.
> >
> > (lldb) run
> >
> > Process 4417 launching
> > Process 4417 stopped
> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason
> = trace
> >     frame #0: 0x00007f3b99b9c2d0
> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
> >    0x7f3b99b9c2d8:  movq   %rax, %r12
> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
> >
> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
> > Hello world!
> > The string is  Test String :  5
> > Process 4417 exited with status = 0 (0x00000000)
> > (lldb)
> >
> > ------------- or ----------------
> >
> > (lldb) run
> >
> > Process 4454 launching
> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
> > Process 4454 stopped
> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason
> = trace
> >     frame #0: 0x00007ffdec16c2d0
> > error: No such process
> >
> > Hello world!
> > The string is  Test String :  5
> > Process 4454 exited with status = 0 (0x00000000)
> > (lldb)
> >
> >
> > As it is launching the target application, it appears to stop in a
> random place (stop reason = trace), and then continue exectuting. When it
> momentarily stops, I see it pop/push an IOHandler.
>
> Yes the Process IO Handler is pushed and popped on every _public_ stop.
> There are notions of public stops that the user finds out about, and
> private stops where the Process might be in the process of trying to single
> step over a source line and might start/stop the process many many times.
>
> This stopping at random locations seems like a racy bug in the
> ProcessLinux that we should really look into fixing.
> >
> > I added some logging to ProcessPOSIX, and see it hitting
> RefreshAfterStop() and DoResume() many times. Is this normal/expected?
>
> When you start a process, you will run/stop many times as the shared
> libraries get loaded. Normally a breakpoint is set in the dynamic loader
> that allows us to intercept when shared libraries are loaded/unloaded so
> that may explain a few stops you are seeing.
>
> Other run/stop flurries can result when single stepping over a source
> line, stepping past a software breakpoint (disable bp, single instruction
> step, re-enable breakpoint, resume).
>
> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate,
> HandleProcessEvent and see big differences in the order of events changing
> from run to run.
>
> We have a lot of threading in LLDB so some of this will be normal, but
> other times in can indicate a bug much like you are seeing when the process
> stops at a random location 0x00007ffdec16c2d0. This could also be an
> uninitialized variable in ProcessLinux that gets a random value when
> ProcessLinux (or many other classes like ThreadLinux, etc) when a class
> instance is initialized. Please do try and track that down. To get a handle
> on process controls you can enable process and step logging:
>
> (lldb) log enable -T -f /tmp/process.txt lldb process step
>
> Then compare a good and bad run and see what differs.
>
> >
> >
> > One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and
> checks the return code,
> >
> >     lldb::pid_t wpid;
> >     if ((wpid = waitpid(pid, &status, 0)) < 0)
> >     {
> >         args->m_error.SetErrorToErrno();
> >         goto FINISH;
> >     }
> >     else  ...
> >
> > lldb::pid_t is a uint64, while waitpid returns an int32, with negative
> numbers used for error codes.
> > This bug is repeated in a few places
>
> This is bad, please use native types (::pid_t) for these locations so that
> this works correctly.
>
> So a few things regarding your race conditions:
> 1 - on linux does a process start running first, then you quickly try to
> attach to it? If so, this could explain the difference you might be seeing
> when connecting to a process? On Darwin, our posix_spawn() has a non
> portable flag that stops the process at the entry point with a SIGSTOP so
> we are guaranteed to not have a race condition when we launch a process for
> debugging.
> 2 - The messages coming in out of order seem to be related to sending the
> eStateLaunching and eStateStopped not being delivered in the correct order.
> Your first example, they came through OK, and in the second cased we got a
> eStateStopped first followed by the eStateLaunching. I would take a look at
> who is sending these out of order. If you fix this out of order events, it
> might fix your random stopping at an wrong location?
>
> Greg
>
> >
> > Shawn.
> >
> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclayton at apple.com>
> wrote:
> >
> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <emaste at freebsd.org> wrote:
> > >
> > > Hi Greg,
> > >
> > > As far as I can tell what's happening here is just that
> > > Process::Resume() completes and the next prompt is emitted (from the
> > > main-thread?) before the IOHandler gets pushed in another thread.
> > >
> > > Output from "log enable -n lldb process" with an added log printf
> > > where ::Resume returns:
> > >
> > > step
> > > main-thread Process::Resume -- locking run lock
> > > main-thread Process::PrivateResume() m_stop_id = 4, public state:
> > > stopped private state: stopped
> > > main-thread Process::SetPrivateState (running)
> > > main-thread Process thinks the process has resumed.
> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
> > > state: running, last broadcast state: running - YES
> > > main-thread Process::PrivateResume() returning
> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
> > > broadcasting new state running (old state stopped) to public
> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
> > > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f
> > > (STOPPED), signal = 5, exit_state = 0
> > > internal-state(p PushIOHandler
> > > wait4(pid=15646) Process::SetPrivateState (stopped)
> > >
> > > As before, I don't see how we intend to enforce synchronization
> > > between those two threads.  It looks like my tiny usleep in
> > > ::PrivateResume delays the next prompt just long enough for the other
> > > IOHandler to be pushed.
> >
> > That will do it. It is tough because Process::Resume() might not succeed
> so we can't always push the ProcessIOHandler.
> >
> > I need to find a better way to coordinate the pushing of the
> ProcessIOHandler so it happens from the same thread that initiates the
> resume. Then we won't have this issue, but I need to carefully do this so
> it doesn't push it when the process won't be resumed (since it might
> already be resumed) or in other edge cases.
> >
> > Other ideas would be to have the Process::Resume() do some
> synchronization between the current thread and the internal-state thread so
> it waits for the internal-state thread to get to the running state before
> it returns from Process::Resume()...
> >
> > Greg
> >
> >
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>



-- 
Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140626/78e0d80c/attachment.html>


More information about the lldb-dev mailing list