[lldb-dev] strange behaviour at lldb cmd line

Todd Fiala todd.fiala at gmail.com
Sun Jun 29 00:11:18 PDT 2014


Here's another interesting bit.  I have been combing through the
NativeProcessLinux code in the llgs branch as I tighten everything up and
work through known issues.

So for this part Shawn called out above:

(lldb) run

Process 4417 launching
Process 4417 stopped
* thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason =
trace
    frame #0: 0x00007f3b99b9c2d0
-> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
   0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
   0x7f3b99b9c2d8:  movq   %rax, %r12
   0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax

Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)


I have been seeing this in lldb connected to llgs as well as on local
debugging.  It happens sometimes.  However, on lldb <=> llgs, I have
complete view of the protocol (the gdb-remote packets), and I can see that
the disassembled stop point is not ever coming up in the gdb-remote log as
a stop notification (a T).  I am starting to suspect that there might be
something in the stack unwinding, symbolication or something else that is
perhaps racey or maybe sensitive to taking too long to perform some step.
 The address listed in the llgs case is close to some addresses that are
getting probed on the gdb-remote protocol.

I'm still tracking this down in the llgs case and I'll double check to make
sure I'm not somehow misreading the gdb-remote packet.  More on this later
if I find anything useful.

-Todd



On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <tfiala at google.com> wrote:

> > This stopping at random locations seems like a racy bug in the
> ProcessLinux that we should really look into fixing.
>
> (Just doing some correlation with the llgs NativeProcessLinux code, which
> is from the same lineage as the Linux ProcessMonitor but has diverged
> somewhat).
>
> Some of these I have hit as well in the NativeProcessLinux code.  I happen
> to still see this one (the "sometimes dump stack info on startup, then go
> forward"), when creating a process on lldb over to llgs.  No doubt the same
> source since NativeProcessLinux started from the same code.
>
> > This is bad, please use native types (::pid_t) for these locations so
> that this works correctly.
>
> I've fixed most of those (I think all but I'm going to double chekc) as I
> got warnings on them in the NativeProcessLinux side, but I'll double check
> on that.
>
> > 1 - on linux does a process start running first, then you quickly try
> to attach to it? If so, this could explain the difference you might be
> seeing when connecting to a process? On Darwin, our posix_spawn() has a non
> portable flag that stops the process at the entry point with a SIGSTOP so
> we are guaranteed to not have a race condition when we launch a process for
> debugging.
>
> We have complete control of the launch - we fork a process, do an
> execve(), and the monitor gets a trap on the execve(), so shouldn't be a
> race.  This is the case for both local Linux debugging and
> NativeProcessLinux/llgs.
>
> I've started diverging on signal handling fairly significantly in
> NativeProcessLinux, so it's possible some of the behavior will be different
> in some spots.  I have llgs working in cases where local Linux debugging is
> failing on my end now.
>
> Shawn - make sure you look at these type of bugs in the llgs branch with
> an llgs - I'd rather you put effort into NativeProcessLinux rather than
> Linux ProcessMonitor.
>
> For llgs, I do have the GDBRemoteCommunicationServer setup to be a
> delegate to get messages like process stops.  It gets installed as the
> delegate before llgs actually does the launch or attach call so that
> there's no chance of the delegate missing a startup event.  We must be
> doing something else funky (probably the same but possibly different root
> cause) on both the local Linux and NativeProcessLinux startup sequence to
> cause that race.
>
> > 2 - The messages coming in out of order seem to be related to sending
> the eStateLaunching and eStateStopped not being delivered in the correct
> order. Your first example, they came through OK, and in the second cased we
> got a eStateStopped first followed by the eStateLaunching. I would take a
> look at who is sending these out of order. If you fix this out of order
> events, it might fix your random stopping at an wrong location?
>
> As far as llgs goes, I do ensure the delegate
> (GDBRemoteCommunicationServer) gets those two in order, but at the moment I
> don't have llgs doing anything when it gets an eStateLaunching method.  Not
> sure if that's relevant there.  What does debugserver do when it gets an
> eStateLaunching internally?
>
>
> On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <gclayton at apple.com> wrote:
>
>>
>> > On Jun 26, 2014, at 4:51 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>> >
>> > In addition to the (lldb) prompt out of order, I am also investigating
>> some other strange messages when I run a simple application with no
>> breakpoints.  It seems related to thread synchronization surrounding the
>> startup/management of the inferior process.
>> >
>> > (lldb) run
>> >
>> > Process 4417 launching
>> > Process 4417 stopped
>> > * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop
>> reason = trace
>> >     frame #0: 0x00007f3b99b9c2d0
>> > -> 0x7f3b99b9c2d0:  movq   %rsp, %rdi
>> >    0x7f3b99b9c2d3:  callq  0x7f3b99b9fa70
>> >    0x7f3b99b9c2d8:  movq   %rax, %r12
>> >    0x7f3b99b9c2db:  movl   0x221b17(%rip), %eax
>> >
>> > Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
>> > Hello world!
>> > The string is  Test String :  5
>> > Process 4417 exited with status = 0 (0x00000000)
>> > (lldb)
>> >
>> > ------------- or ----------------
>> >
>> > (lldb) run
>> >
>> > Process 4454 launching
>> > Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
>> > Process 4454 stopped
>> > * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop
>> reason = trace
>> >     frame #0: 0x00007ffdec16c2d0
>> > error: No such process
>> >
>> > Hello world!
>> > The string is  Test String :  5
>> > Process 4454 exited with status = 0 (0x00000000)
>> > (lldb)
>> >
>> >
>> > As it is launching the target application, it appears to stop in a
>> random place (stop reason = trace), and then continue exectuting. When it
>> momentarily stops, I see it pop/push an IOHandler.
>>
>> Yes the Process IO Handler is pushed and popped on every _public_ stop.
>> There are notions of public stops that the user finds out about, and
>> private stops where the Process might be in the process of trying to single
>> step over a source line and might start/stop the process many many times.
>>
>> This stopping at random locations seems like a racy bug in the
>> ProcessLinux that we should really look into fixing.
>> >
>> > I added some logging to ProcessPOSIX, and see it hitting
>> RefreshAfterStop() and DoResume() many times. Is this normal/expected?
>>
>> When you start a process, you will run/stop many times as the shared
>> libraries get loaded. Normally a breakpoint is set in the dynamic loader
>> that allows us to intercept when shared libraries are loaded/unloaded so
>> that may explain a few stops you are seeing.
>>
>> Other run/stop flurries can result when single stepping over a source
>> line, stepping past a software breakpoint (disable bp, single instruction
>> step, re-enable breakpoint, resume).
>>
>> > I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate,
>> HandleProcessEvent and see big differences in the order of events changing
>> from run to run.
>>
>> We have a lot of threading in LLDB so some of this will be normal, but
>> other times in can indicate a bug much like you are seeing when the process
>> stops at a random location 0x00007ffdec16c2d0. This could also be an
>> uninitialized variable in ProcessLinux that gets a random value when
>> ProcessLinux (or many other classes like ThreadLinux, etc) when a class
>> instance is initialized. Please do try and track that down. To get a handle
>> on process controls you can enable process and step logging:
>>
>> (lldb) log enable -T -f /tmp/process.txt lldb process step
>>
>> Then compare a good and bad run and see what differs.
>>
>> >
>> >
>> > One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and
>> checks the return code,
>> >
>> >     lldb::pid_t wpid;
>> >     if ((wpid = waitpid(pid, &status, 0)) < 0)
>> >     {
>> >         args->m_error.SetErrorToErrno();
>> >         goto FINISH;
>> >     }
>> >     else  ...
>> >
>> > lldb::pid_t is a uint64, while waitpid returns an int32, with negative
>> numbers used for error codes.
>> > This bug is repeated in a few places
>>
>> This is bad, please use native types (::pid_t) for these locations so
>> that this works correctly.
>>
>> So a few things regarding your race conditions:
>> 1 - on linux does a process start running first, then you quickly try to
>> attach to it? If so, this could explain the difference you might be seeing
>> when connecting to a process? On Darwin, our posix_spawn() has a non
>> portable flag that stops the process at the entry point with a SIGSTOP so
>> we are guaranteed to not have a race condition when we launch a process for
>> debugging.
>> 2 - The messages coming in out of order seem to be related to sending the
>> eStateLaunching and eStateStopped not being delivered in the correct order.
>> Your first example, they came through OK, and in the second cased we got a
>> eStateStopped first followed by the eStateLaunching. I would take a look at
>> who is sending these out of order. If you fix this out of order events, it
>> might fix your random stopping at an wrong location?
>>
>> Greg
>>
>> >
>> > Shawn.
>> >
>> > On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <gclayton at apple.com>
>> wrote:
>> >
>> > > On Jun 19, 2014, at 7:27 PM, Ed Maste <emaste at freebsd.org> wrote:
>> > >
>> > > Hi Greg,
>> > >
>> > > As far as I can tell what's happening here is just that
>> > > Process::Resume() completes and the next prompt is emitted (from the
>> > > main-thread?) before the IOHandler gets pushed in another thread.
>> > >
>> > > Output from "log enable -n lldb process" with an added log printf
>> > > where ::Resume returns:
>> > >
>> > > step
>> > > main-thread Process::Resume -- locking run lock
>> > > main-thread Process::PrivateResume() m_stop_id = 4, public state:
>> > > stopped private state: stopped
>> > > main-thread Process::SetPrivateState (running)
>> > > main-thread Process thinks the process has resumed.
>> > > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
>> > > state: running, last broadcast state: running - YES
>> > > main-thread Process::PrivateResume() returning
>> > > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
>> > > broadcasting new state running (old state stopped) to public
>> > > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
>> > > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f
>> > > (STOPPED), signal = 5, exit_state = 0
>> > > internal-state(p PushIOHandler
>> > > wait4(pid=15646) Process::SetPrivateState (stopped)
>> > >
>> > > As before, I don't see how we intend to enforce synchronization
>> > > between those two threads.  It looks like my tiny usleep in
>> > > ::PrivateResume delays the next prompt just long enough for the other
>> > > IOHandler to be pushed.
>> >
>> > That will do it. It is tough because Process::Resume() might not
>> succeed so we can't always push the ProcessIOHandler.
>> >
>> > I need to find a better way to coordinate the pushing of the
>> ProcessIOHandler so it happens from the same thread that initiates the
>> resume. Then we won't have this issue, but I need to carefully do this so
>> it doesn't push it when the process won't be resumed (since it might
>> already be resumed) or in other edge cases.
>> >
>> > Other ideas would be to have the Process::Resume() do some
>> synchronization between the current thread and the internal-state thread so
>> it waits for the internal-state thread to get to the running state before
>> it returns from Process::Resume()...
>> >
>> > Greg
>> >
>> >
>>
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>
>


-- 
-Todd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140629/2cc027bf/attachment.html>


More information about the lldb-dev mailing list