<div dir="ltr">Er...<div><br></div><div>Ok - so that's not quite right.</div><div><br></div><div>In the llgs case, there is an initial probe with a $? command that asks where things are at after the inferior is launched. One of the expedited registers (well, the PC) has the little-endian value for the address I see listed in the T response to the $? on startup. I'm going to look and see if the $? is really supposed to respond with a T on startup. I've noticed (and adhere to) not responding with a T on launch for the initial stop. It might be that $? should likewise not respond with anything there.</div>
<div><br></div><div>So this still might be a different thing on llgs than on local Linux.</div><div><br></div><div>-Todd</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Jun 29, 2014 at 12:11 AM, Todd Fiala <span dir="ltr"><<a href="mailto:todd.fiala@gmail.com" target="_blank">todd.fiala@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Here's another interesting bit. I have been combing through the NativeProcessLinux code in the llgs branch as I tighten everything up and work through known issues.<div>
<br></div><div>So for this part Shawn called out above:</div><div class="">
<div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><font size="1" face="courier new, monospace">(lldb) run</font></div><div><font size="1" face="courier new, monospace"><br></font></div><div>
<font size="1" face="courier new, monospace">Process 4417 launching</font></div><div><font size="1" face="courier new, monospace">Process 4417 stopped</font></div><div><font size="1" face="courier new, monospace">* thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace</font></div>
<div><font size="1" face="courier new, monospace"> frame #0: 0x00007f3b99b9c2d0</font></div><div><font size="1" face="courier new, monospace">-> 0x7f3b99b9c2d0: movq %rsp, %rdi</font></div><div><font size="1" face="courier new, monospace"> 0x7f3b99b9c2d3: callq 0x7f3b99b9fa70</font></div>
<div><font size="1" face="courier new, monospace"> 0x7f3b99b9c2d8: movq %rax, %r12</font></div><div><font size="1" face="courier new, monospace"> 0x7f3b99b9c2db: movl 0x221b17(%rip), %eax</font></div><div><font size="1" face="courier new, monospace"><br>
</font></div><div><font size="1" face="courier new, monospace">Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)</font></div></blockquote><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br>
</span></div></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">I have been seeing this in lldb connected to llgs as well as on local debugging. It happens sometimes. However, on lldb <=> llgs, I have complete view of the protocol (the gdb-remote packets), and I can see that the disassembled stop point is not ever coming up in the gdb-remote log as a stop notification (a T). I am starting to suspect that there might be something in the stack unwinding, symbolication or something else that is perhaps racey or maybe sensitive to taking too long to perform some step. The address listed in the llgs case is close to some addresses that are getting probed on the gdb-remote protocol.</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">I'm still tracking this down in the llgs case and I'll double check to make sure I'm not somehow misreading the gdb-remote packet. More on this later if I find anything useful.</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">-Todd</span></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br>
</span></div></div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">On Thu, Jun 26, 2014 at 6:09 PM, Todd Fiala <span dir="ltr"><<a href="mailto:tfiala@google.com" target="_blank">tfiala@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">> This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.</span><br style="font-family:arial,sans-serif;font-size:12.727272033691406px">
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div></div><div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">(Just doing some correlation with the llgs NativeProcessLinux code, which is from the same lineage as the Linux ProcessMonitor but has diverged somewhat).</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div><div class="gmail_extra">Some of these I have hit as well in the NativeProcessLinux code. I happen to still see this one (the "sometimes dump stack info on startup, then go forward"), when creating a process on lldb over to llgs. No doubt the same source since NativeProcessLinux started from the same code.</div>
<div>
<div class="gmail_extra"><br></div><div class="gmail_extra">> <span style="font-family:arial,sans-serif;font-size:12.727272033691406px">This is bad, please use native types (::pid_t) for these locations so that this works correctly.</span></div>
<div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></span></div></div><div class="gmail_extra"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">I've fixed most of those (I think all but I'm going to double chekc) as I got warnings on them in the NativeProcessLinux side, but I'll double check on that.</span></div>
<div class="gmail_extra"><br></div><div class="gmail_extra"><div>> <span style="font-family:arial,sans-serif;font-size:12.727272033691406px">1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.</span><br style="font-family:arial,sans-serif;font-size:12.727272033691406px">
<br></div>We have complete control of the launch - we fork a process, do an execve(), and the monitor gets a trap on the execve(), so shouldn't be a race. This is the case for both local Linux debugging and NativeProcessLinux/llgs.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">I've started diverging on signal handling fairly significantly in NativeProcessLinux, so it's possible some of the behavior will be different in some spots. I have llgs working in cases where local Linux debugging is failing on my end now.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">Shawn - make sure you look at these type of bugs in the llgs branch with an llgs - I'd rather you put effort into NativeProcessLinux rather than Linux ProcessMonitor.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">For llgs, I do have the GDBRemoteCommunicationServer setup to be a delegate to get messages like process stops. It gets installed as the delegate before llgs actually does the launch or attach call so that there's no chance of the delegate missing a startup event. We must be doing something else funky (probably the same but possibly different root cause) on both the local Linux and NativeProcessLinux startup sequence to cause that race.</div>
<div>
<div class="gmail_extra"><br></div><div class="gmail_extra">> <span style="font-family:arial,sans-serif;font-size:12.727272033691406px">2 - The messages coming in out of order seem to be related to sending the eStateLaunching and eStateStopped not being delivered in the correct order. Your first example, they came through OK, and in the second cased we got a eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix this out of order events, it might fix your random stopping at an wrong location?</span></div>
<div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"></div></div><div class="gmail_extra">As far as llgs goes, I do ensure the delegate (GDBRemoteCommunicationServer) gets those two in order, but at the moment I don't have llgs doing anything when it gets an eStateLaunching method. Not sure if that's relevant there. What does debugserver do when it gets an eStateLaunching internally?<div>
<div><br>
<br><div class="gmail_quote">On Thu, Jun 26, 2014 at 5:21 PM, Greg Clayton <span dir="ltr"><<a href="mailto:gclayton@apple.com" target="_blank">gclayton@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div><div><br>
> On Jun 26, 2014, at 4:51 PM, Shawn Best <<a href="mailto:sbest@blueshiftinc.com" target="_blank">sbest@blueshiftinc.com</a>> wrote:<br>
><br>
> In addition to the (lldb) prompt out of order, I am also investigating some other strange messages when I run a simple application with no breakpoints. It seems related to thread synchronization surrounding the startup/management of the inferior process.<br>
><br>
> (lldb) run<br>
><br>
> Process 4417 launching<br>
> Process 4417 stopped<br>
> * thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace<br>
> frame #0: 0x00007f3b99b9c2d0<br>
> -> 0x7f3b99b9c2d0: movq %rsp, %rdi<br>
> 0x7f3b99b9c2d3: callq 0x7f3b99b9fa70<br>
> 0x7f3b99b9c2d8: movq %rax, %r12<br>
> 0x7f3b99b9c2db: movl 0x221b17(%rip), %eax<br>
><br>
> Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)<br>
> Hello world!<br>
> The string is Test String : 5<br>
> Process 4417 exited with status = 0 (0x00000000)<br>
> (lldb)<br>
><br>
> ------------- or ----------------<br>
><br>
> (lldb) run<br>
><br>
> Process 4454 launching<br>
> Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)<br>
> Process 4454 stopped<br>
> * thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace<br>
> frame #0: 0x00007ffdec16c2d0<br>
> error: No such process<br>
><br>
> Hello world!<br>
> The string is Test String : 5<br>
> Process 4454 exited with status = 0 (0x00000000)<br>
> (lldb)<br>
><br>
><br>
> As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue exectuting. When it momentarily stops, I see it pop/push an IOHandler.<br>
<br>
</div></div>Yes the Process IO Handler is pushed and popped on every _public_ stop. There are notions of public stops that the user finds out about, and private stops where the Process might be in the process of trying to single step over a source line and might start/stop the process many many times.<br>
<br>
This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.<br>
<div>><br>
> I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?<br>
<br>
</div>When you start a process, you will run/stop many times as the shared libraries get loaded. Normally a breakpoint is set in the dynamic loader that allows us to intercept when shared libraries are loaded/unloaded so that may explain a few stops you are seeing.<br>
<br>
Other run/stop flurries can result when single stepping over a source line, stepping past a software breakpoint (disable bp, single instruction step, re-enable breakpoint, resume).<br>
<div><br>
> I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.<br>
<br>
</div>We have a lot of threading in LLDB so some of this will be normal, but other times in can indicate a bug much like you are seeing when the process stops at a random location 0x00007ffdec16c2d0. This could also be an uninitialized variable in ProcessLinux that gets a random value when ProcessLinux (or many other classes like ThreadLinux, etc) when a class instance is initialized. Please do try and track that down. To get a handle on process controls you can enable process and step logging:<br>
<br>
(lldb) log enable -T -f /tmp/process.txt lldb process step<br>
<br>
Then compare a good and bad run and see what differs.<br>
<div><br>
><br>
><br>
> One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,<br>
><br>
> lldb::pid_t wpid;<br>
> if ((wpid = waitpid(pid, &status, 0)) < 0)<br>
> {<br>
> args->m_error.SetErrorToErrno();<br>
> goto FINISH;<br>
> }<br>
> else ...<br>
><br>
> lldb::pid_t is a uint64, while waitpid returns an int32, with negative numbers used for error codes.<br>
> This bug is repeated in a few places<br>
<br>
</div>This is bad, please use native types (::pid_t) for these locations so that this works correctly.<br>
<br>
So a few things regarding your race conditions:<br>
1 - on linux does a process start running first, then you quickly try to attach to it? If so, this could explain the difference you might be seeing when connecting to a process? On Darwin, our posix_spawn() has a non portable flag that stops the process at the entry point with a SIGSTOP so we are guaranteed to not have a race condition when we launch a process for debugging.<br>
2 - The messages coming in out of order seem to be related to sending the eStateLaunching and eStateStopped not being delivered in the correct order. Your first example, they came through OK, and in the second cased we got a eStateStopped first followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix this out of order events, it might fix your random stopping at an wrong location?<br>
<span><font color="#888888"><br>
Greg<br>
</font></span><div><div><br>
><br>
> Shawn.<br>
><br>
> On Fri, Jun 20, 2014 at 3:34 PM, Greg Clayton <<a href="mailto:gclayton@apple.com" target="_blank">gclayton@apple.com</a>> wrote:<br>
><br>
> > On Jun 19, 2014, at 7:27 PM, Ed Maste <<a href="mailto:emaste@freebsd.org" target="_blank">emaste@freebsd.org</a>> wrote:<br>
> ><br>
> > Hi Greg,<br>
> ><br>
> > As far as I can tell what's happening here is just that<br>
> > Process::Resume() completes and the next prompt is emitted (from the<br>
> > main-thread?) before the IOHandler gets pushed in another thread.<br>
> ><br>
> > Output from "log enable -n lldb process" with an added log printf<br>
> > where ::Resume returns:<br>
> ><br>
> > step<br>
> > main-thread Process::Resume -- locking run lock<br>
> > main-thread Process::PrivateResume() m_stop_id = 4, public state:<br>
> > stopped private state: stopped<br>
> > main-thread Process::SetPrivateState (running)<br>
> > main-thread Process thinks the process has resumed.<br>
> > internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new<br>
> > state: running, last broadcast state: running - YES<br>
> > main-thread Process::PrivateResume() returning<br>
> > (lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)<br>
> > broadcasting new state running (old state stopped) to public<br>
> > wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =<br>
> > 15646, &status, options = 0) => pid = -15646, status = 0x0000057f<br>
> > (STOPPED), signal = 5, exit_state = 0<br>
> > internal-state(p PushIOHandler<br>
> > wait4(pid=15646) Process::SetPrivateState (stopped)<br>
> ><br>
> > As before, I don't see how we intend to enforce synchronization<br>
> > between those two threads. It looks like my tiny usleep in<br>
> > ::PrivateResume delays the next prompt just long enough for the other<br>
> > IOHandler to be pushed.<br>
><br>
> That will do it. It is tough because Process::Resume() might not succeed so we can't always push the ProcessIOHandler.<br>
><br>
> I need to find a better way to coordinate the pushing of the ProcessIOHandler so it happens from the same thread that initiates the resume. Then we won't have this issue, but I need to carefully do this so it doesn't push it when the process won't be resumed (since it might already be resumed) or in other edge cases.<br>
><br>
> Other ideas would be to have the Process::Resume() do some synchronization between the current thread and the internal-state thread so it waits for the internal-state thread to get to the running state before it returns from Process::Resume()...<br>
><br>
> Greg<br>
><br>
><br>
<br>
</div></div><div><div>_______________________________________________<br>
lldb-dev mailing list<br>
<a href="mailto:lldb-dev@cs.uiuc.edu" target="_blank">lldb-dev@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span><font color="#888888">-- <br><div dir="ltr"><table cellspacing="0" cellpadding="0" style="color:rgb(136,136,136);font-family:'Times New Roman'">
<tbody><tr style="color:rgb(85,85,85);font-family:sans-serif;font-size:small">
<td nowrap style="border-top-style:solid;border-top-color:rgb(213,15,37);border-top-width:2px">Todd Fiala |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(51,105,232);border-top-width:2px"> Software Engineer |</td>
<td nowrap style="border-top-style:solid;border-top-color:rgb(0,153,57);border-top-width:2px"> <a href="mailto:tfiala@google.com" style="color:rgb(17,85,204)" target="_blank"><span style="color:rgb(34,34,34);background-color:rgb(255,255,204)">tfiala@google.com</span></a> |</td>
<td nowrap style="border-top-style:solid;border-top-color:rgb(238,178,17);border-top-width:2px"><font color="#1155cc"> <a>650-943-3180</a></font></td></tr></tbody></table><br></div>
</font></span></div></div>
<br>_______________________________________________<br>
lldb-dev mailing list<br>
<a href="mailto:lldb-dev@cs.uiuc.edu" target="_blank">lldb-dev@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br></div></div><span class="HOEnZb"><font color="#888888"><div dir="ltr">-Todd</div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">-Todd</div>
</div>