[lldb-dev] More linux process control and IOHandler races
Todd Fiala
tfiala at google.com
Wed Oct 1 07:52:47 PDT 2014
Thanks, Matthew.
I'll have a look at Shawn's changes and get them in after reviewing.
On Tue, Sep 30, 2014 at 11:43 PM, Matthew Gardiner <mg11 at csr.com> wrote:
> Hi Shawn,
>
> It would be great for you to fix this and all the other lldb prompt
> bugs! Yes, lldb's rotten prompt bothers me as much as it bothers you.
>
> My main problem with this issue is that I am spending all my time
> trying to get CSR's kalimba chips debuggable by lldb. Particularly
> troublesome are those variants with non-8-bit bytes... I wish I could
> spend more time on this bug, but at the moment it is a peripheral
> concern for me.
>
> I have no pride associated with any previous fixes/bodges
> (sleeps/m_first_stop etc.) that I've done to "address" this issue. So
> please feel free to solve this exactly as you see fit.
>
> I'm guessing that Todd is in a much closer time-zone to you, so it may
> be better if he applies any of your patches for you. Sorry Todd!
>
> thanks for the update
> Matt
>
> On Mon, 2014-09-29 at 15:36 -0700, Shawn Best wrote:
> > Hi All,
> >
> >
> > I have spent some time digging into this again and understand it a
> > little better now.
> >
> >
> > Matt, the reason your patch hangs on TestHelloWorld is that the
> > test script launches the inferior program first and then has lldb do
> > an attach. This is a different code path than a regular launch. The
> > one-shot 'm_first_stop' you put in there to prevent it from
> > broadcasting the first stop causes it to never get restarted.
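
A toy Python model may help show why a one-shot flag can behave differently on the two code paths. This is only a sketch of the behaviour described above under one reading of it, not lldb code: `ToyProcess`, `first_stop`, and `wait_for_stop` are hypothetical names, and the real hang happens inside Process::WaitForProcessToStop rather than by returning None.

```python
class ToyProcess:
    """Hypothetical stand-in for a process with a one-shot first-stop filter."""

    def __init__(self):
        self.first_stop = True      # one-shot, in the spirit of 'm_first_stop'
        self.public_events = []     # stops that listeners are allowed to see

    def handle_private_stop(self):
        if self.first_stop:
            self.first_stop = False  # swallow exactly one stop
            return
        self.public_events.append("stopped")

    def wait_for_stop(self):
        # The real call would block; the toy returns None to mean "would hang".
        return self.public_events.pop(0) if self.public_events else None


def launch_then_breakpoint():
    p = ToyProcess()
    p.handle_private_stop()   # launch stop: hidden, which is the intent
    p.handle_private_stop()   # a later breakpoint stop: visible
    return p.wait_for_stop()


def attach_only():
    p = ToyProcess()
    p.handle_private_stop()   # the attach stop is the only stop, and it is hidden
    return p.wait_for_stop()
```

In the toy, the launch path still delivers a later stop, while the attach path leaves the waiter with nothing to receive.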
> >
> >
> > On OSX, the application is launched first and then lldb attaches to
> > it. There is also a private stop message generated. The reason it is
> > not broadcast is a special class member 'm_resume_requested' and
> > some special machinery ( class
> > AttachCompletionHandler : public NextEventAction ) that is used in the
> > case of launching to set it.
> >
> >
> > I could create a similar class "LaunchCompletionHandler" and get that
> > working to prevent the stop message from being broadcast.
> >
> >
> > The simplest solution is still my original 2 line fix.
> >
> >
> > I would like to get this fixed as it bothers me every time I see the
> > extra spew when debugging. There were also some unit tests failing on
> > linux that this patch fixed.
> >
> >
> > Shawn.
> >
> >
> >
> >
> > On Mon, Aug 18, 2014 at 3:08 AM, Matthew Gardiner <mg11 at csr.com>
> > wrote:
> > Hi Shawn,
> >
> > Have you spent any more time looking at this prompt issue
> > (where you see the (lldb) prompt even when you use "process
> > launch")? I tried fixing it with the attached patch. This
> > fixes the issue fine at the lldb command line, but results in
> > the python TestHelloWorld test hanging in the call to
> > Process::WaitForProcessToStop. I wondered if you had spent
> > any more time on this lately and knew of a complete/better
> > fix...
> >
> > thanks
> > Matt
> >
> >
> > Greg Clayton wrote:
> > Jim and I have discussed the right thing to do which
> > we plan to implement as soon as we get the chance.
> >
> > Currently we have two process states: public and
> > private. The private state is always up to date and is
> > used to track what the process is currently doing. The
> > public state gets set when we believe the end user
> > should see an event that will update the GUI or TTY
> > with the current state if needed (stopped or exited).
> > There is a lot of tricky code that uses the private
> > process state to manage things like the thread plans.
> > Things like source level single step might involve
> > starting and stopping and hitting breakpoints and
> > single instruction stepping, but the user only wants
> > to see the final eStateStopped when the source level
> > step is done. Expressions are also tricky, the public
> > state of the process is stopped, but when we run an
> > expression, we might resume the process many times and
> > the user will always see that the public state is
> > stopped. Internally though the thread plans are doing
> > a whole bunch of stuff to the process and the user
> > never hears anything about these (no events go
> > public).
> >
> > So the current public and private stuff uses
> > broadcaster hijacking and all sorts of other tricks to
> > avoid letting the end user know about the starts and
> > stops that they shouldn't know about. We eventually
> > want to have a new class, let's call it
> > lldb_private::ProcessState for now, that manages the
> > current process state and always receives the process
> > events. There would be no more public and private
> > events. We would have one version of the ProcessState
> > subclass that would cause the GUI/TTY to update
> > (replacing the public events) and one ProcessState
> > subclass that would manage running thread plans. There
> > would be a stack of these and this will help us avoid
> > the whole public/private/hijacking stuff we have now.
> >
> > Launching a process would end up pushing a new
> > ProcessState class that could handle all the necessary
> > starts and stops we need to get the process to a
> > quiescent state that is presentable to the user. The
> > ProcessState subclass could eat all the events it
> > needs to while launching and then propagate the
> > stopped or running event when it gets popped off the
> > stack.
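
The stack idea described above can be sketched roughly as follows. This is a toy in Python, not a proposal for the actual lldb_private::ProcessState interface: `UIState`, `LaunchState`, `StateStack`, and all method names are hypothetical, and the events are plain strings.

```python
class UIState:
    """Hypothetical bottom-of-stack handler: everything it sees reaches the user."""

    def __init__(self):
        self.shown = []

    def handle(self, event):
        self.shown.append(event)


class LaunchState:
    """Hypothetical handler pushed for a launch: eats events, remembers the last."""

    def __init__(self):
        self.last = None

    def handle(self, event):
        self.last = event   # swallowed; nothing reaches the user yet


class StateStack:
    def __init__(self, base):
        self.handlers = [base]

    def push(self, handler):
        self.handlers.append(handler)

    def dispatch(self, event):
        # Only the handler on top of the stack receives the event.
        self.handlers[-1].handle(event)

    def pop(self):
        top = self.handlers.pop()
        final = getattr(top, "last", None)
        if final is not None:
            self.dispatch(final)   # propagate the final state downwards
        return top


def demo():
    ui = UIState()
    stack = StateStack(ui)
    stack.push(LaunchState())
    for event in ["stopped", "running", "stopped", "running"]:
        stack.dispatch(event)
    hidden_during_launch = list(ui.shown)
    stack.pop()
    return hidden_during_launch, ui.shown
```

In the toy, the user-facing handler sees nothing while the launch handler is on top, and only the final "running" event once it is popped.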
> >
> > So we should make some simple fixes for now to work
> > around the issue and get things working, but it would
> > also be great to get the new system up and running so
> > we can make this easier in the future for people that
> > want to make changes.
> >
> > Greg
> >
> >
> > On Aug 8, 2014, at 1:08 AM, Matthew Gardiner
> > <mg11 at csr.com> wrote:
> >
> > Folks,
> >
> > Regarding "launching using the shell", I
> > don't think that applies in the buggy case that
> > Shawn and I are looking at.
> >
> > I do:
> >
> > (lldb) target create ./test
> > ...
> > (lldb) process launch
> > ...
> >
> > and when I inspected the call to exec, I saw
> > that my exe name is the program passed.
> >
> > Thanks for the insight into the broadcasting
> > of stop events. That explains why I see in the
> > ShouldBroadcastEvent, the ShouldStop and
> > ShouldReportStop calls.
> >
> > However, it would be nice to know if the first
> > stop event should be broadcast for "process
> > launch". I think it's an implementation
> > detail, and therefore should not. That would
> > help to fix this issue.
> >
> > However, Shawn's original suggestion to fix
> > this issue circumvents the above debate, by
> > replacing the call of HandlePrivateEvent with
> > SetPublicState. So which fix is best? Calling
> > SetPublicState rather than HandlePrivateEvent
> > is certainly more expedient, and avoids this
> > debate... but is it more
> > portable/future-proof?
> >
> > Matt
> >
> >
> > jingham at apple.com wrote:
> > If you are launching using the shell,
> > you'll see more stops before you get
> > to the executable you are actually
> > trying to launch. In that case,
> > instead of just running the binary
> > directly you effectively do:
> >
> > /bin/bash exec <binary> arg1 arg2
> >
> > so that you can get bash (or whatever
> > is set in $SHELL) to do the argument
> > expansion for you.
> > GetResumeCountForLaunchInfo calculates
> > this, then it is stuffed into the
> > ProcessLaunchInfo (SetResumeCount).
> > On Mac OS X we always let the Platform
> > launch, then attach, so in that case
> > the AttachCompletionHandler does the
> > extra resumes. I'm not all that
> > familiar with how the Linux side works,
> > but it also seems to use the
> > ProcessLaunchInfo's resume count.
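
As a rough illustration of the resume-count idea above, here is a toy Python sketch. It assumes, purely for illustration, that a resume count of N means "silently resume past the first N stops"; `drive_launch` and the stop labels are hypothetical and are not lldb APIs.

```python
def drive_launch(stops, resume_count):
    """Return the stops that remain after silently resuming past
    the first `resume_count` of them (assumed semantics)."""
    remaining = resume_count
    reported = []
    for stop in stops:
        if remaining > 0:
            remaining -= 1      # an intermediate stop, e.g. the shell's exec
            continue
        reported.append(stop)   # a stop left for the normal machinery
    return reported
```

With one extra stop for the shell, `drive_launch(["shell exec", "binary exec"], 1)` leaves only the stop in the real binary; a direct launch would use a count of 0 and keep every stop.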
> >
> > Note that in general in lldb not all
> > publicly broadcast stop messages are
> > going to result in a stop. For
> > instance, all the breakpoint command &
> > condition handling goes on as a result
> > of the broadcast of the public stop
> > event, but the process might just turn
> > around and continue based on that.
> > Whether to suppress the broadcast and
> > continue from the private state thread
> > or broadcast the event and let the
> > upper levels of lldb take care of what
> > happens from there on really depends
> > on where it makes sense to handle the
> > stop. So for the case of breakpoint
> > stops (or stop hooks, another
> > example), those end up being
> > equivalent to user typed commands,
> > just done for the user automatically
> > by the system. So having them happen
> > in a world where the public state
> > wasn't sync'ed up to the private state
> > ended up being very awkward.
> >
> > I haven't looked at the launching code
> > in detail recently so I am not sure
> > whether it makes sense for the first
> > stop to be handled as it is.
> >
> > Jim
> >
> >
> > On Aug 7, 2014, at 6:29 AM,
> > Matthew Gardiner
> > <mg11 at csr.com> wrote:
> >
> > Hi Shawn,
> >
> > I spent some time today
> > looking at how to arrange for
> > ShouldBroadcast to return
> > false for this first stop. I
> > managed to produce a quick
> > hack for this (i.e. just
> > counted the number of stops),
> > but due to other distractions
> > (from the rest of my job) I
> > didn't get that far into
> > discovering a nice way of
> > achieving this...
> >
> > What I did discover is that
> > with my build just doing
> > "process launch" results in 3
> > stops (and 3 private resumes).
> > That in itself I find
> > surprising, since I was under
> > the impression that I should
> > see the inferior stop just
> > once when exec is trapped by
> > PTRACE_TRACEME.
> >
> > I then discovered that for each of these 3 stops
> > ShouldBroadcast calls Thread::ShouldStop, and that
> > Thread::ShouldStop returns true for the first stop and
> > false for the other 2. Looking into these behaviour
> > differences I then found that from within
> > Thread::ShouldStop we then call into the following:
> >
> > StopInfoSP private_stop_info (GetPrivateStopInfo());
> > if (private_stop_info &&
> >     private_stop_info->ShouldStopSynchronous(event_ptr) == false)
> > {
> >
> > and also
> >
> > bool over_ride_stop = current_plan->ShouldAutoContinue(event_ptr);
> >
> > The results from these, it seems, provide the
> > reasoning behind the different
> > true/false returns. I'll
> > return to spend a bit more
> > time on this tomorrow. Let me
> > know if you get any further on
> > a similar vein!
> >
> > thanks
> > Matt
> >
> >
> >
> > Shawn Best wrote:
> > Matt,
> >
> > I think you are
> > probably right,
> > although there are
> > other places where it
> > directly calls
> > SetPublicState(). I
> > was wondering about
> > the possibility there
> > could be other
> > listeners waiting for
> > a broadcast public
> > Stop event. Is that a
> > possibility?
> >
> > Some others here were
> > investigating some
> > unit tests that were
> > failing intermittently
> > (StopHook). Their
> > description of the
> > problem sounds
> > unrelated to the
> > launch code, but this
> > patch also magically
> > fixes that.
> >
> > Shawn.
> >
> > On 8/6/2014 6:26 AM, Matthew Gardiner wrote:
> > Shawn,
> >
> > Like I said earlier, your patch worked. However I think the right fix
> > is to arrange that ShouldBroadcast returns false for this first stop.
> > I believe this, because firstly no stops should be reported here since
> > the user is only interested in launching a program, and additionally
> > because it enables us to fix lldb without removing the call to
> > HandlePrivateEvent. This, I think, is important to preserve as the
> > central point for process state change handling.
> >
> > Matt
> >
> >
> >
> > Shawn Best wrote:
> > Hi Matthew,
> >
> > I have also been tracking this bug. I believe there are other bugs in
> > the unit tests failing indirectly because of this. I also have a patch
> > that will fix it, but was sitting on it until the other one landed.
> > These bugs do not show up on OSX since the inferiors are launched
> > separately then attached to.
> >
> > The first odd thing the launching code does is push an IOHandler when
> > it sees the state transition to 'launching'. This is odd because I
> > believe the launching program will always come up in a stopped state
> > which will immediately pop the IOHandler.
> >
> > At launch, the process comes up in the stopped state. The launch code
> > manually calls HandlePrivateEvent() with the stop event, which then
> > broadcasts the Event. When HandleProcessEvent gets the public stop, it
> > dumps out the current thread state just as if an executing inferior hit
> > a breakpoint and stopped.
> >
> > One way to fix this would be:
> >
> > 1. Don't push io handler when state is 'launching'
> > 2. Instead of manually calling HandlePrivateEvent, call SetPublicState().
> >
> >
> > Alternately, we could try and debug why ShouldBroadcast() returns true,
> > but that appears to be by design since it is expecting the public stop
> > event to pop the IOHandler that had been pushed when launching.
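
The difference between the two options can be sketched with a toy Python model. This is not lldb code: `ToyProcess` and its methods are hypothetical stand-ins named after the functions discussed above, and `should_broadcast` simply returns True to mirror what was observed for the first stop.

```python
class ToyProcess:
    """Hypothetical model of the two ways to publish the launch stop."""

    def __init__(self):
        self.public_state = "launching"
        self.broadcasts = []          # events a UI listener would receive

    def should_broadcast(self, state):
        # Assumption: mirrors the observation that the first stop is broadcast.
        return True

    def handle_private_event(self, state):
        # Full event path: may broadcast, which makes the UI print the stop.
        if self.should_broadcast(state):
            self.broadcasts.append(state)
            self.public_state = state

    def set_public_state(self, state):
        # Direct path: the public state changes but no event is broadcast.
        self.public_state = state


def via_event():
    p = ToyProcess()
    p.handle_private_event("stopped")
    return p.public_state, p.broadcasts


def via_direct_set():
    p = ToyProcess()
    p.set_public_state("stopped")
    return p.public_state, p.broadcasts
```

Both paths leave the public state at "stopped", but only the event path produces a broadcast for listeners to react to. That is also the open question raised elsewhere in this thread: whether any other listener depends on that broadcast.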
> >
> > I have attached a patch demonstrating this. In conjunction with the
> > other patch for IOHandler race condition, it will fix a bunch of this
> > kind of behaviour.
> >
> > Shawn.
> >
> On 8/5/2014 6:59 AM, Matthew Gardiner wrote:
> >
> Jim,
> >
> I've been trying to debug an issue (I see it on 64-bit linux) where I do
> "target create" and "process launch" and, despite not requesting *stop at
> entry*, the first stop (which I believe is just the initial ptrace attach
> stop) is reported to the lldb command line. I added some fprintf calls to
> Process::HandlePrivateEvent, which count the number of eStateStopped
> events seen and report whether ShouldBroadcastEvent returns true for each
> event. Here's the output from my program with diagnostics:
> >
> >
> (lldb) target create ~/me/i64-hello.elf
> Current executable set to '~/me/i64-hello.elf' (x86_64).
> (lldb) process launch
> MG Process::HandlePrivateEvent launching stopped_count 0 should_broadcast 1
> Process 31393 launching
> MG Process::HandlePrivateEvent stopped stopped_count 1 should_broadcast 1
> MG Process::HandlePrivateEvent running stopped_count 1 should_broadcast 1
> Process 31393 launched: 'i64-hello.elf' (x86_64)
> Process 31393 stopped
> * thread #1: tid = 31393, 0x0000003675a011f0, name = 'i64-hello.elf', stop reason = trace
>     frame #0: 0x0000003675a011f0
> -> 0x3675a011f0: movq %rsp, %rdi
>    0x3675a011f3: callq 0x3675a046e0
>    0x3675a011f8: movq %rax, %r12
>    0x3675a011fb: movl 0x21eb97(%rip), %eax
> (lldb) MG Process::HandlePrivateEvent stopped stopped_count 2 should_broadcast 0
> MG Process::HandlePrivateEvent running stopped_count 2 should_broadcast 0
> MG Process::HandlePrivateEvent stopped stopped_count 3 should_broadcast 0
> MG Process::HandlePrivateEvent running stopped_count 3 should_broadcast 0
> >
> >
> In summary, lldb reports the inferior to be stopped (even though
> /proc/pid/status and lldb "target list" say it is running). Clearly this is
> wrong (hence my earlier post).
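
For anyone repeating this experiment, a small Python helper can tally such diagnostic lines. The sketch below assumes the exact ad hoc "MG Process::HandlePrivateEvent ..." format printed above (it is not an lldb log format); `summarize` and `PATTERN` are hypothetical names.

```python
import re

# The diagnostic lines from the session above, copied verbatim.
LOG = """\
MG Process::HandlePrivateEvent launching stopped_count 0 should_broadcast 1
MG Process::HandlePrivateEvent stopped stopped_count 1 should_broadcast 1
MG Process::HandlePrivateEvent running stopped_count 1 should_broadcast 1
(lldb) MG Process::HandlePrivateEvent stopped stopped_count 2 should_broadcast 0
MG Process::HandlePrivateEvent running stopped_count 2 should_broadcast 0
MG Process::HandlePrivateEvent stopped stopped_count 3 should_broadcast 0
MG Process::HandlePrivateEvent running stopped_count 3 should_broadcast 0
"""

PATTERN = re.compile(
    r"MG Process::HandlePrivateEvent (\w+) stopped_count (\d+) should_broadcast ([01])"
)


def summarize(log):
    """Return (number of stop events, number of those that were broadcast)."""
    events = [(m.group(1), m.group(3) == "1") for m in PATTERN.finditer(log)]
    stop_flags = [broadcast for state, broadcast in events if state == "stopped"]
    return len(stop_flags), sum(stop_flags)
```

Applied to the lines above, `summarize(LOG)` counts three stop events, of which only the first was broadcast.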
> >
> >
> Am I correct in assuming that when ShouldBroadcastEvent returns true this
> means that lldb should show this event to the debug user? (And thus hide
> other events where ShouldBroadcastEvent=false).
> >
> >
> What puzzled me was why ShouldBroadcastEvent returns true for this very
> first stop. Is this a bug?
> >
> >
> I also spent some time looking at ShouldBroadcastEvent and saw that this:
> >
> case eStateStopped:
> case eStateCrashed:
> case eStateSuspended:
> {
>     ....
>     if (was_restarted || should_resume || m_resume_requested)
>     {
> >
> evaluates as false, and hence the PrivateResume code is not called... does
> this seem buggy to you for this very first stop?
> >
> >
> I thought I'd try asking you, since in a previous mail from Greg, he cited
> you as being a thread-plan expert. (Hope that's ok!). I'd really appreciate
> your help in clarifying the above questions for me, and if you have time,
> giving me some ideas as to how to trace this one further e.g. how
> m_thread_list.ShouldStop and m_thread_list.ShouldReportStop should behave,
> etc.
> >
> >
> thanks for your help
> >
> Matt
> >
> >
> Matthew Gardiner wrote:
> >
> Folks,
> >
> >
> In addition to the overlapping prompt race Shawn Best and I are
> looking at, I'm seeing another issue where if I launch a process, I get a
> stop (presumably the in) being reported to the UI.
> >
> >
> (lldb) target create ~/mydir/i64-hello.elf
> Current executable set to '~/mydir/i64-hello.elf' (x86_64).
> (lldb) process launch
> Process 27238 launching
> Process 27238 launched: '64-hello.elf' (x86_64)
> Process 27238 stopped
> * thread #1: tid = 27238, 0x0000003675a011f0, name = 'i64-hello.elf'
>     frame #0:
> (lldb) target list
> Current targets:
> * target #0: i64-hello.elf ( arch=x86_64-unknown-linux, platform=host, pid=27238, state=running )
> (lldb)
> >
> >
> As you can see the "target list" reflects that the process is
> running. Which I confirmed by looking at /proc/27238/status.
> >
> >
> Anyone else seeing this?
> >
> >
> thanks
> >
> Matt
> >
> >
> >
> >
> >
> _______________________________________________
> >
> lldb-dev mailing list
> >
> lldb-dev at cs.uiuc.edu <mailto:lldb-dev at cs.uiuc.edu>
> >
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
--
Todd Fiala | Software Engineer | tfiala at google.com