[lldb-dev] Continuing from a breakpoint with multiple threads

Mon Jun 1 16:47:11 PDT 2015

Thanks!  This has been very helpful.  For reference, the reason why that
SetResumeState() was ever there to begin with is because when I was first
trying to make all this work I used the ProcessPOSIX as a model, and
POSIXThread does the same thing.  There's even a comment saying that it
doesn't seem right, so I guess I should have looked more closely.  But
everything was still new to me at the time, so I wasn't sure what the
implications of fiddling with it were.

On Mon, Jun 1, 2015 at 4:38 PM Jim Ingham <jingham at apple.com> wrote:

> One other bit, just in case you're looking at this more closely... The
> other place where we call SetResumeState in lldb is in "process continue"
> and "thread continue".  "process continue" sets all the states to running.
> The idea is that "continue" should get all the threads that aren't
> user-suspended running, but I don't think the code there actually does
> anything useful.  I'll play around with taking that out.
>
> The other place is in "thread continue", where it sets all the specified
> threads to running, and all the others to suspended.  That's appropriate
> since thread continue is really a composite command that suspends all the
> unspecified threads, resumes all the specified ones, and continues the
> process.  So we're posing as the user here.
>
> Jim
>
>
> > On Jun 1, 2015, at 3:53 PM, Jim Ingham <jingham at apple.com> wrote:
> >
> >
> >> On Jun 1, 2015, at 3:07 PM, Jim Ingham <jingham at apple.com> wrote:
> >>
> >> One thing to keep in mind is that there are two kinds of "resume
> state".  There's the user-specified resume state, which is set by
> Thread::SetResumeState, and is stored in m_resume_state in the thread.  It
> should start out "running" and never change unless some user command does
> so.
> >>
> >> I don't know why you are ever calling "SetResumeState" on Windows
> threads, that's not something you should have to do.  Who is changing this
> value?  ProcessGDBRemote calls this in one place, but that is when we are
> restarting a process to kill it, and we're just trying to keep all the
> other threads from doing something annoying like crashing while we're
> trying to kill it.  In general, though, internal to lldb you should never
> need to change this.
> >>
> >> The more relevant state is the current thread resume state, which
> records what the thread is going to do on this particular resume. This
> state is stored in m_temporary_resume_state.  That will control the state
> of this thread on resume, but note that should only ever get consulted on
> the "should stop" side of the thread plan logic.
> >
> > This comment wasn't very clear.  What I meant was that it is legit to
> ask "did this thread do anything interesting on last resume" when you stop,
> for instance to decide whether you need to re-fetch its stop reason, etc.
> But you shouldn't decide whether the thread is going to run or not on the
> NEXT resume based on what it did in the previous resume.
> >
> >> And of course if m_resume_state is eStateSuspended, that should
> short-circuit any of our normal logic about this thread.
> >>
> >> There is another tricky bit about the whole resume negotiation, which
> is that you may not need to resume the target at all to perform the
> continue action.  For instance, if you have nested inline functions which
> begin on the same address, then "step in" on that thread just moves the
> virtual "inline depth" to the next younger stack frame, but doesn't move
> the PC at all.  So ShouldResume can return false.
> >>
> >> So the first step is to take out the call to SetResumeState, that's not
> something you should ever have to do.  Then if that causes problems, we
> should figure out what's going wrong on Windows.
> >>
> >> Jim
> >>
> >>
> >>> On Jun 1, 2015, at 1:30 PM, Zachary Turner <zturner at google.com> wrote:
> >>>
> >>> I'm not quite ready to throw the blanket over this one yet :)
> >>>
> >>> What was the value of resume_state when it called WillResume()?  It
> sounds like it was eStateSuspended, which if that's the case, then it still
> seems like something deeper inside of LLDB's thread plans is confused about
> something, because calling WillResume(eStateSuspended) means "The process
> is seriously about to resume, and when it does, this thread is not going to
> remain suspended".
> >>>
> >>> But it's possible I'm not understanding something about the
> interaction between resume states and the thread plans.
> >>>
> >>> Do you have any insight here Jim?
> >>>
> >>> On Mon, Jun 1, 2015 at 1:26 PM Adrian McCarthy <amccarth at google.com>
> wrote:
> >>> I think Zach's right.  The only plugin that calls SetResumeState
> inside WillResume is POSIXThread, and that seems to be overridden by
> FreeBSDThread.
> >>>
> >>> If I remove the call of SetResumeState from
> TargetThreadWindows::WillResume, everything starts to work.
> >>>
> >>> I'll look into adding some logging in ThreadList::WillResume.
> >>>
> >>> Thanks everyone.
> >>>
> >>> On Mon, Jun 1, 2015 at 12:58 PM, Zachary Turner <zturner at google.com>
> wrote:
> >>> Currently ThreadWindows::WillResume() looks like this:
> >>>
> >>>
> >>> void
> >>> TargetThreadWindows::WillResume(lldb::StateType resume_state)
> >>> {
> >>>   SetResumeState(resume_state);
> >>> }
> >>>
> >>> I originally put this code in because that's what one or two of the
> other plugins did and I wasn't sure what the "correct" thing to do was.
> I'm not sure if it's correct though, or if it could be a cause for the
> bug.  But if the resume state is eStateSuspended as you say, then that
> suggests that something lower level already decided that this thread should
> continue to be suspended after the user continues.  So the bug might
> actually still be earlier.
> >>>
> >>> There's not a lot of logging in ThreadList::WillResume, I wonder if it
> would be worth adding some?
> >>>
> >>> On Mon, Jun 1, 2015 at 12:44 PM Adrian McCarthy <amccarth at google.com>
> wrote:
> >>>> The way this works is that when we go to resume the process, all the
> thread's get asked whether they need to stop other threads to implement
> whatever strategy they are currently pursuing.  That query ends up calling
> the currently active thread plan's "StopOthers" method.
> >>>
> >>> Right, and since the ThreadPlanStepOverBreakpoint responds true to
> StopOthers, all the other threads get suspended.  Once the breakpoint is
> restored and stepped over and the ThreadPlanStepOverBreakpoint is popped,
> the rest of the threads are still suspended.
> >>>
> >>>> But ThreadPlanBase::ShouldStop returns false, so if all your threads
> are running just the ThreadPlanBase, then they should all resume.
> >>>
> >>> Except that ShouldStop is not called for threads that are already
> suspended  (Thread::ShouldStop has an early out if the resume state is
> eStateSuspended).  So I still don't see how those threads can ever get out
> of the suspended state.
> >>>
> >>>
> >>>
> >>> On Mon, Jun 1, 2015 at 11:51 AM, Jim Ingham <jingham at apple.com> wrote:
> >>> The way this works is that when we go to resume the process, all the
> thread's get asked whether they need to stop other threads to implement
> whatever strategy they are currently pursuing.  That query ends up calling
> the currently active thread plan's "StopOthers" method.  If one thread
> returns true to StopOthers, then that thread will get to run solo.  If more
> than one thread returns true I do a little round robin to pick which one
> gets to go.  But ThreadPlanBase::ShouldStop returns false, so if all your
> threads are running just the ThreadPlanBase, then they should all resume.
> >>>
> >>> This all happens in ThreadList::WillResume.
> >>>
> >>> Note I started to add some commands to manipulate the thread plans -
> of which "thread plan list" is the relevant one.  The work isn't done yet
> (for instance I should actually DO something with the --internal and
> --verbose options, but for now I only print user visible plans, not
> implementation only plans.  Anyway, if you are poking around in this area
> that might be of some use.
> >>>
> >>> Jim
> >>>
> >>>
> >>>> On Jun 1, 2015, at 8:04 AM, Adrian McCarthy <amccarth at google.com>
> wrote:
> >>>>
> >>>> Thanks for the info.
> >>>>
> >>>> This is not theoretical.  I'm trying to get TestBreakAfterJoin to
> pass on Windows.  Step 1 was to convert it use <thread> instead of
> <pthreads.h>.  Step 2 was to fix some minor issues in TargetThreadWindows.
> >>>>
> >>>> But now the inferior deadlocks because the one thread that's not
> suspended is waiting on the ones that are.  Once the
> ThreadPlanStepOverBreakpoint plan is popped, the current plan is
> ThreadPlanBase, which, as far as I can tell, does nothing to resume
> suspended threads.
> >>>>
> >>>> I'll compare this to what happens on another platform to see if there
> should be some other thread plan in use.
> >>>>
> >>>> Adrian.
> >>>>
> >>>> On Fri, May 29, 2015 at 4:50 PM, Jim Ingham <jingham at apple.com>
> wrote:
> >>>> When stepping over a breakpoint, the ThreadPlanStepOverBreakpoint
> gets pushed, handles the single instruction step - during which it does
> suspend the other threads - then it gets popped.  When you next resume,
> whatever plan was handling the stepping before the breakpoint was hit will
> resume with whatever policy for running other threads it was using.
> >>>>
> >>>> So it's up to the plan that was on the stack before the
> "StepOverBreakpoint" was pushed to decide this.  Most of the more complex
> plans (like step-over/step-into) try to keep the other threads from running
> if possible (unless the user instructed otherwise) but they also will let
> all the threads run if there's something going on that might deadlock.  For
> instance, if you are doing "next" and we step through straight-line
> instructions in a function, we will only run the one thread you are
> stepping in (by default, you can control this with options to the "thread
> step-over" command.   But if we step into a function, we set a breakpoint
> on the return address and then run with all threads resumed because
> stepping out of a function could run arbitrary code.
> >>>>
> >>>> Anyway, was this a theoretical question, or do you have some instance
> where you are actually seeing a deadlock?
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>>> On May 29, 2015, at 1:59 PM, Adrian McCarthy <amccarth at google.com>
> wrote:
> >>>>>
> >>>>> [I'm trying to make TestBreakAfterJoin work on Windows.]
> >>>>>
> >>>>> I'm unclear how continuing from a breakpoint in a multi-threaded
> inferior is supposed to work.
> >>>>>
> >>>>> A breakpoint is set, and the inferior runs until one of its threads
> hits the breakpoint.  The user then selects continue.
> >>>>>
> >>>>> The thread that had hit the breakpoint has a thread plan type of
> ThreadPlanStepOverBreakpoint, which causes all of the other threads to be
> set to state eStateSuspended.  The thread that had hit the breakpoint then
> steps beyond the breakpoint, and the breakpoint is restored.  The thread is
> then resumed again.
> >>>>>
> >>>>> But the other threads are all still suspended, causing the inferior
> to deadlock.
> >>>>>
> >>>>> The question is:  Where should the other threads have their resume
> states set back to a running state?
> >>>>>
> >>>>> Adrian
> >>>>> _______________________________________________
> >>>>> lldb-dev mailing list
> >>>>> lldb-dev at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> >>>>
> >>>>
> >>>
> >>>
> >>> _______________________________________________
> >>> lldb-dev mailing list
> >>> lldb-dev at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> >>>
> >>
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20150601/dd2af8ef/attachment.html>