[lldb-dev] More linux process control and IOHandler races

Todd Fiala tfiala at google.com
Mon Aug 18 07:51:34 PDT 2014

Hey Matt,

Shawn's out on vacation until end of week.


On Mon, Aug 18, 2014 at 3:08 AM, Matthew Gardiner <mg11 at csr.com> wrote:

> Hi Shawn,
> Have you spent anymore time looking at this prompt issue (where you see
> the (lldb) prompt even when you use "process launch")? I tried fixing it
> with the attached patch. This fixes the issue fine at the lldb command
> line, but results in the python TestHelloWorld test hanging the call to
> Process::WaitForProcessToStop. I wondered if you had spent anymore time on
> this lately and knew of a complete/better fix...
> thanks
> Matt
> Greg Clayton wrote:
>> Jim and I have discussed the right thing to do which we plan to implement
>> as soon as we get the chance.
>> Current we have two process states: public and private. The private state
>> is always up to date and is used to track what the process is currently
>> doing. The public state gets set when we believe the end user should see an
>> event that will update the GUI or TTY with the current state if needed
>> (stopped or exited). There is a lot of tricky code that uses the private
>> process state to manage things like the thread plans. Things like source
>> level single step might involve starting and stopping and hitting
>> breakpoints and single instruction stepping, but the user only wants to see
>> the final eStateStopped when the source level step is done. Expressions are
>> also tricky, the public state of the process is stopped, but when we run an
>> expression, we might resume the process many times and the user will always
>> see that the public state if stopped. Internally though the thread plans
>> are doing a whole bunch of stuff to the process and the user never hears
>> anything about these (no events go public).
>> So the current public and private stuff uses broadcaster hijacking and
>> all sorts of other tricks to avoid letting the end user know about the
>> starts and stops that they shouldn't know about. We eventually want to have
>> a new class, lets call in lldb_private::ProcessState for now, that manages
>> the current process state and always receives the process events. There
>> would be no more public and private events. We would have one version of
>> the ProcessState subclass that would cause the GUI/TTY to update (replacing
>> the public events) and one ProcessState subclass that would manage running
>> thread plans. There would be stack of these and this will help us avoid the
>> whole public/private/hijacking stuff we have now.
>> Launching a process would end up pushing a new ProcessState class that
>> could handle all the necessary starts and stop we need to get the process
>> to a quiescent state that is presentable to the user. The ProcessState
>> subclass could eat all the events it needs to while launching and then
>> propagate the stopped or running event when it gets popped off the stack.
>> So we should make some simple fixes for now to work around the issue and
>> get things working, but it would also be great to get the new system up and
>> running so we can make this easier in the future for people that want to
>> make changes.
>> Greg
>>  On Aug 8, 2014, at 1:08 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>> Folks,
>>> Regarding "launching using the shell",  I don't think applies in the
>>> buggy case that myself and Shawn are looking at.
>>> I do:
>>> (lldb) target create ./test
>>> ...
>>> (lldb) process launch
>>> ...
>>> and when I inspected the call to exec, I see that my exe name is the
>>> program passed.
>>> Thanks for the insight into the broadcasting of stop events. That
>>> explains why I see in the ShouldBroadcastEvent, the ShouldStop and
>>> ShouldReportStop calls.
>>> However, it would be nice to know if the first stop event should be
>>> broadcast for "process launch".  I think it's an implementation detail, and
>>> therefore should not. That would help to fix this issue.
>>> However, Shawn's original suggestion to fix this issue circumvents the
>>> above debate, by replacing the call of HandlePrivateEvent with
>>> SetPublicState. So which fix is best? Calling SetPublicState rather than
>>> HandlePrivateEvent is certainly more expedient, and avoids this debate...
>>> but is it more portable/future-proof?
>>> Matt
>>> jingham at apple.com wrote:
>>>> If you are launching using the shell, you'll see more stops before you
>>>> get to the executable you are actually trying to launch.  In that case,
>>>> instead of just running the binary directly you effectively do:
>>>> /bin/bash exec <binary> arg1 arg2
>>>> so that you can get bash (or whatever is set in $SHELL) to do the
>>>> argument expansion for you.  GetResumeCountForLaunchInfo calculates this,
>>>> then it is stuffed into the ProcessLaunchInfo (SetResumeCount).  On Mac OS
>>>> X we always let the Platform launch, then attach, so in that case the
>>>> AttachCompletionHandler does the extra resumes.  I'm not all that familiar
>>>> with how the Linux side work, but it also seems to use the
>>>> ProcessLaunchInfo's resume count.
>>>> Note that in general in lldb not all publicly broadcast stop messages
>>>> are going to result in a stop.  For instance, all the breakpoint command &
>>>> condition handling goes on as a result of the broadcast of the public stop
>>>> event, but the process might just turn around an continue based on that.
>>>> Whether to suppress the broadcast and continue from the private state
>>>> thread or broadcast the event and let the upper levels of lldb take care of
>>>> what happens from there on really depends on where it makes sense to handle
>>>> the stop.  So for the case of breakpoint stops (or stop hooks, another
>>>> example), those end up being equivalent to user typed commands, just done
>>>> for the user automatically by the system.  So having them happen in a world
>>>> where the public state wasn't sync'ed up to the private state ended up
>>>> being very awkward.
>>>> I haven't looked at the launching code in detail recently so I am not
>>>> sure whether it makes sense for the first stop to be handled as it is.
>>>> Jim
>>>>  On Aug 7, 2014, at 6:29 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>> Hi Shawn,
>>>>> I spent some time today looking at how to arrange for ShouldBroadcast
>>>>> to return false for this first stop. I managed to produce a quick hack for
>>>>> this (i.e. just counted the number of stops), but due to other distractions
>>>>> (from the rest of my job) I didn't get that far into discovering a nice way
>>>>> of achieving this...
>>>>> What I did discover is that with my build just doing "process launch"
>>>>> results in 3 stops (and 3 private resumes). That in itself I find
>>>>> surprising, since I was under the impression that I should see the inferior
>>>>> stop just once when exec is trapped by PTRACE_ME.
>>>>> I then discovered that for each of these 3 stops ShouldBroadcast calls
>>>>> Thread::ShouldStop, the Thread::ShouldStop returns true for the first stop
>>>>> and false for the other 2. Looking into these behaviour differences I then
>>>>> found that from within Thread::ShouldStop we then call into the following:
>>>>>     StopInfoSP private_stop_info (GetPrivateStopInfo());
>>>>>     if (private_stop_info && private_stop_info->
>>>>> ShouldStopSynchronous(event_ptr) == false)
>>>>>     {
>>>>> and also
>>>>> bool over_ride_stop = current_plan->ShouldAutoContinue(event_ptr);
>>>>> the results from either of these, it seems providing the reasoning
>>>>> behind the different true/false returns. I'll return to spend a bit more
>>>>> time on this tomorrow. Let me know if you get any further on a similar vein!
>>>>> thanks
>>>>> Matt
>>>>> Shawn Best wrote:
>>>>>> Matt,
>>>>>> I think you are probably right, although there are other places where
>>>>>> it directly calls SetPublicState().   I was wondering about the possibility
>>>>>> there could be other listeners waiting for a broadcast public Stop event.
>>>>>> Is that a possibility?
>>>>>> Some others here were investigating some unit tests that were failing
>>>>>> intermittently (StopHook).  Their description of the problem sounds
>>>>>> unrelated to the launch code, but this patch also magically fixes that.
>>>>>> Shawn.
>>>>>> On 8/6/2014 6:26 AM, Matthew Gardiner wrote:
>>>>>>> Shawn,
>>>>>>> Like I said earlier your patch worked. However I think the right fix
>>>>>>> is to arrange that ShouldBroadcast returns false for this first stop. I
>>>>>>> believe this, because firstly no stops should be reported here since the
>>>>>>> user is only interested in launching a program, and additionally because it
>>>>>>> enables us to fix lldb without removing the call to HandlePrivateEvent.
>>>>>>> This, I think, is important to preserve as the central point for process
>>>>>>> state change handling.
>>>>>>> Matt
>>>>>>> Shawn Best wrote:
>>>>>>>> Hi Matthew,
>>>>>>>> I have also been tracking this bug.  I believe there are other bugs
>>>>>>>> in the unit tests failing indirectly because of this.  I also have a patch
>>>>>>>> that will fix it, but was sitting on it until the other one landed.  These
>>>>>>>> bugs do not show up on OSX since the inferiors are launched separately then
>>>>>>>> attached to.
>>>>>>>> The first odd thing the launching code does is push an IOHandler
>>>>>>>> when it sees the state transition to 'launching'. This is odd because I
>>>>>>>> believe the launching program will always come up in a stopped state which
>>>>>>>> will immediately pop the IOHandler.
>>>>>>>> At launch, the process comes up in the stopped state.  The launch
>>>>>>>> code manually calls HandlePrivateEvent() with the stop event, which then
>>>>>>>> broadcasts the Event.  When HandleProcessEvent gets the public stop, it
>>>>>>>> dumps out the current thread state just as if an executing inferior hit a
>>>>>>>> breakpoint and stopped.
>>>>>>>> One way to fix this would be:
>>>>>>>> 1. Don't push io handler when state is 'launching'
>>>>>>>> 2. Instead of manually calling HandlePrivateEvent, call
>>>>>>>> SetPublicState().
>>>>>>>> Alternately, we could try and debug why ShouldBroadcast() returns
>>>>>>>> true, but that appears to be by design since it is expecting the public
>>>>>>>> stop event to pop the IOHandler that had been pushed when launching.
>>>>>>>> I have attached a patch demonstrating this.  In conjunction with
>>>>>>>> the other patch for IOHandler race condition, it will fix a bunch of this
>>>>>>>> kind of behaviour.
>>>>>>>> Shawn.
>>>>>>>> On 8/5/2014 6:59 AM, Matthew Gardiner wrote:
>>>>>>>>> Jim,
>>>>>>>>> I've been trying to debug an issue (I see it on 64-bit linux)
>>>>>>>>> where, I do "target create" and "process launch" and despite not requesting
>>>>>>>>> *stop at entry*, the first stop (which I believe is just the initial ptrace
>>>>>>>>> attach stop) is reported to the lldb command line. I added some fprintf to
>>>>>>>>> Process::HandlePrivateEvent, which counts the number of eStoppedState
>>>>>>>>> events seen and whether ShouldBroadcastEvent returns true for this event.
>>>>>>>>> Here's the output from my program with diagnostic:
>>>>>>>>> (lldb) target create ~/me/i64-hello.elf
>>>>>>>>> Current executable set to '~/me/i64-hello.elf' (x86_64).
>>>>>>>>> (lldb) process launch
>>>>>>>>> MG Process::HandlePrivateEvent launching stopped_count 0
>>>>>>>>> should_broadcast 1
>>>>>>>>> Process 31393 launching
>>>>>>>>> MG Process::HandlePrivateEvent stopped stopped_count 1
>>>>>>>>> should_broadcast 1
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 1
>>>>>>>>> should_broadcast 1
>>>>>>>>> Process 31393 launched: 'i64-hello.elf' (x86_64)
>>>>>>>>> Process 31393 stopped
>>>>>>>>> * thread #1: tid = 31393, 0x0000003675a011f0, name =
>>>>>>>>> 'i64-hello.elf', stop reason = trace
>>>>>>>>>     frame #0: 0x0000003675a011f0
>>>>>>>>> -> 0x3675a011f0:  movq   %rsp, %rdi
>>>>>>>>>    0x3675a011f3:  callq  0x3675a046e0
>>>>>>>>>    0x3675a011f8:  movq   %rax, %r12
>>>>>>>>>    0x3675a011fb:  movl   0x21eb97(%rip), %eax
>>>>>>>>> (lldb) MG Process::HandlePrivateEvent stopped stopped_count 2
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 2
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent stopped stopped_count 3
>>>>>>>>> should_broadcast 0
>>>>>>>>> MG Process::HandlePrivateEvent running stopped_count 3
>>>>>>>>> should_broadcast 0
>>>>>>>>> In summary, lldb reports the inferior to be stopped (even though
>>>>>>>>> /proc/pid/status and lldb "target list" say it is running). Clearly this is
>>>>>>>>> wrong (hence my earlier post).
>>>>>>>>> Am I correct in assuming that when  ShouldBroadcastEvent returns
>>>>>>>>> true this means that lldb should show this event to the debug user? (And
>>>>>>>>> thus hide other events where ShouldBroadcastEvent=false).
>>>>>>>>> What puzzled me was why ShouldBroadcastEvent return true for this
>>>>>>>>> very first stop. Is this a bug?
>>>>>>>>> I also spent sometime at ShouldBroadcastEvent and saw that this:
>>>>>>>>>         case eStateStopped:
>>>>>>>>>         case eStateCrashed:
>>>>>>>>>         case eStateSuspended:
>>>>>>>>>         {
>>>>>>>>>          ....
>>>>>>>>>                 if (was_restarted || should_resume ||
>>>>>>>>> m_resume_requested)
>>>>>>>>>                 {
>>>>>>>>> evaluates as false, and hence the PrivateResume code is not
>>>>>>>>> called... does this seem buggy to you for this very first stop?
>>>>>>>>> I thought I'd try asking you, since in a previous mail from Greg,
>>>>>>>>> he cited you as being a thread-plan expert. (Hope that's ok!). I'd really
>>>>>>>>> appreciate your help in clarifying the above questions for me, and if you
>>>>>>>>> have time, giving me some ideas as to how to trace this one further e.g.
>>>>>>>>> how m_thread_list.ShouldStop and m_thread_list.ShouldReportStop should
>>>>>>>>> behave, etc.
>>>>>>>>> thanks for your help
>>>>>>>>> Matt
>>>>>>>>> Matthew Gardiner wrote:
>>>>>>>>>> Folks,
>>>>>>>>>> In addition to the overlapping prompt race Shawn Best and myself
>>>>>>>>>> are looking at, I'm seeing another issue where if I launch a process, I get
>>>>>>>>>> a stop (presumably the in) being reported to the UI.
>>>>>>>>>> (lldb) target create ~/mydir/i64-hello.elf
>>>>>>>>>> Current executable set to '~/mydir/i64-hello.elf' (x86_64).
>>>>>>>>>> (lldb) process launch
>>>>>>>>>> Process 27238 launching
>>>>>>>>>> Process 27238 launched: '64-hello.elf' (x86_64)
>>>>>>>>>> Process 27238 stopped
>>>>>>>>>> * thread #1: tid = 27238, 0x0000003675a011f0, name =
>>>>>>>>>> 'i64-hello.elf'
>>>>>>>>>>     frame #0:
>>>>>>>>>> (lldb) target list
>>>>>>>>>> Current targets:
>>>>>>>>>> * target #0: i64-hello.elf ( arch=x86_64-unknown-linux,
>>>>>>>>>> platform=host, pid=27238, state=running )
>>>>>>>>>> (lldb)
>>>>>>>>>> As you can see the "target list" reflects that the process is
>>>>>>>>>> running. Which I confirmed by looking at /proc/27238/status.
>>>>>>>>>> Anyone else seeing this?
>>>>>>>>>> thanks
>>>>>>>>>> Matt
>>>>>>>>>> Member of the CSR plc group of companies. CSR plc registered in
>>>>>>>>>> England and Wales, registered number 4187346, registered office Churchill
>>>>>>>>>> House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United
>>>>>>>>>> Kingdom
>>>>>>>>>> More information can be found at www.csr.com <http://www.csr.com>.
>>>>>>>>>> Keep up to date with CSR on our technical blog, www.csr.com/blog
>>>>>>>>>> <http://www.csr.com/blog>, CSR people blog, www.csr.com/people <
>>>>>>>>>> http://www.csr.com/people>, YouTube, www.youtube.com/user/CSRplc
>>>>>>>>>> <http://www.youtube.com/user/CSRplc>, Facebook,
>>>>>>>>>> www.facebook.com/pages/CSR/191038434253534 <
>>>>>>>>>> http://www.facebook.com/pages/CSR/191038434253534>, or follow us
>>>>>>>>>> on Twitter at www.twitter.com/CSR_plc <
>>>>>>>>>> http://www.twitter.com/CSR_plc>.
>>>>>>>>>> New for 2014, you can now access the wide range of products
>>>>>>>>>> powered by aptX at www.aptx.com <http://www.aptx.com>.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> lldb-dev mailing list
>>>>>>>>>> lldb-dev at cs.uiuc.edu <mailto:lldb-dev at cs.uiuc.edu>
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>>> To report this email as spam click https://www.mailcontrol.com/
>>>>>>>>>> sr/EjKNgqvIx0TGX2PQPOmvUj!GOBh06pKKNwnTW0ZqkNYNbZeofOurg
>>>>>>>>>> ZMo6Cl2EgPiaCw7kl6fPUTCXaTERp6oIw== <https://www.mailcontrol.com/
>>>>>>>>>> sr/EjKNgqvIx0TGX2PQPOmvUj%21GOBh06pKKNwnTW0ZqkNYNbZeofOu
>>>>>>>>>> rgZMo6Cl2EgPiaCw7kl6fPUTCXaTERp6oIw==> .
>>>>>>>>> _______________________________________________
>>>>>>>>> lldb-dev mailing list
>>>>>>>>> lldb-dev at cs.uiuc.edu <mailto:lldb-dev at cs.uiuc.edu>
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140818/1c42c7cf/attachment.html>

More information about the lldb-dev mailing list