[Lldb-commits] [PATCH]race condition calling PushProcessIOHandler

Matthew Gardiner mg11 at csr.com
Mon Aug 4 22:01:35 PDT 2014


Thanks for this update, Jim!

Hopefully, we'll have it all fixed for when he returns ;-)

Matt


jingham at apple.com wrote:
> Note, Greg is on vacation this week.  Dunno if he'll chime in from the beach, but he's sure to have opinions.  Just wanted to make sure folks didn't think his silence indicated he didn't...
>
> Jim
>
>> On Aug 4, 2014, at 5:41 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>
>> Folks,
>>
>> I have been testing Shawn's patch some more today.
>>
>> (lldb) target create ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>> ...
>> (lldb) process launch -s
>> ...
>> lldb) process continue
>> ... lots of output ...
>>
>> Control-c pressed
>> (lldb)
>>
>> Above I found that if I launch the process with "stop at entry", then I continue, then when I control-c to interrupt the prompt returns correctly. With the above session I found I could process continue and Ctrl-c repeatedly and the prompt would return as expected. I also found that using "next" to step the inferior also resulted in the debugger's prompt returning.
>>
>> However, if I launched without -s
>>
>> (lldb) target create ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>> ...
>> (lldb) process launch
>> ... lots of output ...
>>
>> Control-c pressed
>>
>> then the prompt was absent, and hence broken.
>>
>> Also if I create my target, set a breakpoint, then launch:
>>
>> (lldb) target create ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>> ...
>> (lldb) br s -f forever.c -l 12
>> ...
>> (lldb) process launch
>> Process 10292 launching
>> Process 10292 launched: '/home/mg11/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf' (x86_64)
>> Process 10292 stopped
>> * thread #1: tid = 10292, 0x0000003675a011f0, name = 'i64-hello.elf'
>>     frame #0:
>> Process 10292 stopped
>> * thread #1: tid = 10292, 0x0000000000400548 i64-hello.elf`f2(x=8, y=0) + 24 at forever.c:12, name = 'i64-hello.elf', stop reason = breakpoint 1.1
>>     frame #0: 0x0000000000400548 i64-hello.elf`f2(x=8, y=0) + 24 at forever.c:12
>>    9
>>    10          x += 8;
>>    11          y *= x;
>> -> 12          buff[0] = y;
>>    13
>>    14          x = buff[0] + buff[1];
>>    15          return x;
>>
>> Again, no prompt.
>>
>> Attaching to a stopped process using gdb-remote, also results in me having no prompt:
>>
>> (lldb) target create ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>> ...
>> (lldb) gdb-remote 7777
>> Process 10327 stopped
>> * thread #1: tid = 10327, 0x0000003675a011f0, name = 'i64-hello.elf', stop reason = signal SIGSTOP
>>     frame #0: 0x0000003675a011f0
>> -> 0x3675a011f0:  movq   %rsp, %rdi
>>    0x3675a011f3:  callq  0x3675a046e0
>>    0x3675a011f8:  movq   %rax, %r12
>>    0x3675a011fb:  movl   0x21eb97(%rip), %eax
>>
>> So, as I alluded to previously, there are a lot of scenarios where the IOHandler prompt bug manifests. Now that I have read the code, I'm surprised that it does not happen more often and on more architectures. Out of interest my test inferior basically spins producing lots of output. It *may* one reason why I can provoke these bugs so easily. I'm attaching my test code if this helps others witness all of the issues which I'm seeing. I'll see if there are any simple tweaks I can make (in the spirit of Shawn's patch) to fix the above bugs.
>>
>> Matt
>>
>>
>> Matthew Gardiner wrote:
>>> "I think the last suggestion of moving the sync point to Debugger::HandleProcessEvent() would defeat the purpose of the patch since it is called from the EventHandler thread.  The race condition is the main thread returning up the call stack to start another round of commandIO handling before PushProcessIOHandler() gets called."
>>>
>>> I agree with this remark of Shawn's. The whole point of this issue is to block the main thread i.e. the one which calls IOHandlerEditline::GetLine, which is where the (lldb) prompt is printed, i.e.
>>>
>>> #0  lldb_private::IOHandlerEditline::GetLine
>>> #1  0x00007ffff770aceb in lldb_private::IOHandlerEditline::Run
>>> #2  0x00007ffff76f546a in lldb_private::Debugger::ExecuteIOHanders
>>> #3  0x00007ffff77d78bb in lldb_private::CommandInterpreter::RunCommandInterpreter
>>> #4  0x00007ffff7a47955 in lldb::SBDebugger::RunCommandInterpreter
>>> #5  0x0000000000409234 in Driver::MainLoop
>>> #6  0x0000000000409572 in main
>>>
>>> I spent quite a while looking at this problem and the surrounding architecture this morning. The observed problem occurs when the Editline's IOHandler object does not have it's m_active flag cleared before it gets it's Run method called again (i.e. after the last user command is processed). So we have to block this main thread until another IOHandler can be pushed (i.e. the process one). It is the act of pushing an IOHandler which has the effect of clearing the m_active flag in the previous topmost handler.
>>>
>>> Regarding Greg's comment:
>>>
>>> "not having to have every command add a call to process->WaitForPushedIOHandlerSync(...)."
>>>
>>> The problem is that it is only the individual "Command" object implementations that know whether asynchronous processing will occur, and hence whether this "block main thread until IOHandler pushed" action is necessary. And therefore we may need to add this kind of logic piece-meal. It's not great, but with my limited view of lldb's architecture, I can't currently think of a better way.
>>>
>>> Matt
>>>
>>>
>>>
>>> Shawn Best wrote:
>>>> Thanks for the feedback Greg.  I have attached a revised patch based on your suggestions.
>>>>
>>>> - renamed m_pushed_IOHandler to m_iohandler_sync
>>>>
>>>> - got rid of method ClearPushedIOHandlerSync() and make calls directly to m_iohandler_sync.SetValue(false..)
>>>>
>>>> - renamed WaitForPushedIOHandlerSync() to SyncIOHandler()
>>>>
>>>> - only wait in case where m_process_input_reader != NULL
>>>>
>>>> I put the calls to reset the sync flag in both PrivateEventThread (after it has seen a public stop), and after the call to SyncIOHandler() is completed.  As far as I can tell it should work fine, but my preference is normally to reset the flag immediately before using it instead of relying on it set to false.  Having it internal does clean up the code though.
>>>>
>>>> I think the last suggestion of moving the sync point to Debugger::HandleProcessEvent() would defeat the purpose of the patch since it is called from the EventHandler thread.  The race condition is the main thread returning up the call stack to start another round of commandIO handling before PushProcessIOHandler() gets called.
>>>>
>>>>
>>>> On Fri, Aug 1, 2014 at 10:39 AM, Greg Clayton <gclayton at apple.com <mailto:gclayton at apple.com>> wrote:
>>>>
>>>>     Comments:
>>>>
>>>>     Why should anyone outside of lldb_private::Process have to call
>>>>     ClearPushedIOHandlerSync() manually? Can we get rid of this
>>>>     function and just have Process.cpp do it at the right times by
>>>>     directly calling m_pushed_IOHandler.SetValue(false, eBroadcastNever);?
>>>>
>>>>     If so then the following fixes apply:
>>>>     1 - remove Process::ClearPushedIOHandlerSync() since it will be
>>>>     done internally within process.
>>>>     2 - rename m_pushed_IOHandler to m_iohandler_sync
>>>>     3 - rename WaitForPushedIOHandlerSync() to SyncIOHandler
>>>>
>>>>     Other general fixes:
>>>>     1 - the WaitForPushedIOHandlerSync should do nothing if there is
>>>>     no process IOHandler (no stdio or we attached to a process. This
>>>>     can be done by testing m_process_input_reader with "if
>>>>     (m_process_input_reader) { ... }"
>>>>
>>>>     I would also like the fix this sync issue by not having to have
>>>>     every command add a call to
>>>>     process->WaitForPushedIOHandlerSync(...). Can't we sync this in
>>>>     the Debugger::HandleProcessEvent()?
>>>>
>>>>     > On Jul 31, 2014, at 2:57 PM, Shawn Best <sbest at blueshiftinc.com
>>>>     <mailto:sbest at blueshiftinc.com>> wrote:
>>>>     >
>>>>     > oops, with the attachment this time.
>>>>     >
>>>>     >
>>>>     > On Thu, Jul 31, 2014 at 2:56 PM, Shawn Best
>>>>     <sbest at blueshiftinc.com <mailto:sbest at blueshiftinc.com>> wrote:
>>>>     > Thanks everyone for the feedback.  I have reworked the patch to
>>>>     use Predicate <bool>, it reads much cleaner now.
>>>>     >
>>>>     > Shawn.
>>>>     >
>>>>     > On Wed, Jul 30, 2014 at 6:34 PM, Greg Clayton
>>>>     <gclayton at apple.com <mailto:gclayton at apple.com>> wrote:
>>>>     > You will want to use a Predicate<bool> here in stead of what you
>>>>     have since it is exactly what we use a predicate for. The following:
>>>>     >
>>>>     > +    bool  m_process_running_sync;         // used with
>>>>     WaitForProcessRunning() synchronization
>>>>     > +    std::condition_variable m_condition_process_running;    //
>>>>     used with WaitForProcessRunning() synchronization
>>>>     > +    std::mutex  m_mutex_process_running;        // used with
>>>>     WaitForProcessRunning() synchronization
>>>>     >
>>>>     > Is exactly what the Predicate class does: protect a value with a
>>>>     mutex and condition.
>>>>     >
>>>>     > The above code should be replaced with:
>>>>     >
>>>>     >      Predicate<bool> m_process_running_sync;
>>>>     >
>>>>     > The API on Predicate should do what you want. See the header
>>>>     file at "lldb/Host/Predicate.h" and also look for other places
>>>>     that use this class to wait for a value to be equal to another
>>>>     value, or wait for a value to not be equal to something.
>>>>     >
>>>>     > Let me know when you have a patch that uses Predicate and we
>>>>     will look at that.
>>>>     >
>>>>     > Greg
>>>>     >
>>>>     > > On Jul 30, 2014, at 4:03 PM, Shawn Best
>>>>     <sbest at blueshiftinc.com <mailto:sbest at blueshiftinc.com>> wrote:
>>>>     > >
>>>>     > > I have reworked the patch to use std::condition_variable.
>>>>      This particular sync mechanism was new to me, I hope I used it
>>>>     correctly.  Is it portable across all target platforms/compilers?
>>>>      I tested on linux and OSX.
>>>>     > >
>>>>     > > The timeout is pretty small (1ms) but seems ample based on the
>>>>     measurements I made.
>>>>     > >
>>>>     > >
>>>>     > > On Tue, Jul 29, 2014 at 9:58 PM, Matthew Gardiner
>>>>     <mg11 at csr.com <mailto:mg11 at csr.com>> wrote:
>>>>     > > Cool, let us know how you get on!
>>>>     > > Matt
>>>>     > >
>>>>     > > Shawn Best wrote:
>>>>     > > Thanks for the feedback guys.
>>>>     > >
>>>>     > > Studying the code, I had figured going with a straight int
>>>>     would in practice be most efficient and not run into
>>>>     multi-threaded problems, even if initially appearing a bit risky.
>>>>      I will rework it to use a std::condition_variable.  That will be
>>>>     more robust and readable.
>>>>     > >
>>>>     > > Shawn.
>>>>     > >
>>>>     > > On 7/29/2014 10:53 AM, Zachary Turner wrote:
>>>>     > > Even better would be an std::condition_variable
>>>>     > >
>>>>     > >
>>>>     > > On Mon, Jul 28, 2014 at 10:30 PM, Matthew Gardiner
>>>>     <mg11 at csr.com <mailto:mg11 at csr.com> <mailto:mg11 at csr.com
>>>>     <mailto:mg11 at csr.com>>> wrote:
>>>>     > >
>>>>     > >     Hi Shawn,
>>>>     > >
>>>>     > >     I use 64-bit linux and I see this issue a lot. It usually
>>>>     > >     manifests itself as the prompt just not being printed (or
>>>>     perhaps
>>>>     > >     it just gets overwritten) - regardless - I invoke a
>>>>     command, and
>>>>     > >     I don't see an (lldb) prompt when I should. So I'm well
>>>>     pleased
>>>>     > >     that you are looking at this!
>>>>     > >
>>>>     > >     Would it not be more robust to use a semaphore than usleep to
>>>>     > >     synchronise the problematic threads?
>>>>     > >
>>>>     > >     Although I've not looked too deeply into this particular
>>>>     issue,
>>>>     > >     whenever I've seen similar races, I found that it's almost
>>>>     > >     impossible to pick the right value when using a sleep
>>>>     command. A
>>>>     > >     semaphore, though, should always ensure the waiting thread
>>>>     will
>>>>     > >     wake precisely.
>>>>     > >
>>>>     > >     I'd be happy to help to test such a fix.
>>>>     > >
>>>>     > >     Matt
>>>>     > >
>>>>     > >
>>>>     > >     Shawn Best wrote:
>>>>     > >
>>>>     > >         Hi,
>>>>     > >
>>>>     > >         I have attached a patch which addresses 3 related race
>>>>     > >         conditions that cause the command line (lldb) prompt
>>>>     to get
>>>>     > >         displayed inappropriately and make it appear it is not
>>>>     > >         working correctly.  This issue can be seen on linux and
>>>>     > >         FreeBSD.  I can also artificailly induce the problem
>>>>     on OSX.
>>>>     > >
>>>>     > >         The issue happens when the command handler (in the main
>>>>     > >         thread) issues a command such as run, step or continue.
>>>>     > >          After the command finishes initiating its action, it
>>>>     returns
>>>>     > >         up the call stack and goes back into the main command loop
>>>>     > >         waiting for user input.  Simultaneously, as the inferior
>>>>     > >         process starts up, the MonitorChildProcess thread picks up
>>>>     > >         the change and posts to the PrivateEvent thread.
>>>>     > >          HandePrivateEvent() then calls PushProcessIOHandler()
>>>>     which
>>>>     > >         will disable the command IO handler and give the inferior
>>>>     > >         control of the TTY.  To observe this on OSX, put a
>>>>     > >
>>>>     > >         usleep(100);
>>>>     > >
>>>>     > >         immediately prior the PushProcessIOHandler() in
>>>>     > >         HandlePrivateEvent.
>>>>     > >
>>>>     > >
>>>>     > >         My proposed solution is that after a 'run', 'step', or
>>>>     > >         'continue' command, insert a synchronization point and
>>>>     wait
>>>>     > >         until HandlePrivateEvent knows the inferior process is
>>>>     > >         running and has pushed the IO handler. One context switch
>>>>     > >         (<100us) is usually all the time it takes on my
>>>>     machine.  As
>>>>     > >         an additional safety, I have a timeout (currently 1ms)
>>>>     so it
>>>>     > >         will never hang the main thread.
>>>>     > >
>>>>     > >         Any thoughts, or suggestions would be appreciated.
>>>>     > >
>>>>     > >         Regards,
>>>>     > >         Shawn.
>>>>     > >
>>>>     > >
>>>>     > >         To report this email as spam click here
>>>>     > > <https://www.mailcontrol.com/sr/MZbqvYs5QwJvpeaetUwhCQ==>.
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > > _______________________________________________
>>>>     > >         lldb-commits mailing list
>>>>     > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>     <mailto:lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>>
>>>>     > >
>>>>     > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > >     Member of the CSR plc group of companies. CSR plc
>>>>     registered in
>>>>     > >     England and Wales, registered number 4187346, registered
>>>>     office
>>>>     > >     Churchill House, Cambridge Business Park, Cowley Road,
>>>>     Cambridge,
>>>>     > >     CB4 0WZ, United Kingdom
>>>>     > >     More information can be found at www.csr.com
>>>>     <http://www.csr.com>
>>>>     > >     <http://www.csr.com>. Keep up to date with CSR on our
>>>>     technical
>>>>     > >     blog, www.csr.com/blog <http://www.csr.com/blog>
>>>>     <http://www.csr.com/blog>, CSR people
>>>>     > >     blog, www.csr.com/people <http://www.csr.com/people>
>>>>     <http://www.csr.com/people>, YouTube,
>>>>     > > www.youtube.com/user/CSRplc
>>>>     <http://www.youtube.com/user/CSRplc>
>>>>     <http://www.youtube.com/user/CSRplc>,
>>>>     > >     Facebook, www.facebook.com/pages/CSR/191038434253534
>>>>     <http://www.facebook.com/pages/CSR/191038434253534>
>>>>     > > <http://www.facebook.com/pages/CSR/191038434253534>, or
>>>>     follow us
>>>>     > >     on Twitter at www.twitter.com/CSR_plc
>>>>     <http://www.twitter.com/CSR_plc>
>>>>     > >     <http://www.twitter.com/CSR_plc>.
>>>>     > >
>>>>     > >     New for 2014, you can now access the wide range of products
>>>>     > >     powered by aptX at www.aptx.com <http://www.aptx.com>
>>>>     <http://www.aptx.com>.
>>>>     > > _______________________________________________
>>>>     > >     lldb-commits mailing list
>>>>     > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>     <mailto:lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>>
>>>>     > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > >
>>>>     > >
>>>> <sbest_iohandler_race_rev_04.diff>_______________________________________________
>>>>     > > lldb-commits mailing list
>>>>     > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>     > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>     >
>>>>     >
>>>>     >
>>>>     > <sbest_iohandler_race_rev_05.diff>
>>>>
>>>>
>>> _______________________________________________
>>> lldb-commits mailing list
>>> lldb-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>> <forever.c>_______________________________________________
>> lldb-commits mailing list
>> lldb-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits




More information about the lldb-commits mailing list