[Lldb-commits] [PATCH]race condition calling PushProcessIOHandler
Matthew Gardiner
mg11 at csr.com
Thu Aug 7 23:35:49 PDT 2014
Shawn,
To make this kind of solution robust in all scheduling scenarios (for
example, on a PC under heavy load from other processes), I think we
should make this timeout a lot bigger if we know that the
ProcessIOHandler will *always* be pushed; I'm thinking at least in terms
of seconds. And if the timeout *does* occur, then what class of error
does this represent?
I can currently think of 2 reasons:
1. running event does not occur, presumably due to a runtime error
2. running event occurs, but a logic bug makes ShouldBroadcast return false
Matt
Shawn Best wrote:
> Matt,
>
> You asked earlier about the significance of the 2000us timeout. When I
> was testing an earlier patch:
>
> do
> {
>     usleep(0); // force a context switch
>
>     if (state == stopped)
>         break;
>
> } while (curTime < timeout);
>
> I put some timers and counters in the loop to see how it was
> behaving. Most of the time, a single context switch was enough time
> to fix the race condition. Occasionally it would take 3 context
> switches (in the neighborhood of a few hundred microseconds) for the flag to be set.
>
> The 2000us timeout number was my swag at a number that would give an
> order of magnitude more time to cover the race, but minimize the
> downside to lldb in case I had not anticipated some condition where it
> would actually hit the timeout. I never saw it hitting the timeout,
> so it could theoretically be much larger.
>
> Shawn.
> On 8/4/2014 10:01 PM, Matthew Gardiner wrote:
>> Thanks for this update, Jim!
>>
>> Hopefully, we'll have it all fixed for when he returns ;-)
>>
>> Matt
>>
>>
>> jingham at apple.com wrote:
>>> Note, Greg is on vacation this week. Dunno if he'll chime in from
>>> the beach, but he's sure to have opinions. Just wanted to make sure
>>> folks didn't think his silence indicated he didn't...
>>>
>>> Jim
>>>
>>>> On Aug 4, 2014, at 5:41 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>
>>>> Folks,
>>>>
>>>> I have been testing Shawn's patch some more today.
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) process launch -s
>>>> ...
(lldb) process continue
>>>> ... lots of output ...
>>>>
>>>> Control-c pressed
>>>> (lldb)
>>>>
>>>> Above I found that if I launch the process with "stop at entry",
>>>> then continue, and then press Control-C to interrupt, the prompt
>>>> returns correctly. With the above session I found I could process
>>>> continue and Ctrl-c repeatedly and the prompt would return as
>>>> expected. I also found that using "next" to step the inferior also
>>>> resulted in the debugger's prompt returning.
>>>>
>>>> However, if I launched without -s
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) process launch
>>>> ... lots of output ...
>>>>
>>>> Control-c pressed
>>>>
>>>> then the prompt was absent, and hence broken.
>>>>
>>>> Also if I create my target, set a breakpoint, then launch:
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) br s -f forever.c -l 12
>>>> ...
>>>> (lldb) process launch
>>>> Process 10292 launching
>>>> Process 10292 launched:
>>>> '/home/mg11/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf'
>>>> (x86_64)
>>>> Process 10292 stopped
>>>> * thread #1: tid = 10292, 0x0000003675a011f0, name = 'i64-hello.elf'
>>>> frame #0:
>>>> Process 10292 stopped
>>>> * thread #1: tid = 10292, 0x0000000000400548 i64-hello.elf`f2(x=8,
>>>> y=0) + 24 at forever.c:12, name = 'i64-hello.elf', stop reason =
>>>> breakpoint 1.1
>>>> frame #0: 0x0000000000400548 i64-hello.elf`f2(x=8, y=0) + 24 at
>>>> forever.c:12
>>>> 9
>>>> 10 x += 8;
>>>> 11 y *= x;
>>>> -> 12 buff[0] = y;
>>>> 13
>>>> 14 x = buff[0] + buff[1];
>>>> 15 return x;
>>>>
>>>> Again, no prompt.
>>>>
>>>> Attaching to a stopped process using gdb-remote also results in me
>>>> having no prompt:
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) gdb-remote 7777
>>>> Process 10327 stopped
>>>> * thread #1: tid = 10327, 0x0000003675a011f0, name =
>>>> 'i64-hello.elf', stop reason = signal SIGSTOP
>>>> frame #0: 0x0000003675a011f0
>>>> -> 0x3675a011f0: movq %rsp, %rdi
>>>> 0x3675a011f3: callq 0x3675a046e0
>>>> 0x3675a011f8: movq %rax, %r12
>>>> 0x3675a011fb: movl 0x21eb97(%rip), %eax
>>>>
>>>> So, as I alluded to previously, there are a lot of scenarios where
>>>> the IOHandler prompt bug manifests. Now that I have read the code,
>>>> I'm surprised that it does not happen more often and on more
>>>> architectures. Out of interest, my test inferior basically spins
>>>> producing lots of output. It *may* be one reason why I can provoke
>>>> these bugs so easily. I'm attaching my test code in case it helps
>>>> others witness all of the issues which I'm seeing. I'll see if
>>>> there are any simple tweaks I can make (in the spirit of Shawn's
>>>> patch) to fix the above bugs.
>>>>
>>>> Matt
>>>>
>>>>
>>>> Matthew Gardiner wrote:
>>>>> "I think the last suggestion of moving the sync point to
>>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>>> patch since it is called from the EventHandler thread. The race
>>>>> condition is the main thread returning up the call stack to start
>>>>> another round of commandIO handling before PushProcessIOHandler()
>>>>> gets called."
>>>>>
>>>>> I agree with this remark of Shawn's. The whole point of this fix
>>>>> is to block the main thread, i.e. the one which calls
>>>>> IOHandlerEditline::GetLine, which is where the (lldb) prompt is
>>>>> printed, i.e.
>>>>>
>>>>> #0 lldb_private::IOHandlerEditline::GetLine
>>>>> #1 0x00007ffff770aceb in lldb_private::IOHandlerEditline::Run
>>>>> #2 0x00007ffff76f546a in lldb_private::Debugger::ExecuteIOHandlers
>>>>> #3 0x00007ffff77d78bb in
>>>>> lldb_private::CommandInterpreter::RunCommandInterpreter
>>>>> #4 0x00007ffff7a47955 in lldb::SBDebugger::RunCommandInterpreter
>>>>> #5 0x0000000000409234 in Driver::MainLoop
>>>>> #6 0x0000000000409572 in main
>>>>>
>>>>> I spent quite a while looking at this problem and the surrounding
>>>>> architecture this morning. The observed problem occurs when the
>>>>> Editline's IOHandler object does not have its m_active flag
>>>>> cleared before its Run method is called again (i.e. after
>>>>> the last user command is processed). So we have to block this main
>>>>> thread until another IOHandler can be pushed (i.e. the process
>>>>> one). It is the act of pushing an IOHandler which has the effect
>>>>> of clearing the m_active flag in the previous topmost handler.
>>>>>
>>>>> Regarding Greg's comment:
>>>>>
>>>>> "not having to have every command add a call to
>>>>> process->WaitForPushedIOHandlerSync(...)."
>>>>>
>>>>> The problem is that it is only the individual "Command" object
>>>>> implementations that know whether asynchronous processing will
>>>>> occur, and hence whether this "block main thread until IOHandler
>>>>> pushed" action is necessary. And therefore we may need to add this
>>>>> kind of logic piecemeal. It's not great, but with my limited view
>>>>> of lldb's architecture, I can't currently think of a better way.
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>
>>>>> Shawn Best wrote:
>>>>>> Thanks for the feedback Greg. I have attached a revised patch
>>>>>> based on your suggestions.
>>>>>>
>>>>>> - renamed m_pushed_IOHandler to m_iohandler_sync
>>>>>>
>>>>>> - got rid of method ClearPushedIOHandlerSync() and make calls
>>>>>> directly to m_iohandler_sync.SetValue(false..)
>>>>>>
>>>>>> - renamed WaitForPushedIOHandlerSync() to SyncIOHandler()
>>>>>>
>>>>>> - only wait in case where m_process_input_reader != NULL
>>>>>>
>>>>>> I put the calls to reset the sync flag in both PrivateEventThread
>>>>>> (after it has seen a public stop), and after the call to
>>>>>> SyncIOHandler() is completed. As far as I can tell it should
>>>>>> work fine, but my preference is normally to reset the flag
>>>>>> immediately before using it instead of relying on it already being
>>>>>> false. Having it internal does clean up the code, though.
>>>>>>
>>>>>> I think the last suggestion of moving the sync point to
>>>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>>>> patch since it is called from the EventHandler thread. The race
>>>>>> condition is the main thread returning up the call stack to start
>>>>>> another round of commandIO handling before PushProcessIOHandler()
>>>>>> gets called.
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 10:39 AM, Greg Clayton <gclayton at apple.com> wrote:
>>>>>>
>>>>>> Comments:
>>>>>>
>>>>>> Why should anyone outside of lldb_private::Process have to call
>>>>>> ClearPushedIOHandlerSync() manually? Can we get rid of this
>>>>>> function and just have Process.cpp do it at the right times by
>>>>>> directly calling m_pushed_IOHandler.SetValue(false,
>>>>>> eBroadcastNever);?
>>>>>>
>>>>>> If so then the following fixes apply:
>>>>>> 1 - remove Process::ClearPushedIOHandlerSync() since it will be
>>>>>> done internally within process.
>>>>>> 2 - rename m_pushed_IOHandler to m_iohandler_sync
>>>>>> 3 - rename WaitForPushedIOHandlerSync() to SyncIOHandler
>>>>>>
>>>>>> Other general fixes:
>>>>>> 1 - the WaitForPushedIOHandlerSync should do nothing if there is
>>>>>> no process IOHandler (no stdio, or we attached to a process). This
>>>>>> can be done by testing m_process_input_reader with "if
>>>>>> (m_process_input_reader) { ... }"
>>>>>>
>>>>>> I would also like to fix this sync issue without having to have
>>>>>> every command add a call to
>>>>>> process->WaitForPushedIOHandlerSync(...). Can't we sync this in
>>>>>> the Debugger::HandleProcessEvent()?
>>>>>>
>>>>>> > On Jul 31, 2014, at 2:57 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> >
>>>>>> > oops, with the attachment this time.
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 2:56 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> > Thanks everyone for the feedback. I have reworked the patch to
>>>>>> use Predicate<bool>, it reads much cleaner now.
>>>>>> >
>>>>>> > Shawn.
>>>>>> >
>>>>>> > On Wed, Jul 30, 2014 at 6:34 PM, Greg Clayton <gclayton at apple.com> wrote:
>>>>>> > You will want to use a Predicate<bool> here instead of what you
>>>>>> have, since it is exactly what we use a predicate for. The
>>>>>> following:
>>>>>> >
>>>>>> > + bool m_process_running_sync; // used with
>>>>>> WaitForProcessRunning() synchronization
>>>>>> > + std::condition_variable
>>>>>> m_condition_process_running; //
>>>>>> used with WaitForProcessRunning() synchronization
>>>>>> > + std::mutex m_mutex_process_running; // used with
>>>>>> WaitForProcessRunning() synchronization
>>>>>> >
>>>>>> > Is exactly what the Predicate class does: protect a value
>>>>>> with a
>>>>>> mutex and condition.
>>>>>> >
>>>>>> > The above code should be replaced with:
>>>>>> >
>>>>>> > Predicate<bool> m_process_running_sync;
>>>>>> >
>>>>>> > The API on Predicate should do what you want. See the header
>>>>>> file at "lldb/Host/Predicate.h" and also look for other places
>>>>>> that use this class to wait for a value to be equal to another
>>>>>> value, or wait for a value to not be equal to something.
>>>>>> >
>>>>>> > Let me know when you have a patch that uses Predicate and we
>>>>>> will look at that.
>>>>>> >
>>>>>> > Greg
>>>>>> >
>>>>>> > > On Jul 30, 2014, at 4:03 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> > >
>>>>>> > > I have reworked the patch to use std::condition_variable.
>>>>>> This particular sync mechanism was new to me, I hope I used it
>>>>>> correctly. Is it portable across all target platforms/compilers?
>>>>>> I tested on Linux and OS X.
>>>>>> > >
>>>>>> > > The timeout is pretty small (1 ms) but seems ample based on
>>>>>> the measurements I made.
>>>>>> > >
>>>>>> > >
>>>>>> > > On Tue, Jul 29, 2014 at 9:58 PM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>>> > > Cool, let us know how you get on!
>>>>>> > > Matt
>>>>>> > >
>>>>>> > > Shawn Best wrote:
>>>>>> > > Thanks for the feedback guys.
>>>>>> > >
>>>>>> > > Studying the code, I had figured going with a straight int
>>>>>> would in practice be most efficient and not run into
>>>>>> multi-threaded problems, even if initially appearing a bit risky.
>>>>>> I will rework it to use a std::condition_variable. That will be
>>>>>> more robust and readable.
>>>>>> > >
>>>>>> > > Shawn.
>>>>>> > >
>>>>>> > > On 7/29/2014 10:53 AM, Zachary Turner wrote:
>>>>>> > > Even better would be an std::condition_variable
>>>>>> > >
>>>>>> > >
>>>>>> > > On Mon, Jul 28, 2014 at 10:30 PM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>>> > >
>>>>>> > > Hi Shawn,
>>>>>> > >
>>>>>> > > I use 64-bit Linux and I see this issue a lot. It usually
>>>>>> > > manifests itself as the prompt just not being printed (or
>>>>>> > > perhaps it just gets overwritten) - regardless, I invoke a
>>>>>> > > command, and I don't see an (lldb) prompt when I should. So
>>>>>> > > I'm well pleased that you are looking at this!
>>>>>> > >
>>>>>> > > Would it not be more robust to use a semaphore than usleep
>>>>>> > > to synchronise the problematic threads?
>>>>>> > >
>>>>>> > > Although I've not looked too deeply into this particular
>>>>>> > > issue, whenever I've seen similar races, I've found that it's
>>>>>> > > almost impossible to pick the right value when using a sleep
>>>>>> > > command. A semaphore, though, should always ensure the
>>>>>> > > waiting thread will wake precisely.
>>>>>> > >
>>>>>> > > I'd be happy to help to test such a fix.
>>>>>> > >
>>>>>> > > Matt
>>>>>> > >
>>>>>> > >
>>>>>> > > Shawn Best wrote:
>>>>>> > >
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I have attached a patch which addresses 3 related race
>>>>>> > > conditions that cause the command-line (lldb) prompt to get
>>>>>> > > displayed inappropriately and make it appear it is not
>>>>>> > > working correctly. This issue can be seen on Linux and
>>>>>> > > FreeBSD. I can also artificially induce the problem on OS X.
>>>>>> > >
>>>>>> > > The issue happens when the command handler (in the main
>>>>>> > > thread) issues a command such as run, step or continue.
>>>>>> > > After the command finishes initiating its action, it returns
>>>>>> > > up the call stack and goes back into the main command loop
>>>>>> > > waiting for user input. Simultaneously, as the inferior
>>>>>> > > process starts up, the MonitorChildProcess thread picks up
>>>>>> > > the change and posts to the PrivateEvent thread.
>>>>>> > > HandlePrivateEvent() then calls PushProcessIOHandler(), which
>>>>>> > > will disable the command IO handler and give the inferior
>>>>>> > > control of the TTY. To observe this on OS X, put a
>>>>>> > >
>>>>>> > > usleep(100);
>>>>>> > >
>>>>>> > > immediately prior to the PushProcessIOHandler() call in
>>>>>> > > HandlePrivateEvent().
>>>>>> > >
>>>>>> > >
>>>>>> > > My proposed solution is that after a 'run', 'step', or
>>>>>> > > 'continue' command, insert a synchronization point and wait
>>>>>> > > until HandlePrivateEvent knows the inferior process is
>>>>>> > > running and has pushed the IO handler. One context switch
>>>>>> > > (<100us) is usually all the time it takes on my machine. As
>>>>>> > > an additional safety, I have a timeout (currently 1ms) so it
>>>>>> > > will never hang the main thread.
>>>>>> > >
>>>>>> > > Any thoughts, or suggestions would be appreciated.
>>>>>> > >
>>>>>> > > Regards,
>>>>>> > > Shawn.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > _______________________________________________
>>>>>> > > lldb-commits mailing list
>>>>>> > > lldb-commits at cs.uiuc.edu
>>>>>> > >
>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> <sbest_iohandler_race_rev_04.diff>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > <sbest_iohandler_race_rev_05.diff>
>>>>>>
>>>>>>
>>>> <forever.c>
>>
>
More information about the lldb-commits
mailing list