[Lldb-commits] [PATCH]race condition calling PushProcessIOHandler
Matthew Gardiner
mg11 at csr.com
Thu Aug 7 23:35:49 PDT 2014
Shawn,
To make this kind of solution robust in all scheduling scenarios (for
example, on a PC under heavy load from other processes), I think we
should make this timeout a lot bigger if we know that the
ProcessIOHandler will *always* be pushed; I'm thinking at least in terms
of seconds. And if the timeout *does* occur, then what class of error
does this represent?
I can currently think of 2 reasons:
1. running event does not occur, presumably due to a runtime error
2. running event occurs, but a logic bug makes ShouldBroadcast return false
Matt
Shawn Best wrote:
> Matt,
>
> You asked earlier about the significance of the 2000us timeout. When I
> was testing an earlier patch:
>
> do
> {
>     usleep(0); // force a context switch
>
>     if (state == stopped)
>         break;
>
> } while (curTime < timeout);
>
> I put some timers and counters in the loop to see how it was
> behaving. Most of the time, a single context switch was enough time
> to fix the race condition. Occasionally it would take 3 context
> switches (in the neighborhood of a few hundred microseconds) for the flag to be set.
>
> The 2000us timeout number was my swag at a number that would give an
> order of magnitude more time to cover the race, but minimize the
> downside to lldb in case I had not anticipated some condition where it
> would actually hit the timeout. I never saw it hitting the timeout,
> so it could theoretically be much larger.
>
> Shawn.
> On 8/4/2014 10:01 PM, Matthew Gardiner wrote:
>> Thanks for this update, Jim!
>>
>> Hopefully, we'll have it all fixed for when he returns ;-)
>>
>> Matt
>>
>>
>> jingham at apple.com wrote:
>>> Note, Greg is on vacation this week. Dunno if he'll chime in from
>>> the beach, but he's sure to have opinions. Just wanted to make sure
>>> folks didn't think his silence indicated he didn't...
>>>
>>> Jim
>>>
>>>> On Aug 4, 2014, at 5:41 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>
>>>> Folks,
>>>>
>>>> I have been testing Shawn's patch some more today.
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) process launch -s
>>>> ...
(lldb) process continue
>>>> ... lots of output ...
>>>>
>>>> Control-c pressed
>>>> (lldb)
>>>>
>>>> Above I found that if I launch the process with "stop at entry",
>>>> then continue, and then press Control-C to interrupt, the prompt
>>>> returns correctly. With the above session I found I could process
>>>> continue and Ctrl-c repeatedly and the prompt would return as
>>>> expected. I also found that using "next" to step the inferior also
>>>> resulted in the debugger's prompt returning.
>>>>
>>>> However, if I launched without -s
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) process launch
>>>> ... lots of output ...
>>>>
>>>> Control-c pressed
>>>>
>>>> then the prompt was absent, and hence broken.
>>>>
>>>> Also if I create my target, set a breakpoint, then launch:
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) br s -f forever.c -l 12
>>>> ...
>>>> (lldb) process launch
>>>> Process 10292 launching
>>>> Process 10292 launched:
>>>> '/home/mg11/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf'
>>>> (x86_64)
>>>> Process 10292 stopped
>>>> * thread #1: tid = 10292, 0x0000003675a011f0, name = 'i64-hello.elf'
>>>> frame #0:
>>>> Process 10292 stopped
>>>> * thread #1: tid = 10292, 0x0000000000400548 i64-hello.elf`f2(x=8,
>>>> y=0) + 24 at forever.c:12, name = 'i64-hello.elf', stop reason =
>>>> breakpoint 1.1
>>>> frame #0: 0x0000000000400548 i64-hello.elf`f2(x=8, y=0) + 24 at
>>>> forever.c:12
>>>> 9
>>>> 10 x += 8;
>>>> 11 y *= x;
>>>> -> 12 buff[0] = y;
>>>> 13
>>>> 14 x = buff[0] + buff[1];
>>>> 15 return x;
>>>>
>>>> Again, no prompt.
>>>>
>>>> Attaching to a stopped process using gdb-remote also results in me
>>>> having no prompt:
>>>>
>>>> (lldb) target create
>>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>>> ...
>>>> (lldb) gdb-remote 7777
>>>> Process 10327 stopped
>>>> * thread #1: tid = 10327, 0x0000003675a011f0, name =
>>>> 'i64-hello.elf', stop reason = signal SIGSTOP
>>>> frame #0: 0x0000003675a011f0
>>>> -> 0x3675a011f0: movq %rsp, %rdi
>>>> 0x3675a011f3: callq 0x3675a046e0
>>>> 0x3675a011f8: movq %rax, %r12
>>>> 0x3675a011fb: movl 0x21eb97(%rip), %eax
>>>>
>>>> So, as I alluded to previously, there are a lot of scenarios where
>>>> the IOHandler prompt bug manifests. Now that I have read the code,
>>>> I'm surprised that it does not happen more often and on more
>>>> architectures. Out of interest, my test inferior basically spins
>>>> producing lots of output. It *may* be one reason why I can provoke
>>>> these bugs so easily. I'm attaching my test code in case it helps
>>>> others witness all of the issues which I'm seeing. I'll see if
>>>> there are any simple tweaks I can make (in the spirit of Shawn's
>>>> patch) to fix the above bugs.
>>>>
>>>> Matt
>>>>
>>>>
>>>> Matthew Gardiner wrote:
>>>>> "I think the last suggestion of moving the sync point to
>>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>>> patch since it is called from the EventHandler thread. The race
>>>>> condition is the main thread returning up the call stack to start
>>>>> another round of commandIO handling before PushProcessIOHandler()
>>>>> gets called."
>>>>>
>>>>> I agree with this remark of Shawn's. The whole point of this fix
>>>>> is to block the main thread, i.e. the one which calls
>>>>> IOHandlerEditline::GetLine, which is where the (lldb) prompt is
>>>>> printed, i.e.
>>>>>
>>>>> #0 lldb_private::IOHandlerEditline::GetLine
>>>>> #1 0x00007ffff770aceb in lldb_private::IOHandlerEditline::Run
>>>>> #2 0x00007ffff76f546a in lldb_private::Debugger::ExecuteIOHandlers
>>>>> #3 0x00007ffff77d78bb in
>>>>> lldb_private::CommandInterpreter::RunCommandInterpreter
>>>>> #4 0x00007ffff7a47955 in lldb::SBDebugger::RunCommandInterpreter
>>>>> #5 0x0000000000409234 in Driver::MainLoop
>>>>> #6 0x0000000000409572 in main
>>>>>
>>>>> I spent quite a while looking at this problem and the surrounding
>>>>> architecture this morning. The observed problem occurs when the
>>>>> Editline's IOHandler object does not have its m_active flag
>>>>> cleared before its Run method is called again (i.e. after
>>>>> the last user command is processed). So we have to block this main
>>>>> thread until another IOHandler can be pushed (i.e. the process
>>>>> one). It is the act of pushing an IOHandler which has the effect
>>>>> of clearing the m_active flag in the previous topmost handler.
>>>>>
>>>>> Regarding Greg's comment:
>>>>>
>>>>> "not having to have every command add a call to
>>>>> process->WaitForPushedIOHandlerSync(...)."
>>>>>
>>>>> The problem is that it is only the individual "Command" object
>>>>> implementations that know whether asynchronous processing will
>>>>> occur, and hence whether this "block main thread until IOHandler
>>>>> pushed" action is necessary. And therefore we may need to add this
>>>>> kind of logic piecemeal. It's not great, but with my limited view
>>>>> of lldb's architecture, I can't currently think of a better way.
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>
>>>>> Shawn Best wrote:
>>>>>> Thanks for the feedback Greg. I have attached a revised patch
>>>>>> based on your suggestions.
>>>>>>
>>>>>> - renamed m_pushed_IOHandler to m_iohandler_sync
>>>>>>
>>>>>> - got rid of method ClearPushedIOHandlerSync() and make calls
>>>>>> directly to m_iohandler_sync.SetValue(false..)
>>>>>>
>>>>>> - renamed WaitForPushedIOHandlerSync() to SyncIOHandler()
>>>>>>
>>>>>> - only wait in case where m_process_input_reader != NULL
>>>>>>
>>>>>> I put the calls to reset the sync flag in both PrivateEventThread
>>>>>> (after it has seen a public stop), and after the call to
>>>>>> SyncIOHandler() is completed. As far as I can tell it should
>>>>>> work fine, but my preference is normally to reset the flag
>>>>>> immediately before using it instead of relying on it already being
>>>>>> false. Having it internal does clean up the code, though.
>>>>>>
>>>>>> I think the last suggestion of moving the sync point to
>>>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>>>> patch since it is called from the EventHandler thread. The race
>>>>>> condition is the main thread returning up the call stack to start
>>>>>> another round of commandIO handling before PushProcessIOHandler()
>>>>>> gets called.
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 1, 2014 at 10:39 AM, Greg Clayton <gclayton at apple.com> wrote:
>>>>>>
>>>>>> Comments:
>>>>>>
>>>>>> Why should anyone outside of lldb_private::Process have to call
>>>>>> ClearPushedIOHandlerSync() manually? Can we get rid of this
>>>>>> function and just have Process.cpp do it at the right times by
>>>>>> directly calling m_pushed_IOHandler.SetValue(false,
>>>>>> eBroadcastNever);?
>>>>>>
>>>>>> If so then the following fixes apply:
>>>>>> 1 - remove Process::ClearPushedIOHandlerSync() since it will be
>>>>>> done internally within process.
>>>>>> 2 - rename m_pushed_IOHandler to m_iohandler_sync
>>>>>> 3 - rename WaitForPushedIOHandlerSync() to SyncIOHandler
>>>>>>
>>>>>> Other general fixes:
>>>>>> 1 - the WaitForPushedIOHandlerSync should do nothing if there is
>>>>>> no process IOHandler (no stdio, or we attached to a process). This
>>>>>> can be done by testing m_process_input_reader with "if
>>>>>> (m_process_input_reader) { ... }"
>>>>>>
>>>>>> I would also like to fix this sync issue without having to have
>>>>>> every command add a call to
>>>>>> process->WaitForPushedIOHandlerSync(...). Can't we sync this in
>>>>>> the Debugger::HandleProcessEvent()?
>>>>>>
>>>>>> > On Jul 31, 2014, at 2:57 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> >
>>>>>> > oops, with the attachment this time.
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jul 31, 2014 at 2:56 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> > Thanks everyone for the feedback. I have reworked the patch to
>>>>>> use Predicate<bool>, it reads much cleaner now.
>>>>>> >
>>>>>> > Shawn.
>>>>>> >
>>>>>> > On Wed, Jul 30, 2014 at 6:34 PM, Greg Clayton <gclayton at apple.com> wrote:
>>>>>> > You will want to use a Predicate<bool> here instead of what you
>>>>>> have, since it is exactly what we use a predicate for. The
>>>>>> following:
>>>>>> >
>>>>>> > + bool m_process_running_sync; // used with
>>>>>> WaitForProcessRunning() synchronization
>>>>>> > + std::condition_variable
>>>>>> m_condition_process_running; //
>>>>>> used with WaitForProcessRunning() synchronization
>>>>>> > + std::mutex m_mutex_process_running; // used with
>>>>>> WaitForProcessRunning() synchronization
>>>>>> >
>>>>>> > Is exactly what the Predicate class does: protect a value
>>>>>> with a
>>>>>> mutex and condition.
>>>>>> >
>>>>>> > The above code should be replaced with:
>>>>>> >
>>>>>> > Predicate<bool> m_process_running_sync;
>>>>>> >
>>>>>> > The API on Predicate should do what you want. See the header
>>>>>> file at "lldb/Host/Predicate.h" and also look for other places
>>>>>> that use this class to wait for a value to be equal to another
>>>>>> value, or wait for a value to not be equal to something.
>>>>>> >
>>>>>> > Let me know when you have a patch that uses Predicate and we
>>>>>> will look at that.
>>>>>> >
>>>>>> > Greg
>>>>>> >
>>>>>> > > On Jul 30, 2014, at 4:03 PM, Shawn Best <sbest at blueshiftinc.com> wrote:
>>>>>> > >
>>>>>> > > I have reworked the patch to use std::condition_variable.
>>>>>> This particular sync mechanism was new to me, I hope I used it
>>>>>> correctly. Is it portable across all target platforms/compilers?
>>>>>> I tested on Linux and OS X.
>>>>>> > >
>>>>>> > > The timeout is pretty small (1 ms) but seems ample based on
>>>>>> the measurements I made.
>>>>>> > >
>>>>>> > >
>>>>>> > > On Tue, Jul 29, 2014 at 9:58 PM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>>> > > Cool, let us know how you get on!
>>>>>> > > Matt
>>>>>> > >
>>>>>> > > Shawn Best wrote:
>>>>>> > > Thanks for the feedback guys.
>>>>>> > >
>>>>>> > > Studying the code, I had figured going with a straight int
>>>>>> would in practice be most efficient and not run into
>>>>>> multi-threaded problems, even if initially appearing a bit risky.
>>>>>> I will rework it to use a std::condition_variable. That will be
>>>>>> more robust and readable.
>>>>>> > >
>>>>>> > > Shawn.
>>>>>> > >
>>>>>> > > On 7/29/2014 10:53 AM, Zachary Turner wrote:
>>>>>> > > Even better would be an std::condition_variable
>>>>>> > >
>>>>>> > >
>>>>>> > > On Mon, Jul 28, 2014 at 10:30 PM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>>>> > >
>>>>>> > > Hi Shawn,
>>>>>> > >
>>>>>> > > I use 64-bit Linux and I see this issue a lot. It usually
>>>>>> > > manifests itself as the prompt just not being printed (or
>>>>>> > > perhaps it just gets overwritten) - regardless, I invoke a
>>>>>> > > command, and I don't see an (lldb) prompt when I should. So
>>>>>> > > I'm well pleased that you are looking at this!
>>>>>> > >
>>>>>> > > Would it not be more robust to use a semaphore than usleep
>>>>>> > > to synchronise the problematic threads?
>>>>>> > >
>>>>>> > > Although I've not looked too deeply into this particular
>>>>>> > > issue, whenever I've seen similar races, I've found that it's
>>>>>> > > almost impossible to pick the right value when using a sleep
>>>>>> > > command. A semaphore, though, should always ensure the
>>>>>> > > waiting thread will wake precisely.
>>>>>> > >
>>>>>> > > I'd be happy to help to test such a fix.
>>>>>> > >
>>>>>> > > Matt
>>>>>> > >
>>>>>> > >
>>>>>> > > Shawn Best wrote:
>>>>>> > >
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I have attached a patch which addresses 3 related race
>>>>>> > > conditions that cause the command-line (lldb) prompt to get
>>>>>> > > displayed inappropriately and make it appear it is not
>>>>>> > > working correctly. This issue can be seen on Linux and
>>>>>> > > FreeBSD. I can also artificially induce the problem on OS X.
>>>>>> > >
>>>>>> > > The issue happens when the command handler (in the main
>>>>>> > > thread) issues a command such as run, step or continue.
>>>>>> > > After the command finishes initiating its action, it returns
>>>>>> > > up the call stack and goes back into the main command loop
>>>>>> > > waiting for user input. Simultaneously, as the inferior
>>>>>> > > process starts up, the MonitorChildProcess thread picks up
>>>>>> > > the change and posts to the PrivateEvent thread.
>>>>>> > > HandlePrivateEvent() then calls PushProcessIOHandler(), which
>>>>>> > > will disable the command IO handler and give the inferior
>>>>>> > > control of the TTY. To observe this on OS X, put a
>>>>>> > >
>>>>>> > > usleep(100);
>>>>>> > >
>>>>>> > > immediately prior to the PushProcessIOHandler() call in
>>>>>> > > HandlePrivateEvent().
>>>>>> > >
>>>>>> > >
>>>>>> > > My proposed solution is that after a 'run', 'step', or
>>>>>> > > 'continue' command, insert a synchronization point and wait
>>>>>> > > until HandlePrivateEvent knows the inferior process is
>>>>>> > > running and has pushed the IO handler. One context switch
>>>>>> > > (<100us) is usually all the time it takes on my machine. As
>>>>>> > > an additional safety, I have a timeout (currently 1ms) so it
>>>>>> > > will never hang the main thread.
>>>>>> > >
>>>>>> > > Any thoughts, or suggestions would be appreciated.
>>>>>> > >
>>>>>> > > Regards,
>>>>>> > > Shawn.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > _______________________________________________
>>>>>> > > lldb-commits mailing list
>>>>>> > > lldb-commits at cs.uiuc.edu
>>>>>> > >
>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> <sbest_iohandler_race_rev_04.diff>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > <sbest_iohandler_race_rev_05.diff>
>>>>>>
>>>>>>
>>>> <forever.c>
>>
>
More information about the lldb-commits
mailing list