[Lldb-commits] [PATCH]race condition calling PushProcessIOHandler
Shawn Best
sbest at blueshiftinc.com
Wed Aug 6 18:12:59 PDT 2014
Matt,
You asked earlier about the significance of the 2000us timeout. When I
was testing an earlier patch:
do
{
usleep(0); // force context switch
if(state==stopped)
break;
} while(curTime < timeout)
I put some timers and counters in the loop to see how it was behaving.
Most of the time, a single context switch was enough time to fix the
race condition. Occasionally it would take 3 context switches ( in the
neighborhood of a few 100us) for flag to be set.
The 2000us timeout number was my swag at a number that would give an
order of magnitude more time to cover the race, but minimize the
downside to lldb in case I had not anticipated some condition where it
would actually hit the timeout. I never saw it hitting the timeout, so
it could theoretically be much larger.
Shawn.
On 8/4/2014 10:01 PM, Matthew Gardiner wrote:
> Thanks for this update, Jim!
>
> Hopefully, we'll have it all fixed for when he returns ;-)
>
> Matt
>
>
> jingham at apple.com wrote:
>> Note, Greg is on vacation this week. Dunno if he'll chime in from the
>> beach, but he's sure to have opinions. Just wanted to make sure
>> folks didn't think his silence indicated he didn't...
>>
>> Jim
>>
>>> On Aug 4, 2014, at 5:41 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>>>
>>> Folks,
>>>
>>> I have been testing Shawn's patch some more today.
>>>
>>> (lldb) target create
>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>> ...
>>> (lldb) process launch -s
>>> ...
>>> lldb) process continue
>>> ... lots of output ...
>>>
>>> Control-c pressed
>>> (lldb)
>>>
>>> Above I found that if I launch the process with "stop at entry",
>>> then I continue, then when I control-c to interrupt the prompt
>>> returns correctly. With the above session I found I could process
>>> continue and Ctrl-c repeatedly and the prompt would return as
>>> expected. I also found that using "next" to step the inferior also
>>> resulted in the debugger's prompt returning.
>>>
>>> However, if I launched without -s
>>>
>>> (lldb) target create
>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>> ...
>>> (lldb) process launch
>>> ... lots of output ...
>>>
>>> Control-c pressed
>>>
>>> then the prompt was absent, and hence broken.
>>>
>>> Also if I create my target, set a breakpoint, then launch:
>>>
>>> (lldb) target create
>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>> ...
>>> (lldb) br s -f forever.c -l 12
>>> ...
>>> (lldb) process launch
>>> Process 10292 launching
>>> Process 10292 launched:
>>> '/home/mg11/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf'
>>> (x86_64)
>>> Process 10292 stopped
>>> * thread #1: tid = 10292, 0x0000003675a011f0, name = 'i64-hello.elf'
>>> frame #0:
>>> Process 10292 stopped
>>> * thread #1: tid = 10292, 0x0000000000400548 i64-hello.elf`f2(x=8,
>>> y=0) + 24 at forever.c:12, name = 'i64-hello.elf', stop reason =
>>> breakpoint 1.1
>>> frame #0: 0x0000000000400548 i64-hello.elf`f2(x=8, y=0) + 24 at
>>> forever.c:12
>>> 9
>>> 10 x += 8;
>>> 11 y *= x;
>>> -> 12 buff[0] = y;
>>> 13
>>> 14 x = buff[0] + buff[1];
>>> 15 return x;
>>>
>>> Again, no prompt.
>>>
>>> Attaching to a stopped process using gdb-remote, also results in me
>>> having no prompt:
>>>
>>> (lldb) target create
>>> ~/src/main/devtools/main/heracles/csrgdbserver/test/examples/simple/i64-hello.elf
>>> ...
>>> (lldb) gdb-remote 7777
>>> Process 10327 stopped
>>> * thread #1: tid = 10327, 0x0000003675a011f0, name =
>>> 'i64-hello.elf', stop reason = signal SIGSTOP
>>> frame #0: 0x0000003675a011f0
>>> -> 0x3675a011f0: movq %rsp, %rdi
>>> 0x3675a011f3: callq 0x3675a046e0
>>> 0x3675a011f8: movq %rax, %r12
>>> 0x3675a011fb: movl 0x21eb97(%rip), %eax
>>>
>>> So, as I alluded to previously, there are a lot of scenarios where
>>> the IOHandler prompt bug manifests. Now that I have read the code,
>>> I'm surprised that it does not happen more often and on more
>>> architectures. Out of interest my test inferior basically spins
>>> producing lots of output. It *may* one reason why I can provoke
>>> these bugs so easily. I'm attaching my test code if this helps
>>> others witness all of the issues which I'm seeing. I'll see if there
>>> are any simple tweaks I can make (in the spirit of Shawn's patch) to
>>> fix the above bugs.
>>>
>>> Matt
>>>
>>>
>>> Matthew Gardiner wrote:
>>>> "I think the last suggestion of moving the sync point to
>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>> patch since it is called from the EventHandler thread. The race
>>>> condition is the main thread returning up the call stack to start
>>>> another round of commandIO handling before PushProcessIOHandler()
>>>> gets called."
>>>>
>>>> I agree with this remark of Shawn's. The whole point of this issue
>>>> is to block the main thread i.e. the one which calls
>>>> IOHandlerEditline::GetLine, which is where the (lldb) prompt is
>>>> printed, i.e.
>>>>
>>>> #0 lldb_private::IOHandlerEditline::GetLine
>>>> #1 0x00007ffff770aceb in lldb_private::IOHandlerEditline::Run
>>>> #2 0x00007ffff76f546a in lldb_private::Debugger::ExecuteIOHanders
>>>> #3 0x00007ffff77d78bb in
>>>> lldb_private::CommandInterpreter::RunCommandInterpreter
>>>> #4 0x00007ffff7a47955 in lldb::SBDebugger::RunCommandInterpreter
>>>> #5 0x0000000000409234 in Driver::MainLoop
>>>> #6 0x0000000000409572 in main
>>>>
>>>> I spent quite a while looking at this problem and the surrounding
>>>> architecture this morning. The observed problem occurs when the
>>>> Editline's IOHandler object does not have it's m_active flag
>>>> cleared before it gets it's Run method called again (i.e. after the
>>>> last user command is processed). So we have to block this main
>>>> thread until another IOHandler can be pushed (i.e. the process
>>>> one). It is the act of pushing an IOHandler which has the effect of
>>>> clearing the m_active flag in the previous topmost handler.
>>>>
>>>> Regarding Greg's comment:
>>>>
>>>> "not having to have every command add a call to
>>>> process->WaitForPushedIOHandlerSync(...)."
>>>>
>>>> The problem is that it is only the individual "Command" object
>>>> implementations that know whether asynchronous processing will
>>>> occur, and hence whether this "block main thread until IOHandler
>>>> pushed" action is necessary. And therefore we may need to add this
>>>> kind of logic piece-meal. It's not great, but with my limited view
>>>> of lldb's architecture, I can't currently think of a better way.
>>>>
>>>> Matt
>>>>
>>>>
>>>>
>>>> Shawn Best wrote:
>>>>> Thanks for the feedback Greg. I have attached a revised patch
>>>>> based on your suggestions.
>>>>>
>>>>> - renamed m_pushed_IOHandler to m_iohandler_sync
>>>>>
>>>>> - got rid of method ClearPushedIOHandlerSync() and make calls
>>>>> directly to m_iohandler_sync.SetValue(false..)
>>>>>
>>>>> - renamed WaitForPushedIOHandlerSync() to SyncIOHandler()
>>>>>
>>>>> - only wait in case where m_process_input_reader != NULL
>>>>>
>>>>> I put the calls to reset the sync flag in both PrivateEventThread
>>>>> (after it has seen a public stop), and after the call to
>>>>> SyncIOHandler() is completed. As far as I can tell it should work
>>>>> fine, but my preference is normally to reset the flag immediately
>>>>> before using it instead of relying on it set to false. Having it
>>>>> internal does clean up the code though.
>>>>>
>>>>> I think the last suggestion of moving the sync point to
>>>>> Debugger::HandleProcessEvent() would defeat the purpose of the
>>>>> patch since it is called from the EventHandler thread. The race
>>>>> condition is the main thread returning up the call stack to start
>>>>> another round of commandIO handling before PushProcessIOHandler()
>>>>> gets called.
>>>>>
>>>>>
>>>>> On Fri, Aug 1, 2014 at 10:39 AM, Greg Clayton <gclayton at apple.com
>>>>> <mailto:gclayton at apple.com>> wrote:
>>>>>
>>>>> Comments:
>>>>>
>>>>> Why should anyone outside of lldb_private::Process have to call
>>>>> ClearPushedIOHandlerSync() manually? Can we get rid of this
>>>>> function and just have Process.cpp do it at the right times by
>>>>> directly calling m_pushed_IOHandler.SetValue(false,
>>>>> eBroadcastNever);?
>>>>>
>>>>> If so then the following fixes apply:
>>>>> 1 - remove Process::ClearPushedIOHandlerSync() since it will be
>>>>> done internally within process.
>>>>> 2 - rename m_pushed_IOHandler to m_iohandler_sync
>>>>> 3 - rename WaitForPushedIOHandlerSync() to SyncIOHandler
>>>>>
>>>>> Other general fixes:
>>>>> 1 - the WaitForPushedIOHandlerSync should do nothing if there is
>>>>> no process IOHandler (no stdio or we attached to a process. This
>>>>> can be done by testing m_process_input_reader with "if
>>>>> (m_process_input_reader) { ... }"
>>>>>
>>>>> I would also like the fix this sync issue by not having to have
>>>>> every command add a call to
>>>>> process->WaitForPushedIOHandlerSync(...). Can't we sync this in
>>>>> the Debugger::HandleProcessEvent()?
>>>>>
>>>>> > On Jul 31, 2014, at 2:57 PM, Shawn Best <sbest at blueshiftinc.com
>>>>> <mailto:sbest at blueshiftinc.com>> wrote:
>>>>> >
>>>>> > oops, with the attachment this time.
>>>>> >
>>>>> >
>>>>> > On Thu, Jul 31, 2014 at 2:56 PM, Shawn Best
>>>>> <sbest at blueshiftinc.com <mailto:sbest at blueshiftinc.com>> wrote:
>>>>> > Thanks everyone for the feedback. I have reworked the patch to
>>>>> use Predicate <bool>, it reads much cleaner now.
>>>>> >
>>>>> > Shawn.
>>>>> >
>>>>> > On Wed, Jul 30, 2014 at 6:34 PM, Greg Clayton
>>>>> <gclayton at apple.com <mailto:gclayton at apple.com>> wrote:
>>>>> > You will want to use a Predicate<bool> here in stead of what
>>>>> you
>>>>> have since it is exactly what we use a predicate for. The
>>>>> following:
>>>>> >
>>>>> > + bool m_process_running_sync; // used with
>>>>> WaitForProcessRunning() synchronization
>>>>> > + std::condition_variable m_condition_process_running; //
>>>>> used with WaitForProcessRunning() synchronization
>>>>> > + std::mutex m_mutex_process_running; // used with
>>>>> WaitForProcessRunning() synchronization
>>>>> >
>>>>> > Is exactly what the Predicate class does: protect a value
>>>>> with a
>>>>> mutex and condition.
>>>>> >
>>>>> > The above code should be replaced with:
>>>>> >
>>>>> > Predicate<bool> m_process_running_sync;
>>>>> >
>>>>> > The API on Predicate should do what you want. See the header
>>>>> file at "lldb/Host/Predicate.h" and also look for other places
>>>>> that use this class to wait for a value to be equal to another
>>>>> value, or wait for a value to not be equal to something.
>>>>> >
>>>>> > Let me know when you have a patch that uses Predicate and we
>>>>> will look at that.
>>>>> >
>>>>> > Greg
>>>>> >
>>>>> > > On Jul 30, 2014, at 4:03 PM, Shawn Best
>>>>> <sbest at blueshiftinc.com <mailto:sbest at blueshiftinc.com>> wrote:
>>>>> > >
>>>>> > > I have reworked the patch to use std::condition_variable.
>>>>> This particular sync mechanism was new to me, I hope I used it
>>>>> correctly. Is it portable across all target platforms/compilers?
>>>>> I tested on linux and OSX.
>>>>> > >
>>>>> > > The timeout is pretty small (1ms) but seems ample based on
>>>>> the
>>>>> measurements I made.
>>>>> > >
>>>>> > >
>>>>> > > On Tue, Jul 29, 2014 at 9:58 PM, Matthew Gardiner
>>>>> <mg11 at csr.com <mailto:mg11 at csr.com>> wrote:
>>>>> > > Cool, let us know how you get on!
>>>>> > > Matt
>>>>> > >
>>>>> > > Shawn Best wrote:
>>>>> > > Thanks for the feedback guys.
>>>>> > >
>>>>> > > Studying the code, I had figured going with a straight int
>>>>> would in practice be most efficient and not run into
>>>>> multi-threaded problems, even if initially appearing a bit risky.
>>>>> I will rework it to use a std::condition_variable. That will be
>>>>> more robust and readable.
>>>>> > >
>>>>> > > Shawn.
>>>>> > >
>>>>> > > On 7/29/2014 10:53 AM, Zachary Turner wrote:
>>>>> > > Even better would be an std::condition_variable
>>>>> > >
>>>>> > >
>>>>> > > On Mon, Jul 28, 2014 at 10:30 PM, Matthew Gardiner
>>>>> <mg11 at csr.com <mailto:mg11 at csr.com> <mailto:mg11 at csr.com
>>>>> <mailto:mg11 at csr.com>>> wrote:
>>>>> > >
>>>>> > > Hi Shawn,
>>>>> > >
>>>>> > > I use 64-bit linux and I see this issue a lot. It usually
>>>>> > > manifests itself as the prompt just not being printed (or
>>>>> perhaps
>>>>> > > it just gets overwritten) - regardless - I invoke a
>>>>> command, and
>>>>> > > I don't see an (lldb) prompt when I should. So I'm well
>>>>> pleased
>>>>> > > that you are looking at this!
>>>>> > >
>>>>> > > Would it not be more robust to use a semaphore than
>>>>> usleep to
>>>>> > > synchronise the problematic threads?
>>>>> > >
>>>>> > > Although I've not looked too deeply into this particular
>>>>> issue,
>>>>> > > whenever I've seen similar races, I found that it's
>>>>> almost
>>>>> > > impossible to pick the right value when using a sleep
>>>>> command. A
>>>>> > > semaphore, though, should always ensure the waiting
>>>>> thread
>>>>> will
>>>>> > > wake precisely.
>>>>> > >
>>>>> > > I'd be happy to help to test such a fix.
>>>>> > >
>>>>> > > Matt
>>>>> > >
>>>>> > >
>>>>> > > Shawn Best wrote:
>>>>> > >
>>>>> > > Hi,
>>>>> > >
>>>>> > > I have attached a patch which addresses 3 related
>>>>> race
>>>>> > > conditions that cause the command line (lldb) prompt
>>>>> to get
>>>>> > > displayed inappropriately and make it appear it is
>>>>> not
>>>>> > > working correctly. This issue can be seen on
>>>>> linux and
>>>>> > > FreeBSD. I can also artificailly induce the problem
>>>>> on OSX.
>>>>> > >
>>>>> > > The issue happens when the command handler (in the
>>>>> main
>>>>> > > thread) issues a command such as run, step or
>>>>> continue.
>>>>> > > After the command finishes initiating its action, it
>>>>> returns
>>>>> > > up the call stack and goes back into the main
>>>>> command loop
>>>>> > > waiting for user input. Simultaneously, as the
>>>>> inferior
>>>>> > > process starts up, the MonitorChildProcess thread
>>>>> picks up
>>>>> > > the change and posts to the PrivateEvent thread.
>>>>> > > HandePrivateEvent() then calls
>>>>> PushProcessIOHandler()
>>>>> which
>>>>> > > will disable the command IO handler and give the
>>>>> inferior
>>>>> > > control of the TTY. To observe this on OSX, put a
>>>>> > >
>>>>> > > usleep(100);
>>>>> > >
>>>>> > > immediately prior the PushProcessIOHandler() in
>>>>> > > HandlePrivateEvent.
>>>>> > >
>>>>> > >
>>>>> > > My proposed solution is that after a 'run',
>>>>> 'step', or
>>>>> > > 'continue' command, insert a synchronization point
>>>>> and
>>>>> wait
>>>>> > > until HandlePrivateEvent knows the inferior
>>>>> process is
>>>>> > > running and has pushed the IO handler. One context
>>>>> switch
>>>>> > > (<100us) is usually all the time it takes on my
>>>>> machine. As
>>>>> > > an additional safety, I have a timeout (currently
>>>>> 1ms)
>>>>> so it
>>>>> > > will never hang the main thread.
>>>>> > >
>>>>> > > Any thoughts, or suggestions would be appreciated.
>>>>> > >
>>>>> > > Regards,
>>>>> > > Shawn.
>>>>> > >
>>>>> > >
>>>>> > > To report this email as spam click here
>>>>> > > <https://www.mailcontrol.com/sr/MZbqvYs5QwJvpeaetUwhCQ==>.
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > _______________________________________________
>>>>> > > lldb-commits mailing list
>>>>> > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>> <mailto:lldb-commits at cs.uiuc.edu
>>>>> <mailto:lldb-commits at cs.uiuc.edu>>
>>>>> > >
>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > Member of the CSR plc group of companies. CSR plc
>>>>> registered in
>>>>> > > England and Wales, registered number 4187346, registered
>>>>> office
>>>>> > > Churchill House, Cambridge Business Park, Cowley Road,
>>>>> Cambridge,
>>>>> > > CB4 0WZ, United Kingdom
>>>>> > > More information can be found at www.csr.com
>>>>> <http://www.csr.com>
>>>>> > > <http://www.csr.com>. Keep up to date with CSR on our
>>>>> technical
>>>>> > > blog, www.csr.com/blog <http://www.csr.com/blog>
>>>>> <http://www.csr.com/blog>, CSR people
>>>>> > > blog, www.csr.com/people <http://www.csr.com/people>
>>>>> <http://www.csr.com/people>, YouTube,
>>>>> > > www.youtube.com/user/CSRplc
>>>>> <http://www.youtube.com/user/CSRplc>
>>>>> <http://www.youtube.com/user/CSRplc>,
>>>>> > > Facebook, www.facebook.com/pages/CSR/191038434253534
>>>>> <http://www.facebook.com/pages/CSR/191038434253534>
>>>>> > > <http://www.facebook.com/pages/CSR/191038434253534>, or
>>>>> follow us
>>>>> > > on Twitter at www.twitter.com/CSR_plc
>>>>> <http://www.twitter.com/CSR_plc>
>>>>> > > <http://www.twitter.com/CSR_plc>.
>>>>> > >
>>>>> > > New for 2014, you can now access the wide range of
>>>>> products
>>>>> > > powered by aptX at www.aptx.com <http://www.aptx.com>
>>>>> <http://www.aptx.com>.
>>>>> > > _______________________________________________
>>>>> > > lldb-commits mailing list
>>>>> > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>> <mailto:lldb-commits at cs.uiuc.edu
>>>>> <mailto:lldb-commits at cs.uiuc.edu>>
>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> <sbest_iohandler_race_rev_04.diff>_______________________________________________
>>>>>
>>>>> > > lldb-commits mailing list
>>>>> > > lldb-commits at cs.uiuc.edu <mailto:lldb-commits at cs.uiuc.edu>
>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>> >
>>>>> >
>>>>> >
>>>>> > <sbest_iohandler_race_rev_05.diff>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> lldb-commits mailing list
>>>> lldb-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>> <forever.c>_______________________________________________
>>> lldb-commits mailing list
>>> lldb-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>
More information about the lldb-commits
mailing list