[Lldb-commits] [PATCH][Review Request] Fix for potentially missed events in Core/Listener

Tue Nov 13 09:51:50 PST 2012

Thanks, Jim.

Your suggested change should definitely handle the problem I saw.  

The explicit unlock inside Listener::FindNextEventInternal still troubles me a bit conceptually because it is vulnerable to callers holding the mutex and causing the mutex not to actually unlock.  I guess that's a fairly small risk since the mutex is a protected member of the class (though you never know what subclasses might do).  Is there a way to check to see if the mutex was actually unlocked?  The pthread_mutex_unlock documentation seems to say it just returns zero if there was no error whether the lock is released or not.  Maybe some more warning comments are the best you can do to protect this code from bozos like me. ;)

-Andy

-----Original Message-----
From: Jim Ingham [mailto:jingham at apple.com] 
Sent: Monday, November 12, 2012 6:48 PM
To: Kaylor, Andrew
Cc: lldb-commits at cs.uiuc.edu
Subject: Re: [Lldb-commits] [PATCH][Review Request] Fix for potentially missed events in Core/Listener

Sorry for being slow to look at this.  

So the "DoOnRemoval" part of the event handling is where, for instance, the commands and conditions for breakpoint events get processed.  So roughly any code that you can run in lldb can get run in the DoOnRemoval implementation for breakpoint events.  Currently nothing that uses DoOnRemoval requires the listener to be servicing new events (which is why your patch didn't cause any testsuite failures...) though the two candidates obvious candidates are:

1) You can call functions in both conditions and commands, and those obviously will require more event traffic.  But the function calling mechanism interposes another listener when calling functions in order to make the function calling event traffic private.  So it doesn't matter that the current process listener isn't servicing events till it gets done with DoOnRemoval.

2) You can issue "continue/step/next" commands in breakpoint commands, which if they stayed in the context of the breakpoint command would obviously require new events.  For now the command processing will exit when one of its subcommands restarts the target.  This is not considered a good thing, but is currently necessary to protect against reentrancy issues in the command interpreter: running the target in a breakpoint command could cause it to hit a breakpoint, which could have its own command, which would re-enter the interpreter, etc...  

This is something I want to fix when I get a chance; many people would love to be able to do:

(lldb) breakpoint command add
> step
> expr something_interesting
> step
> expr something_interesting

etc.

So I think to make #2 work we're going to have to figure out a way to avoid the deadlock you detected without locking around all of GetNextEvent.  What you are trying to do is pretty straightforward, however.  Instead of locking around the GetNextEventInternal, we could let that run, and if we don't find the event we are looking for, then we lock the event queue, and poll for an event.  If we find one we want we consume it, otherwise we set the condition to false and go to wait.  That way we'll pick up anything added during and just after the GetNextEvent call, and only set the condition to false & wait if nothing is there.

Index: source/Core/Listener.cpp
===================================================================

--- source/Core/Listener.cpp	(revision 167779)
+++ source/Core/Listener.cpp	(working copy)
@@ -312,6 +312,11 @@
                 m_cond_wait.SetValue (false, eBroadcastNever);
         }
         
+        // Unlock the event queue here.  We've removed this event and are about to return
+        // it so it should be okay to get the next event off the queue here - and it might
+        // be useful to do that in the "DoOnRemoval".
+        lock.Unlock();
+        
         // Don't call DoOnRemoval if you aren't removing the event...
         if (remove)
             event_sp->DoOnRemoval();
@@ -407,18 +412,21 @@
     while (1)
     {
         // Scope for "event_locker"
-        {
-            // Don't allow new events to be added while we're checking for the
-            // one we want.  Otherwise, setting m_cond_wait to false below
-            // might cause us to miss one.
-            Mutex::Locker event_locker(m_events_mutex);
-
             if (GetNextEventInternal (broadcaster, broadcaster_names, num_broadcaster_names, event_type_mask, event_sp))
                 return true;
 
+        {
             // Reset condition value to false, so we can wait for new events to be
             // added that might meet our current filter
-            m_cond_wait.SetValue (false, eBroadcastNever);
+            // But first poll for any new event that might satisfy our condition, and if so consume it,
+            // otherwise wait.
+            
+            Mutex::Locker event_locker(m_events_mutex);
+            const bool remove = false;
+            if (FindNextEventInternal (broadcaster, broadcaster_names, num_broadcaster_names, event_type_mask, event_sp, remove))
+                continue;
+            else
+                m_cond_wait.SetValue (false, eBroadcastNever);
         }
 
         if (m_cond_wait.WaitForValueEqualTo (true, timeout, &timed_out))

Note that the Listener code has an un-enforced convention that you shouldn't wait for events on the same listener from multiple threads, so we don't need to worry about that.  It is probably clear that that isn't ever going to be a good idea, but we might should enforce it at some point just to be sure...

What do you think?

Jim

On Nov 12, 2012, at 2:20 PM, "Kaylor, Andrew" <andrew.kaylor at intel.com> wrote:

> Ping (with refreshed patch)
>  
> From: lldb-commits-bounces at cs.uiuc.edu [mailto:lldb-commits-bounces at cs.uiuc.edu] On Behalf Of Kaylor, Andrew
> Sent: Friday, October 26, 2012 12:38 PM
> To: lldb-commits at cs.uiuc.edu
> Subject: [Lldb-commits] [PATCH][Review Request] Fix for potentially missed events in Core/Listener
>  
> This patch fixes a problem where the Listener class could get into a deadlocked state because it missed the arrival of a new event.  It was possible for a new event to arrive between the time Listener::WaitForEventsInternal checked the list of available events and the time it set the conditional variable indicating that the event it was waiting for was not available.  The patch closes this loophole by locking the m_events_mutex before calling Listener::FindNextEventInternal and holding it until after it sets the condition variable.
>  
> In addition, I am removing the explicit Unlock in Listener::FindNextEventInternal because with the change above the calling function holds the mutex so it will not actually be unlocked.  The only potential problem I can see with this is if some implementation of Event::DoOnRemoval blocks waiting for an event to be received from another thread by the calling Listener object (which seems like a bad idea).
>  
> -Andy
> <listener-fix.patch>_______________________________________________
> lldb-commits mailing list
> lldb-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits