[lldb-dev] Process attach and pgid of forked process under Linux

Todd Fiala tfiala at google.com
Tue Apr 1 10:46:13 PDT 2014


Hey Andrew,

I think the core I was getting on the assert is during exiting and doesn't
really seem to indicate a serious issue.  I fixed it up with a null check
and added some logging.

See the attached patch.  This is a cumulative patch with my fixes for those
two test failures plus all of your original changes.

As this patch stands, LGTM.  If you want to check it in, I'm fine with it
from a Linux standpoint.  In the common Host.cpp, you now have a getpgid()
call.  I'm not sure that's going to work on a Windows build?  Might need to
#ifdef that or create a macro for it, or something else entirely.  Any
Windows folks able to comment on that?






On Tue, Apr 1, 2014 at 10:25 AM, Todd Fiala <tfiala at google.com> wrote:

> That change did successfully fix the TestAttachResume.py failure.
>
> This one is a bit different and looks like we may have perturbed something
> related to shutdown:
>
> TestHelloWorld.py:
>
> Executing tearDown hook:     def cleanupSubprocesses(self):
>         # Ensure any subprocesses are cleaned up
>         for p in self.subprocesses:
>             if p.poll() == None:
>                 p.terminate()
>             del p
>         del self.subprocesses[:]
>         # Ensure any forked processes are cleaned up
>         for pid in self.forkedProcessPids:
>             if os.path.exists("/proc/" + str(pid)):
>                 os.kill(pid, signal.SIGTERM)
>
>
> python:
> /mnt/ssd/work/git/gen/llvm/tools/lldb/source/Plugins/Process/POSIX/ProcessPOSIX.cpp:427:
> virtual void ProcessPOSIX::SendMessage(const ProcessMessage&): Assertion
> `thread' failed.
> Aborted (core dumped)
>
> I'm having a look at that core now.
>
>
>
> On Tue, Apr 1, 2014 at 10:00 AM, Todd Fiala <tfiala at google.com> wrote:
>
>> I built clean and ran tests error free on r205315.
>>
>> After testing with the patch applied, I'm getting the following failures
>> (3 times in a row):
>>
>> FAIL: LLDB (suite) :: TestHelloWorld.py (Linux
>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT
>> 2013 x86_64 x86_64)
>> FAIL: LLDB (suite) :: TestAttachResume.py (Linux
>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT
>> 2013 x86_64 x86_64)
>>
>> Looking deeper, it looks like the main.c code in
>> test/functionalities/attach_resume/main.c is assuming c99 mode which isn't
>> the default when using gcc.  I'm clearing that locally (pulling for loop
>> variable declarations out) and will see if that fixes everything else.
>>
>> Back in a bit...
>>
>>
>> On Tue, Apr 1, 2014 at 9:06 AM, Todd Fiala <tfiala at google.com> wrote:
>>
>>> Having a look at this now.
>>>
>>>
>>> On Thu, Mar 27, 2014 at 1:01 AM, Andrew MacPherson <
>>> andrew.macp at gmail.com> wrote:
>>>
>>>> Thanks Todd, and no problem on the unit test, I'll try to include one
>>>> for anything I come across in the future as well.
>>>>
>>>>
>>>> On Wed, Mar 26, 2014 at 10:39 PM, Todd Fiala <tfiala at google.com> wrote:
>>>>
>>>>> Ah nice.  I'll definitely run this again soon.  Thanks for getting a
>>>>> test around it - we can't have enough of that around important expectations.
>>>>>
>>>>>
>>>>> On Wed, Mar 26, 2014 at 2:06 PM, Andrew MacPherson <
>>>>> andrew.macp at gmail.com> wrote:
>>>>>
>>>>>> Ok that makes sense. I've updated the patch I sent yesterday with a
>>>>>> unit test that exposes the problem as well as a couple of other process
>>>>>> attach issues that were fixed in the last few days. It was in fact this
>>>>>> getpgid() issue that was causing problems when trying to create a unit test
>>>>>> earlier as the problem scenario occurs when the unit test script forks a
>>>>>> subprocess with popen().
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 26, 2014 at 7:20 PM, Greg Clayton <gclayton at apple.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> On Mar 25, 2014, at 12:09 PM, Andrew MacPherson <
>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>
>>>>>>> > Ok great, I'm attaching a patch here that just uses getpgid() to
>>>>>>> determine the pgid to use with waitpid(). I get the same unit test results
>>>>>>> before and after the patch (there are a few tests that fail consistently
>>>>>>> for me). I tried using a simple -1 with waitpid() but this results in a
>>>>>>> hang in the unit tests. This is probably something worth investigating but
>>>>>>> for now this patch does resolve the issues around waitpid() when attaching.
>>>>>>>
>>>>>>> We should never use -1 with waitpid() as you could end up reaping a
>>>>>>> process that someone else is already trying to reap. You must only try to
>>>>>>> reap the process that you launched using some pid for you process. Many
>>>>>>> things launch processes within LLDB like ProcessGDBRemote launching
>>>>>>> "debugserver" binaries. The ProcessGDBRemote must be notified when and if
>>>>>>> "debugserver" crashes, so if anyone else does waitpid(-1, ...)
>>>>>>> ProcessGDBRemote might never find out if debugserver crashes for some
>>>>>>> reason.
>>>>>>>
>>>>>>> > I mentioned a few issues I ran into with the unit tests in the
>>>>>>> email I just pinged with a Linux detach patch, basically there appears to
>>>>>>> be a bug or limitation somewhere in that doing a "continue" in asynchronous
>>>>>>> mode in a unit test breaks things and this is needed for most
>>>>>>> attach-related debug testing. I can probably look into this (along with the
>>>>>>> couple of failing tests I have) but likely won't be able to this week. I
>>>>>>> haven't filed a bug for the consistently-failing tests on my end since it
>>>>>>> doesn't appear to be happening for others and I haven't had a chance to
>>>>>>> investigate here, let me know if I should file a bug anyway.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Mar 25, 2014 at 6:30 PM, Todd Fiala <tfiala at google.com>
>>>>>>> wrote:
>>>>>>> > I'd suggest trying the change, running the tests, and see if
>>>>>>> anything new fails.
>>>>>>> >
>>>>>>> > And if this fixes currently broken behavior, add a test to make
>>>>>>> sure we don't regress it.  I'm pretty sure if it breaks something we'll
>>>>>>> know pretty quickly (either via bugs or our own usage).  At which point -
>>>>>>> we should add a test so we don't regress in the future and perhaps add a
>>>>>>> few comments as to the reason.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Mar 25, 2014 at 9:54 AM, <jingham at apple.com> wrote:
>>>>>>> > Why are you waiting for process groups?  That's not something we
>>>>>>> have to do on Mac OS X.
>>>>>>> >
>>>>>>> > Jim
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mar 25, 2014, at 1:17 AM, Andrew MacPherson <
>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>> >
>>>>>>> > > Currently under Linux if you attach to a process whose process
>>>>>>> group id is not equal to its process id (such as the child process of a
>>>>>>> fork() call) the calls to waitpid() that pass -1*pid will return ECHILD
>>>>>>> since the pid argument refers to a process group that doesn't exist. These
>>>>>>> calls occur in Host::MonitorChildProcessThreadFunction() and the Linux
>>>>>>> ProcessMonitor.
>>>>>>> > >
>>>>>>> > > Changing -1*pid to simply -1 or to -1*getpgid(pid) resolves the
>>>>>>> issue but it's not clear if this is the right fix as I'm unsure how other
>>>>>>> OSes deal with this scenario.
>>>>>>> > >
>>>>>>> > > Any thoughts?
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > Andrew
>>>>>>> > > _______________________________________________
>>>>>>> > > lldb-dev mailing list
>>>>>>> > > lldb-dev at cs.uiuc.edu
>>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>> >
>>>>>>> > _______________________________________________
>>>>>>> > lldb-dev mailing list
>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>>>>>> 650-943-3180
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> <waitpid-getpgid.patch>_______________________________________________
>>>>>>> > lldb-dev mailing list
>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>
>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>



-- 
Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140401/814ccaa0/attachment.html>
-------------- next part --------------
diff --git a/source/Host/common/Host.cpp b/source/Host/common/Host.cpp
index 197be19..35bb94a 100644
--- a/source/Host/common/Host.cpp
+++ b/source/Host/common/Host.cpp
@@ -169,7 +169,7 @@ MonitorChildProcessThreadFunction (void *arg)
     const bool monitor_signals = info->monitor_signals;
 
     assert (info->pid <= UINT32_MAX);
-    const ::pid_t pid = monitor_signals ? -1 * info->pid : info->pid;
+    const ::pid_t pid = monitor_signals ? -1 * getpgid(info->pid) : info->pid;
 
     delete info;
 
diff --git a/source/Plugins/Process/Linux/ProcessMonitor.cpp b/source/Plugins/Process/Linux/ProcessMonitor.cpp
index 3dec6de..554d54c 100644
--- a/source/Plugins/Process/Linux/ProcessMonitor.cpp
+++ b/source/Plugins/Process/Linux/ProcessMonitor.cpp
@@ -1767,7 +1767,7 @@ ProcessMonitor::StopThread(lldb::tid_t tid)
         int status = -1;
         if (log)
             log->Printf ("ProcessMonitor::%s(bp) waitpid...", __FUNCTION__);
-        lldb::pid_t wait_pid = ::waitpid (-1*m_pid, &status, __WALL);
+        lldb::pid_t wait_pid = ::waitpid (-1*getpgid(m_pid), &status, __WALL);
         if (log)
             log->Printf ("ProcessMonitor::%s(bp) waitpid, pid = %" PRIu64 ", status = %d", __FUNCTION__, wait_pid, status);
 
diff --git a/source/Plugins/Process/POSIX/ProcessPOSIX.cpp b/source/Plugins/Process/POSIX/ProcessPOSIX.cpp
index 5718181..2cbf741 100644
--- a/source/Plugins/Process/POSIX/ProcessPOSIX.cpp
+++ b/source/Plugins/Process/POSIX/ProcessPOSIX.cpp
@@ -383,6 +383,8 @@ ProcessPOSIX::DoDidExec()
 void
 ProcessPOSIX::SendMessage(const ProcessMessage &message)
 {
+    Log *log (ProcessPOSIXLog::GetLogIfAllCategoriesSet (POSIX_LOG_PROCESS));
+
     Mutex::Locker lock(m_message_mutex);
 
     Mutex::Locker thread_lock(m_thread_list.GetMutex());
@@ -424,8 +426,14 @@ ProcessPOSIX::SendMessage(const ProcessMessage &message)
         break;
 
     case ProcessMessage::eExitMessage:
-        assert(thread);
-        thread->SetState(eStateExited);
+        if (thread != nullptr)
+            thread->SetState(eStateExited);
+        else
+        {
+            if (log)
+                log->Warning ("ProcessPOSIX::%s eExitMessage for TID %" PRIu64 " failed to find a thread to mark as eStateExied, ignoring", __FUNCTION__, message.GetTID ());
+        }
+
         // FIXME: I'm not sure we need to do this.
         if (message.GetTID() == GetID())
         {
diff --git a/test/functionalities/attach_resume/Makefile b/test/functionalities/attach_resume/Makefile
new file mode 100644
index 0000000..0375c31
--- /dev/null
+++ b/test/functionalities/attach_resume/Makefile
@@ -0,0 +1,8 @@
+LEVEL = ../../make
+
+C_SOURCES := main.c
+LD_EXTRAS := -lpthread
+
+EXE := AttachResume
+
+include $(LEVEL)/Makefile.rules
diff --git a/test/functionalities/attach_resume/TestAttachResume.py b/test/functionalities/attach_resume/TestAttachResume.py
new file mode 100644
index 0000000..b79633d
--- /dev/null
+++ b/test/functionalities/attach_resume/TestAttachResume.py
@@ -0,0 +1,95 @@
+"""
+Test process attach/resume.
+"""
+
+import os, time
+import unittest2
+import lldb
+from lldbtest import *
+import lldbutil
+
+exe_name = "AttachResume"  # Must match Makefile
+
+class AttachResumeTestCase(TestBase):
+
+    mydir = TestBase.compute_mydir(__file__)
+
+    @dwarf_test
+    def test_attach_continue_interrupt_detach(self):
+        """Test attach/continue/interrupt/detach"""
+        self.buildDwarf()
+        self.process_attach_continue_interrupt_detach()
+
+    def process_attach_continue_interrupt_detach(self):
+        """Test attach/continue/interrupt/detach"""
+
+        exe = os.path.join(os.getcwd(), exe_name)
+
+        popen = self.spawnSubprocess(exe)
+        self.addTearDownHook(self.cleanupSubprocesses)
+
+        self.runCmd("process attach -p " + str(popen.pid))
+
+        self._state = 0
+        def process_events():
+            event = lldb.SBEvent()
+            while self.dbg.GetListener().GetNextEvent(event):
+                self._state = lldb.SBProcess.GetStateFromEvent(event)
+
+        # using process.GetState() does not work: llvm.org/pr16172
+        def wait_for_state(s, timeout=5):
+            t = 0
+            period = 0.1
+            while self._state != s:
+                process_events()
+                time.sleep(period)
+                t += period
+                if t > timeout:
+                    return False
+            return True
+
+        self.setAsync(True)
+
+        self.runCmd("c")
+        self.assertTrue(wait_for_state(lldb.eStateRunning),
+            'Process not running after continue')
+
+        self.runCmd("process interrupt")
+        self.assertTrue(wait_for_state(lldb.eStateStopped),
+            'Process not stopped after interrupt')
+
+        # be sure to continue/interrupt/continue (r204504)
+        self.runCmd("c")
+        self.assertTrue(wait_for_state(lldb.eStateRunning),
+            'Process not running after continue')
+
+        self.runCmd("process interrupt")
+        self.assertTrue(wait_for_state(lldb.eStateStopped),
+            'Process not stopped after interrupt')
+
+        # check that this breakpoint is auto-cleared on detach (r204752)
+        self.runCmd("br set -f main.c -l 12")
+
+        self.runCmd("c")
+        self.assertTrue(wait_for_state(lldb.eStateRunning),
+            'Process not running after continue')
+
+        self.assertTrue(wait_for_state(lldb.eStateStopped),
+            'Process not stopped after breakpoint')
+        self.expect('br list', 'Breakpoint not hit',
+            patterns = ['hit count = 1'])
+
+        self.runCmd("c")
+        self.assertTrue(wait_for_state(lldb.eStateRunning),
+            'Process not running after continue')
+
+        # make sure to detach while in running state (r204759)
+        self.runCmd("detach")
+        self.assertTrue(wait_for_state(lldb.eStateDetached),
+            'Process not detached after detach')
+
+if __name__ == '__main__':
+    import atexit
+    lldb.SBDebugger.Initialize()
+    atexit.register(lambda: lldb.SBDebugger.Terminate())
+    unittest2.main()
diff --git a/test/functionalities/attach_resume/main.c b/test/functionalities/attach_resume/main.c
new file mode 100644
index 0000000..564e71b
--- /dev/null
+++ b/test/functionalities/attach_resume/main.c
@@ -0,0 +1,32 @@
+#include <unistd.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <pthread.h>
+
+void *start(void *data)
+{
+    int i;
+    size_t idx = (size_t)data;
+    for (i=0; i<30; i++)
+    {
+        if ( idx == 0 )
+            usleep(1);
+        sleep(1);
+    }
+    return 0;
+}
+
+int main(int argc, char const *argv[])
+{
+    static const size_t nthreads = 16;
+    pthread_attr_t attr;
+    pthread_t threads[nthreads];
+    size_t i;
+
+    pthread_attr_init(&attr);
+    for (i=0; i<nthreads; i++)
+        pthread_create(&threads[i], &attr, &start, (void *)i);
+
+    for (i=0; i<nthreads; i++)
+        pthread_join(threads[i], 0);
+}


More information about the lldb-dev mailing list