[lldb-dev] Process attach and pgid of forked process under Linux

Todd Fiala tfiala at google.com
Tue Apr 1 10:25:29 PDT 2014


That change did successfully fix the TestAttachResume.py failure.

This one is a bit different and looks like we may have perturbed something
related to shutdown:

TestHelloWorld.py:

Executing tearDown hook:     def cleanupSubprocesses(self):
        # Ensure any subprocesses are cleaned up
        for p in self.subprocesses:
            if p.poll() == None:
                p.terminate()
            del p
        del self.subprocesses[:]
        # Ensure any forked processes are cleaned up
        for pid in self.forkedProcessPids:
            if os.path.exists("/proc/" + str(pid)):
                os.kill(pid, signal.SIGTERM)


python:
/mnt/ssd/work/git/gen/llvm/tools/lldb/source/Plugins/Process/POSIX/ProcessPOSIX.cpp:427:
virtual void ProcessPOSIX::SendMessage(const ProcessMessage&): Assertion
`thread' failed.
Aborted (core dumped)

I'm having a look at that core now.



On Tue, Apr 1, 2014 at 10:00 AM, Todd Fiala <tfiala at google.com> wrote:

> I built clean and ran tests error free on r205315.
>
> After testing with the patch applied, I'm getting the following failures
> (3 times in a row):
>
> FAIL: LLDB (suite) :: TestHelloWorld.py (Linux tfiala2.mtv.corp.google.com3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
> FAIL: LLDB (suite) :: TestAttachResume.py (Linux
> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT
> 2013 x86_64 x86_64)
>
> Looking deeper, it looks like the main.c code in
> test/functionalities/attach_resume/main.c is assuming c99 mode which isn't
> the default when using gcc.  I'm clearing that locally (pulling for loop
> variable declarations out) and will see if that fixes everything else.
>
> Back in a bit...
>
>
> On Tue, Apr 1, 2014 at 9:06 AM, Todd Fiala <tfiala at google.com> wrote:
>
>> Having a look at this now.
>>
>>
>> On Thu, Mar 27, 2014 at 1:01 AM, Andrew MacPherson <andrew.macp at gmail.com
>> > wrote:
>>
>>> Thanks Todd, and no problem on the unit test, I'll try to include one
>>> for anything I come across in the future as well.
>>>
>>>
>>> On Wed, Mar 26, 2014 at 10:39 PM, Todd Fiala <tfiala at google.com> wrote:
>>>
>>>> Ah nice.  I'll definitely run this again soon.  Thanks for getting a
>>>> test around it - we can't have enough of that around important expectations.
>>>>
>>>>
>>>> On Wed, Mar 26, 2014 at 2:06 PM, Andrew MacPherson <
>>>> andrew.macp at gmail.com> wrote:
>>>>
>>>>> Ok that makes sense. I've updated the patch I sent yesterday with a
>>>>> unit test that exposes the problem as well as a couple of other process
>>>>> attach issues that were fixed in the last few days. It was in fact this
>>>>> getpgid() issue that was causing problems when trying to create a unit test
>>>>> earlier as the problem scenario occurs when the unit test script forks a
>>>>> subprocess with popen().
>>>>>
>>>>>
>>>>> On Wed, Mar 26, 2014 at 7:20 PM, Greg Clayton <gclayton at apple.com>wrote:
>>>>>
>>>>>>
>>>>>> On Mar 25, 2014, at 12:09 PM, Andrew MacPherson <
>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>
>>>>>> > Ok great, I'm attaching a patch here that just uses getpgid() to
>>>>>> determine the pgid to use with waitpid(). I get the same unit test results
>>>>>> before and after the patch (there are a few tests that fail consistently
>>>>>> for me). I tried using a simple -1 with waitpid() but this results in a
>>>>>> hang in the unit tests. This is probably something worth investigating but
>>>>>> for now this patch does resolve the issues around waitpid() when attaching.
>>>>>>
>>>>>> We should never use -1 with waitpid() as you could end up reaping a
>>>>>> process that someone else is already trying to reap. You must only try to
>>>>>> reap the process that you launched using some pid for you process. Many
>>>>>> things launch processes within LLDB like ProcessGDBRemote launching
>>>>>> "debugserver" binaries. The ProcessGDBRemote must be notified when and if
>>>>>> "debugserver" crashes, so if anyone else does waitpid(-1, ...)
>>>>>> ProcessGDBRemote might never find out if debugserver crashes for some
>>>>>> reason.
>>>>>>
>>>>>> > I mentioned a few issues I ran into with the unit tests in the
>>>>>> email I just pinged with a Linux detach patch, basically there appears to
>>>>>> be a bug or limitation somewhere in that doing a "continue" in asynchronous
>>>>>> mode in a unit test breaks things and this is needed for most
>>>>>> attach-related debug testing. I can probably look into this (along with the
>>>>>> couple of failing tests I have) but likely won't be able to this week. I
>>>>>> haven't filed a bug for the consistently-failing tests on my end since it
>>>>>> doesn't appear to be happening for others and I haven't had a chance to
>>>>>> investigate here, let me know if I should file a bug anyway.
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 25, 2014 at 6:30 PM, Todd Fiala <tfiala at google.com>
>>>>>> wrote:
>>>>>> > I'd suggest trying the change, running the tests, and see if
>>>>>> anything new fails.
>>>>>> >
>>>>>> > And if this fixes currently broken behavior, add a test to make
>>>>>> sure we don't regress it.  I'm pretty sure if it breaks something we'll
>>>>>> know pretty quickly (either via bugs or our own usage).  At which point -
>>>>>> we should add a test so we don't regress in the future and perhaps add a
>>>>>> few comments as to the reason.
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 25, 2014 at 9:54 AM, <jingham at apple.com> wrote:
>>>>>> > Why are you waiting for process groups?  That's not something we
>>>>>> have to do on Mac OS X.
>>>>>> >
>>>>>> > Jim
>>>>>> >
>>>>>> >
>>>>>> > On Mar 25, 2014, at 1:17 AM, Andrew MacPherson <
>>>>>> andrew.macp at gmail.com> wrote:
>>>>>> >
>>>>>> > > Currently under Linux if you attach to a process whose process
>>>>>> group id is not equal to its process id (such as the child process of a
>>>>>> fork() call) the calls to waitpid() that pass -1*pid will return ECHILD
>>>>>> since the pid argument refers to a process group that doesn't exist. These
>>>>>> calls occur in Host::MonitorChildProcessThreadFunction() and the Linux
>>>>>> ProcessMonitor.
>>>>>> > >
>>>>>> > > Changing -1*pid to simply -1 or to -1*getpgid(pid) resolves the
>>>>>> issue but it's not clear if this is the right fix as I'm unsure how other
>>>>>> OSes deal with this scenario.
>>>>>> > >
>>>>>> > > Any thoughts?
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Andrew
>>>>>> > > _______________________________________________
>>>>>> > > lldb-dev mailing list
>>>>>> > > lldb-dev at cs.uiuc.edu
>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > lldb-dev mailing list
>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>>>>> 650-943-3180
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> <waitpid-getpgid.patch>_______________________________________________
>>>>>> > lldb-dev mailing list
>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>
>>>
>>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>



-- 
Todd Fiala | Software Engineer | tfiala at google.com | 650-943-3180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140401/ff1f3efc/attachment.html>


More information about the lldb-dev mailing list