[lldb-dev] Process attach and pgid of forked process under Linux

Andrew MacPherson andrew.macp at gmail.com
Wed Apr 2 00:10:39 PDT 2014


Hey Todd,

Sounds great. Sorry about the C99 issue, I was using clang which I'm
guessing behaves differently there. For the waitpid + getpgid in Host.cpp,
that section is #ifdef'd out on Windows so should be safe there too, I've
committed the patch now. Thanks!

Andrew


On Tue, Apr 1, 2014 at 7:46 PM, Todd Fiala <tfiala at google.com> wrote:

> Hey Andrew,
>
> I think the core I was getting on the assert is during exiting and doesn't
> really seem to indicate a serious issue.  I fixed it up with a null check
> and added some logging.
>
> See the attached patch.  This is a cumulative patch with my fixes for
> those two test failures plus all of your original changes.
>
> As this patch stands, LGTM.  If you want to check it in, I'm fine with it
> from a Linux standpoint.  In the common Host.cpp, you now have a getpgid()
> call.  I'm not sure that's going to work on a Windows build?  Might need to
> #ifdef that or create a macro for it, or something else entirely.  Any
> Windows folks able to comment on that?
>
>
>
>
>
>
> On Tue, Apr 1, 2014 at 10:25 AM, Todd Fiala <tfiala at google.com> wrote:
>
>> That change did successfully fix the TestAttachResume.py failure.
>>
>> This one is a bit different and looks like we may have perturbed
>> something related to shutdown:
>>
>> TestHelloWorld.py:
>>
>> Executing tearDown hook:     def cleanupSubprocesses(self):
>>         # Ensure any subprocesses are cleaned up
>>         for p in self.subprocesses:
>>             if p.poll() == None:
>>                 p.terminate()
>>             del p
>>         del self.subprocesses[:]
>>         # Ensure any forked processes are cleaned up
>>         for pid in self.forkedProcessPids:
>>             if os.path.exists("/proc/" + str(pid)):
>>                 os.kill(pid, signal.SIGTERM)
>>
>>
>> python:
>> /mnt/ssd/work/git/gen/llvm/tools/lldb/source/Plugins/Process/POSIX/ProcessPOSIX.cpp:427:
>> virtual void ProcessPOSIX::SendMessage(const ProcessMessage&): Assertion
>> `thread' failed.
>> Aborted (core dumped)
>>
>> I'm having a look at that core now.
>>
>>
>>
>> On Tue, Apr 1, 2014 at 10:00 AM, Todd Fiala <tfiala at google.com> wrote:
>>
>>> I built clean and ran tests error free on r205315.
>>>
>>> After testing with the patch applied, I'm getting the following failures
>>> (3 times in a row):
>>>
>>> FAIL: LLDB (suite) :: TestHelloWorld.py (Linux
>>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT
>>> 2013 x86_64 x86_64)
>>> FAIL: LLDB (suite) :: TestAttachResume.py (Linux
>>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT
>>> 2013 x86_64 x86_64)
>>>
>>> Looking deeper, it looks like the main.c code in
>>> test/functionalities/attach_resume/main.c is assuming c99 mode which isn't
>>> the default when using gcc.  I'm clearing that locally (pulling for loop
>>> variable declarations out) and will see if that fixes everything else.
>>>
>>> Back in a bit...
>>>
>>>
>>> On Tue, Apr 1, 2014 at 9:06 AM, Todd Fiala <tfiala at google.com> wrote:
>>>
>>>> Having a look at this now.
>>>>
>>>>
>>>> On Thu, Mar 27, 2014 at 1:01 AM, Andrew MacPherson <
>>>> andrew.macp at gmail.com> wrote:
>>>>
>>>>> Thanks Todd, and no problem on the unit test, I'll try to include one
>>>>> for anything I come across in the future as well.
>>>>>
>>>>>
>>>>> On Wed, Mar 26, 2014 at 10:39 PM, Todd Fiala <tfiala at google.com>wrote:
>>>>>
>>>>>> Ah nice.  I'll definitely run this again soon.  Thanks for getting a
>>>>>> test around it - we can't have enough of that around important expectations.
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 26, 2014 at 2:06 PM, Andrew MacPherson <
>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>
>>>>>>> Ok that makes sense. I've updated the patch I sent yesterday with a
>>>>>>> unit test that exposes the problem as well as a couple of other process
>>>>>>> attach issues that were fixed in the last few days. It was in fact this
>>>>>>> getpgid() issue that was causing problems when trying to create a unit test
>>>>>>> earlier as the problem scenario occurs when the unit test script forks a
>>>>>>> subprocess with popen().
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 26, 2014 at 7:20 PM, Greg Clayton <gclayton at apple.com>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 25, 2014, at 12:09 PM, Andrew MacPherson <
>>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>>
>>>>>>>> > Ok great, I'm attaching a patch here that just uses getpgid() to
>>>>>>>> determine the pgid to use with waitpid(). I get the same unit test results
>>>>>>>> before and after the patch (there are a few tests that fail consistently
>>>>>>>> for me). I tried using a simple -1 with waitpid() but this results in a
>>>>>>>> hang in the unit tests. This is probably something worth investigating but
>>>>>>>> for now this patch does resolve the issues around waitpid() when attaching.
>>>>>>>>
>>>>>>>> We should never use -1 with waitpid() as you could end up reaping a
>>>>>>>> process that someone else is already trying to reap. You must only try to
>>>>>>>> reap the process that you launched using some pid for you process. Many
>>>>>>>> things launch processes within LLDB like ProcessGDBRemote launching
>>>>>>>> "debugserver" binaries. The ProcessGDBRemote must be notified when and if
>>>>>>>> "debugserver" crashes, so if anyone else does waitpid(-1, ...)
>>>>>>>> ProcessGDBRemote might never find out if debugserver crashes for some
>>>>>>>> reason.
>>>>>>>>
>>>>>>>> > I mentioned a few issues I ran into with the unit tests in the
>>>>>>>> email I just pinged with a Linux detach patch, basically there appears to
>>>>>>>> be a bug or limitation somewhere in that doing a "continue" in asynchronous
>>>>>>>> mode in a unit test breaks things and this is needed for most
>>>>>>>> attach-related debug testing. I can probably look into this (along with the
>>>>>>>> couple of failing tests I have) but likely won't be able to this week. I
>>>>>>>> haven't filed a bug for the consistently-failing tests on my end since it
>>>>>>>> doesn't appear to be happening for others and I haven't had a chance to
>>>>>>>> investigate here, let me know if I should file a bug anyway.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Mar 25, 2014 at 6:30 PM, Todd Fiala <tfiala at google.com>
>>>>>>>> wrote:
>>>>>>>> > I'd suggest trying the change, running the tests, and see if
>>>>>>>> anything new fails.
>>>>>>>> >
>>>>>>>> > And if this fixes currently broken behavior, add a test to make
>>>>>>>> sure we don't regress it.  I'm pretty sure if it breaks something we'll
>>>>>>>> know pretty quickly (either via bugs or our own usage).  At which point -
>>>>>>>> we should add a test so we don't regress in the future and perhaps add a
>>>>>>>> few comments as to the reason.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Mar 25, 2014 at 9:54 AM, <jingham at apple.com> wrote:
>>>>>>>> > Why are you waiting for process groups?  That's not something we
>>>>>>>> have to do on Mac OS X.
>>>>>>>> >
>>>>>>>> > Jim
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mar 25, 2014, at 1:17 AM, Andrew MacPherson <
>>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > > Currently under Linux if you attach to a process whose process
>>>>>>>> group id is not equal to its process id (such as the child process of a
>>>>>>>> fork() call) the calls to waitpid() that pass -1*pid will return ECHILD
>>>>>>>> since the pid argument refers to a process group that doesn't exist. These
>>>>>>>> calls occur in Host::MonitorChildProcessThreadFunction() and the Linux
>>>>>>>> ProcessMonitor.
>>>>>>>> > >
>>>>>>>> > > Changing -1*pid to simply -1 or to -1*getpgid(pid) resolves the
>>>>>>>> issue but it's not clear if this is the right fix as I'm unsure how other
>>>>>>>> OSes deal with this scenario.
>>>>>>>> > >
>>>>>>>> > > Any thoughts?
>>>>>>>> > >
>>>>>>>> > > Thanks,
>>>>>>>> > > Andrew
>>>>>>>> > > _______________________________________________
>>>>>>>> > > lldb-dev mailing list
>>>>>>>> > > lldb-dev at cs.uiuc.edu
>>>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > lldb-dev mailing list
>>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>>>>>>> 650-943-3180
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> <waitpid-getpgid.patch>_______________________________________________
>>>>>>>> > lldb-dev mailing list
>>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>
>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
>
> --
> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140402/5660cc14/attachment.html>


More information about the lldb-dev mailing list