[lldb-dev] Process attach and pgid of forked process under Linux

Todd Fiala tfiala at google.com
Wed Apr 2 07:50:27 PDT 2014


Great! Saw it go in, glad you were able to track that down.

Thanks, Andrew.
On Apr 2, 2014 12:10 AM, "Andrew MacPherson" <andrew.macp at gmail.com> wrote:

> Hey Todd,
>
> Sounds great. Sorry about the C99 issue, I was using clang which I'm
> guessing behaves differently there. For the waitpid + getpgid in Host.cpp,
> that section is #ifdef'd out on Windows so should be safe there too, I've
> committed the patch now. Thanks!
>
> Andrew
>
>
> On Tue, Apr 1, 2014 at 7:46 PM, Todd Fiala <tfiala at google.com> wrote:
>
>> Hey Andrew,
>>
>> I think the core I was getting on the assert is during exiting and
>> doesn't really seem to indicate a serious issue.  I fixed it up with a null
>> check and added some logging.
>>
>> See the attached patch.  This is a cumulative patch with my fixes for
>> those two test failures plus all of your original changes.
>>
>> As this patch stands, LGTM.  If you want to check it in, I'm fine with it
>> from a Linux standpoint.  In the common Host.cpp, you now have a getpgid()
>> call.  I'm not sure that's going to work on a Windows build?  Might need to
>> #ifdef that or create a macro for it, or something else entirely.  Any
>> Windows folks able to comment on that?
>>
>>
>>
>>
>>
>>
>> On Tue, Apr 1, 2014 at 10:25 AM, Todd Fiala <tfiala at google.com> wrote:
>>
>>> That change did successfully fix the TestAttachResume.py failure.
>>>
>>> This one is a bit different and looks like we may have perturbed
>>> something related to shutdown:
>>>
>>> TestHelloWorld.py:
>>>
>>> Executing tearDown hook:     def cleanupSubprocesses(self):
>>>         # Ensure any subprocesses are cleaned up
>>>         for p in self.subprocesses:
>>>             if p.poll() == None:
>>>                 p.terminate()
>>>             del p
>>>         del self.subprocesses[:]
>>>         # Ensure any forked processes are cleaned up
>>>         for pid in self.forkedProcessPids:
>>>             if os.path.exists("/proc/" + str(pid)):
>>>                 os.kill(pid, signal.SIGTERM)
>>>
>>>
>>> python:
>>> /mnt/ssd/work/git/gen/llvm/tools/lldb/source/Plugins/Process/POSIX/ProcessPOSIX.cpp:427:
>>> virtual void ProcessPOSIX::SendMessage(const ProcessMessage&): Assertion
>>> `thread' failed.
>>> Aborted (core dumped)
>>>
>>> I'm having a look at that core now.
>>>
>>>
>>>
>>> On Tue, Apr 1, 2014 at 10:00 AM, Todd Fiala <tfiala at google.com> wrote:
>>>
>>>> I built clean and ran tests error free on r205315.
>>>>
>>>> After testing with the patch applied, I'm getting the following
>>>> failures (3 times in a row):
>>>>
>>>> FAIL: LLDB (suite) :: TestHelloWorld.py (Linux
>>>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18
>>>> PDT 2013 x86_64 x86_64)
>>>> FAIL: LLDB (suite) :: TestAttachResume.py (Linux
>>>> tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18
>>>> PDT 2013 x86_64 x86_64)
>>>>
>>>> Looking deeper, it looks like the main.c code in
>>>> test/functionalities/attach_resume/main.c is assuming c99 mode which isn't
>>>> the default when using gcc.  I'm clearing that locally (pulling for loop
>>>> variable declarations out) and will see if that fixes everything else.
>>>>
>>>> Back in a bit...
>>>>
>>>>
>>>> On Tue, Apr 1, 2014 at 9:06 AM, Todd Fiala <tfiala at google.com> wrote:
>>>>
>>>>> Having a look at this now.
>>>>>
>>>>>
>>>>> On Thu, Mar 27, 2014 at 1:01 AM, Andrew MacPherson <
>>>>> andrew.macp at gmail.com> wrote:
>>>>>
>>>>>> Thanks Todd, and no problem on the unit test, I'll try to include one
>>>>>> for anything I come across in the future as well.
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 26, 2014 at 10:39 PM, Todd Fiala <tfiala at google.com>wrote:
>>>>>>
>>>>>>> Ah nice.  I'll definitely run this again soon.  Thanks for getting a
>>>>>>> test around it - we can't have enough of that around important expectations.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 26, 2014 at 2:06 PM, Andrew MacPherson <
>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok that makes sense. I've updated the patch I sent yesterday with a
>>>>>>>> unit test that exposes the problem as well as a couple of other process
>>>>>>>> attach issues that were fixed in the last few days. It was in fact this
>>>>>>>> getpgid() issue that was causing problems when trying to create a unit test
>>>>>>>> earlier as the problem scenario occurs when the unit test script forks a
>>>>>>>> subprocess with popen().
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 26, 2014 at 7:20 PM, Greg Clayton <gclayton at apple.com>wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 25, 2014, at 12:09 PM, Andrew MacPherson <
>>>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> > Ok great, I'm attaching a patch here that just uses getpgid() to
>>>>>>>>> determine the pgid to use with waitpid(). I get the same unit test results
>>>>>>>>> before and after the patch (there are a few tests that fail consistently
>>>>>>>>> for me). I tried using a simple -1 with waitpid() but this results in a
>>>>>>>>> hang in the unit tests. This is probably something worth investigating but
>>>>>>>>> for now this patch does resolve the issues around waitpid() when attaching.
>>>>>>>>>
>>>>>>>>> We should never use -1 with waitpid() as you could end up reaping
>>>>>>>>> a process that someone else is already trying to reap. You must only try to
>>>>>>>>> reap the process that you launched using some pid for you process. Many
>>>>>>>>> things launch processes within LLDB like ProcessGDBRemote launching
>>>>>>>>> "debugserver" binaries. The ProcessGDBRemote must be notified when and if
>>>>>>>>> "debugserver" crashes, so if anyone else does waitpid(-1, ...)
>>>>>>>>> ProcessGDBRemote might never find out if debugserver crashes for some
>>>>>>>>> reason.
>>>>>>>>>
>>>>>>>>> > I mentioned a few issues I ran into with the unit tests in the
>>>>>>>>> email I just pinged with a Linux detach patch, basically there appears to
>>>>>>>>> be a bug or limitation somewhere in that doing a "continue" in asynchronous
>>>>>>>>> mode in a unit test breaks things and this is needed for most
>>>>>>>>> attach-related debug testing. I can probably look into this (along with the
>>>>>>>>> couple of failing tests I have) but likely won't be able to this week. I
>>>>>>>>> haven't filed a bug for the consistently-failing tests on my end since it
>>>>>>>>> doesn't appear to be happening for others and I haven't had a chance to
>>>>>>>>> investigate here, let me know if I should file a bug anyway.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Tue, Mar 25, 2014 at 6:30 PM, Todd Fiala <tfiala at google.com>
>>>>>>>>> wrote:
>>>>>>>>> > I'd suggest trying the change, running the tests, and see if
>>>>>>>>> anything new fails.
>>>>>>>>> >
>>>>>>>>> > And if this fixes currently broken behavior, add a test to make
>>>>>>>>> sure we don't regress it.  I'm pretty sure if it breaks something we'll
>>>>>>>>> know pretty quickly (either via bugs or our own usage).  At which point -
>>>>>>>>> we should add a test so we don't regress in the future and perhaps add a
>>>>>>>>> few comments as to the reason.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Tue, Mar 25, 2014 at 9:54 AM, <jingham at apple.com> wrote:
>>>>>>>>> > Why are you waiting for process groups?  That's not something we
>>>>>>>>> have to do on Mac OS X.
>>>>>>>>> >
>>>>>>>>> > Jim
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Mar 25, 2014, at 1:17 AM, Andrew MacPherson <
>>>>>>>>> andrew.macp at gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > > Currently under Linux if you attach to a process whose process
>>>>>>>>> group id is not equal to its process id (such as the child process of a
>>>>>>>>> fork() call) the calls to waitpid() that pass -1*pid will return ECHILD
>>>>>>>>> since the pid argument refers to a process group that doesn't exist. These
>>>>>>>>> calls occur in Host::MonitorChildProcessThreadFunction() and the Linux
>>>>>>>>> ProcessMonitor.
>>>>>>>>> > >
>>>>>>>>> > > Changing -1*pid to simply -1 or to -1*getpgid(pid) resolves
>>>>>>>>> the issue but it's not clear if this is the right fix as I'm unsure how
>>>>>>>>> other OSes deal with this scenario.
>>>>>>>>> > >
>>>>>>>>> > > Any thoughts?
>>>>>>>>> > >
>>>>>>>>> > > Thanks,
>>>>>>>>> > > Andrew
>>>>>>>>> > > _______________________________________________
>>>>>>>>> > > lldb-dev mailing list
>>>>>>>>> > > lldb-dev at cs.uiuc.edu
>>>>>>>>> > > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>> >
>>>>>>>>> > _______________________________________________
>>>>>>>>> > lldb-dev mailing list
>>>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Todd Fiala |   Software Engineer |     tfiala at google.com |
>>>>>>>>> 650-943-3180
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> <waitpid-getpgid.patch>_______________________________________________
>>>>>>>>> > lldb-dev mailing list
>>>>>>>>> > lldb-dev at cs.uiuc.edu
>>>>>>>>> > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>>
>>
>>
>>
>> --
>> Todd Fiala | Software Engineer |  tfiala at google.com |  650-943-3180
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20140402/2e4bf222/attachment.html>


More information about the lldb-dev mailing list