[Lldb-commits] [PATCH] D83815: [lldb/Test] Use a process group for	subprocesses
    Jonas Devlieghere via Phabricator via lldb-commits 
    lldb-commits at lists.llvm.org
       
    Wed Jul 15 08:34:10 PDT 2020
    
    
  
JDevlieghere abandoned this revision.
JDevlieghere added a comment.
In D83815#2152688 <https://reviews.llvm.org/D83815#2152688>, @labath wrote:
> hmm... I have a lot thoughts here..
>
> - `setsid` is overkill. If you want to create process group, create a process group (`setpgid`), not a session.
> - this solution does not seem very windows-friendly. In fact, I'd be surprised if it works there at all.
> - going for `os.kill` will also not work with remote test suite runs. In fact, the way it is implemented now means we could start killing random processes on the host.
> - are these really zombies (dead but not waited-for processes) or just live processes stuck in infinite loops and similar? Because if they are zombies (mmm... brainz), then I doubt sending signals to them will help -- they're already dead. The problem is that someone is not reaping them. I'm not sure what can be done about that, as normally init should take care of this...
> - even if they are live processes, I am doubtful that this will make a difference. I don't think many of the processes we spawn create additional subprocesses. In fact, the only one I know of is `TestChangeProcessGroup.py`, which also creates a new process group... so this test would need special handling anyway.
> - creating lots of process groups can actually cause us to leak _more_ processes in case the test run is interrupted with ^C or similar (SIGINT goes to the foreground process group)
You make some compelling arguments :-) Given that we were already creating a process group on fork, I'll keep the change on line 902 that kills the process group instead of just the pid.
> Overall, I think we should try to understand the problem a bit better before we start shotgun debugging this. Which tests are creating these zombies? What's special about them? Do they always do that or only occasionally? etc.
I and many people on the team have already tried that and so far we've come up with nothing. We don't know which tests are leaking the zombies. It doesn't happen when you just run the test suite over and over again. But by the end of the week you suddenly have a few hundred... I have a machine that I only SSH in and rarely reboot. It's not uncommon for it to have more than a 1000 zombies.
> PS: When we were running a mac bot (it seems only macs are vulnerable to zombie infestations), we configured it to reboot itself each midnight. Not the most elegant solution, but if the problem really is about init refusing to reap zombies, then it may be the only thing that we can do...
GreenDragon has a "zombie detector" which periodically checks the number and reboots when it exceeds a certain value.
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83815/new/
https://reviews.llvm.org/D83815
    
    
More information about the lldb-commits
mailing list