[PATCH] D47210: [lit] Fix the `--max-time` flag feature which was completely broken.

Dan Liew via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 9 17:07:57 PDT 2018


delcypher added a comment.

In https://reviews.llvm.org/D47210#1118125, @rnk wrote:

> In https://reviews.llvm.org/D47210#1117993, @delcypher wrote:
>
> > Okay. Is there a portable and reliable way to catch `pool.terminate()` being called from the parent process in the worker processes? Performing the process traversal and kill in the parent process is likely to be racy, so it's better if we can do it in the worker processes.
>
>
> I don't think so. I think terminate is intended to be unclean, i.e. you might get SIGKILL-like behavior on some platforms.


Looking at the implementation, though, `multiprocessing.Pool.terminate()` calls `.terminate()` on each `multiprocessing.Process`, which appears to send `SIGTERM` to the worker processes.
So it looks like I could handle killing child processes in the worker processes, at least on POSIX systems.

>> If we did the process traversal and kill work in the parent process then it might be very unreliable, because the only place we can do it is just before calling `pool.terminate()`. We can't do it afterwards because the child processes have already lost their parent and so won't be found by traversing the process tree.
>> If we do it before calling `pool.terminate()` there's a risk the pool might (depending on how it's implemented) respawn worker processes if it detects they died and start redoing work.
> 
> It might not be as racy as you think. pids are not available for reuse until after the parent process has waited on them. This is how you get "zombie" processes in Unix, I think. So, taking the interrupt in the parent, iterating, killing each in turn, then waiting, shouldn't be racy: either the child worker will finish before it receives the kill signal, or it will take the kill signal. Then we wait on it.
> 
> You will need a way to cancel the creation of new workers. Does the .close() method do that?

No, it doesn't. All `.close()` does is prevent new tasks from being added to the pool.
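
A small sketch of that `.close()` behavior, assuming Python 3 on a POSIX system (the `fork` start method is requested explicitly; on Python 2.7 the rejected submission fails an assertion instead of raising `ValueError`). The helper name `close_then_submit` is made up for illustration.

```python
import multiprocessing


def close_then_submit():
    """Show that close() rejects new tasks but lets queued work finish."""
    # Assumption: POSIX, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(processes=2)

    # Work submitted before close() is still processed.
    pending = pool.map_async(abs, [-1, -2, -3])
    pool.close()

    # Work submitted after close() is refused.
    try:
        pool.apply_async(abs, (-4,))
        rejected = False
    except ValueError:
        rejected = True

    results = pending.get(timeout=30)
    pool.join()
    return rejected, results
```

Nothing here stops the pool from managing its worker processes; `.close()` only affects whether new tasks are accepted.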

The creation of new workers was the problem I was concerned about. The implementation of `multiprocessing.Pool` (at least for Python 2.7 on macOS) spawns a thread that periodically keeps the pool alive by respawning missing worker processes.
I could certainly write some hacks that abuse knowledge of the internal workings of `multiprocessing.Pool` to make sure the pool doesn't respawn workers when I kill processes, but this would be very fragile.


https://reviews.llvm.org/D47210




