[PATCH] D47210: [lit] Fix the `--max-time` flag feature which was completely broken.

Thu May 31 11:58:51 PDT 2018

delcypher added a comment.

In https://reviews.llvm.org/D47210#1116629, @rnk wrote:

> +@thakis, who has been prototyping some lit optimizations, I think.
>
> > Another approach is to depend on the psutil module again and have it walk through all child processes of lit and kill them. This is the simplest short-term solution. This is probably the right thing to do because --max-time doesn't seem that widely used. Otherwise someone would have already tried to fix --max-time before me.
>
> That sounds like a good way to go.
>
> The multiprocessing shutdown behavior leaves a lot to be desired, and it's basically impossible to write reliable tests for it. I regularly have problems interrupting lit test execution here on Windows. It worked better when I used mintty, but I switched back to cmd after updating to Windows 10. I think improving all that behavior is a separate issue that's out of scope for this, and probably a higher priority.

Okay. Is there a portable and reliable way to detect, from inside the worker processes, that the parent process has called `pool.terminate()`? Performing the process traversal and kill in the parent process is likely to be racy, so it's better if we can do it in the worker processes.

If we did the process traversal and kill work in the parent process, it might be very unreliable because the only place we can do it is just before calling `pool.terminate()`. We can't do it afterwards because the child processes will already have lost their parent and so won't be found by traversing the process tree.
If we do it before calling `pool.terminate()`, there's a risk that the pool might (depending on how it's implemented) respawn worker processes if it detects they died and start redoing work.
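
For reference, the parent-side variant could look roughly like the sketch below. It is only an illustration under assumptions: the function name `kill_worker_descendants` and the `grace` parameter are hypothetical, every direct child of lit is treated as a pool worker, and psutil is imported lazily so it stays an optional dependency. It kills only the workers' descendants and leaves the workers themselves alone, to avoid giving the pool a reason to respawn them. The race remains: a worker can start a new test between the snapshot and the later `pool.terminate()` call.

```python
import os


def kill_worker_descendants(parent_pid=None, grace=3.0):
    """Kill the descendants of each direct child of parent_pid.

    Returns the pids that were targeted. Intended to be called just
    before pool.terminate().
    """
    import psutil  # third-party; imported lazily (an assumption of this sketch)

    try:
        workers = psutil.Process(parent_pid or os.getpid()).children()
    except psutil.NoSuchProcess:
        return []

    victims = []
    for worker in workers:
        try:
            victims.extend(worker.children(recursive=True))
        except psutil.NoSuchProcess:
            continue  # the worker exited while we were walking the tree

    for proc in victims:
        try:
            proc.kill()
        except psutil.NoSuchProcess:
            pass  # already gone between the snapshot and the kill

    # Give the killed processes a moment to disappear before the caller moves on.
    psutil.wait_procs(victims, timeout=grace)
    return [proc.pid for proc in victims]
```

The intended call order would be `kill_worker_descendants()` followed immediately by `pool.terminate()`, which keeps the window for new test processes as small as this approach allows.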

https://reviews.llvm.org/D47210