[Lldb-commits] [PATCH] D13124: test runner: switch to pure-Python timeout mechanism
Pavel Labath via lldb-commits
lldb-commits at lists.llvm.org
Thu Sep 24 08:50:34 PDT 2015
labath added a comment.
In http://reviews.llvm.org/D13124#252791, @tfiala wrote:
> In http://reviews.llvm.org/D13124#252564, @labath wrote:
> > I don't want to stand in the way of progress (and I do think that getting rid of the timeout dependency is progress), but this implementation regresses in a couple of features compared to using timeout:
> > - timeout tries (with moderate success) to clean up children spawned by the main process. Your implementation will (afaik) kill only the main process. This is especially important for build bots, since leaking processes will starve the bot's resources after a while (and we have had problems with this on our darwin build bot).
> Fair enough. Either yesterday or the day before, I put up a change that ensures the dotest inferiors are in a separate process group. I could (on non-Windows) send that process group a SIGQUIT. The reason I prefer not to use SIGQUIT is that it is catchable, and thus can be ignored. Once something can be ignored, the runner needs to be able to handle the timed out child really not ending with a SIGQUIT. That means either more logic around timing out on the "wait for death", or possibly not reaping. In the case of a test runner, I'd probably put the focus on making sure the test runner makes it through ahead of getting the coredump from a process that times out.
None of the processes we run catch SIGQUIT, so sending it is mostly safe, but we can do the SIGQUIT + sleep + SIGKILL combo to cover the case where a process ignores it.
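
To make the combo concrete, here is a rough sketch of what I have in mind. It is not code from the patch: the function name, the grace period, and the polling interval are all made up for illustration, and it assumes a POSIX host where the inferior was started as the leader of its own process group. It only watches the leader during the grace period, then sends SIGKILL to the whole group to catch stragglers.

```python
import os
import signal
import subprocess
import time


def quit_then_kill_group(proc, grace_seconds=5.0, poll_interval=0.1):
    """Send SIGQUIT to proc's process group, wait, then SIGKILL the group.

    Hypothetical helper (not part of dotest). Assumes proc leads its own
    process group. Returns True if the leader exited within the grace
    period, False if it had to be killed with SIGKILL.
    """
    try:
        pgid = os.getpgid(proc.pid)
        # SIGQUIT asks for a core dump, but it can be caught or ignored.
        os.killpg(pgid, signal.SIGQUIT)
    except ProcessLookupError:
        # Nothing left to signal; just reap the leader.
        proc.wait()
        return True

    # The "sleep" step: poll the leader until the grace period runs out.
    deadline = time.monotonic() + grace_seconds
    exited_on_quit = False
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            exited_on_quit = True
            break
        time.sleep(poll_interval)

    # SIGKILL cannot be caught or ignored. Send it to the whole group so
    # children that survived SIGQUIT do not leak. There is a small window
    # in which an already-empty group id could have been reused; a real
    # implementation may want to guard against that.
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # The group is already gone.

    proc.wait()  # Reap the leader so it does not linger as a zombie.
    return exited_on_quit
```

Usage would look something like this, with start_new_session=True standing in for whatever the runner does to put the inferior into its own process group:

```python
proc = subprocess.Popen(["sleep", "60"], start_new_session=True)
quit_then_kill_group(proc, grace_seconds=2.0)
```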
The problem with the race conditions is that they are very hard to reproduce. I often need to run the test suite at full speed dozens of times to catch one happening, and then the only way of diagnosing the issue is digging through the core files.
Thanks for working on this once again :)