[PATCH] D47210: [lit] Fix the `--max-time` flag feature which was completely broken.

Dan Liew via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 29 18:45:25 PDT 2018


delcypher added a comment.

@rnk Hmm this isn't quite ready to land. I've just noticed that the `shtest-max-global-time-multiprocess.py` leaves the `1_infinite_loop.py` process (and the shell - bash in my case - that spawned it) lying around. This will cause problems on the bots so we can't commit this.

My guess is calling `pool.terminate()` followed by `pool.wait()` doesn't actually do proper clean up. This problem doesn't seem very easy to fix. When a user presses CTRL+C this calls `abort_now()`. On POSIX systems this sends SIGKILL to process ID 0 which causes all processes in the same group to be killed.
We can't use that strategy to fix this problem. Even if we do everything we want lit to do (e.g. write out an XML report) and then call `abort_now()` by using `atexit.register(abort_now)` this doesn't work. This breaks the testing because the lit process testing a running copy of lit are in the same process group.
Calling `abort_now()` in the inner running copy of lit also kills the outer running copy. This leads to the running the lit test suite failing.

One approach to fixing this could be to have lit run each test in its own process group. Then when we need to kill things we kill each group. This approach means we can kill lit's child processes without killing the main lit process itself. It also potentially means we could fix
the per-test-timeout feature to kill tests this way and thus avoiding our dependency on the `psutil` module. This approach has some downsides though.

- Won't work on Windows. We'd need to come up with something else.
- SIGKILL'ing the process group of the main lit process will no longer kill all child processes. I'm not sure if this a problem or not.

Another approach is to depend on the `pustil` module again and have it walk through all child processes of lit and kill them. This is the simplest short term solution. This probably the right thing to do because `--max-time` doesn't seem that widely used. Otherwise someone would have already tried to fix `--max-time` before me.

What do you think?


https://reviews.llvm.org/D47210





More information about the llvm-commits mailing list