[PATCH] D46858: Signal handling should be signal-safe

Mon May 14 20:58:33 PDT 2018

jfb added a comment.

(Will address code changes separately)

In https://reviews.llvm.org/D46858#1098806, @dexonsmith wrote:

> I wonder if there's some way to write a test for this.  E.g., a unit test that registers a signal handler for SIGUSR1 and then does what used to be necessary to trigger a crash.

Hmm I don't think so, at least not a reliable test: most of the time the test wouldn't hit a dangerous interleaving. We could try to make the bad concurrent interleaving more likely, or loop a bunch so it's more likely to hit the race because it tries more, but I think fundamentally it would be a flaky "success" test, and mostly a test that does nothing. Were the code to start being broken, it would become a flaky failure.

In https://reviews.llvm.org/D46858#1098833, @steven_wu wrote:

> I think it is easier to unregister all the signal handlers in the beginning of the llvm_shutdown() since you should be done with all the work when you call llvm_shutdown().

We might want to do this, but I'm worried that llvm_shutdown (and the ManagedStatic dtors it calls) could be used for more than just memory reclamation. If that's the case then it could crash, and then that wouldn't call signal handlers for e.g. stack traces on platforms that wanted it. If it starts taking a while then the CTRL-C handler wouldn't run (which might be fine... Presumably the tmp files are gone?).

I'm OK doing this, but I want to point out potential caveats and see what others think. It would definitely simplify some of the issues.

> This can also allow safe access ManagedStatics in signal handler, which I don't know if we have more of those other than the lock you are trying to fix. The only difference might be that if an interrupt is triggered when freeing ManagedStatics, llvm signal handler will not be used.

Using ManagedStatic from a signal handler is fraught with peril:

- It'll call new on first use. We can guard on isConstructed(), but that's problematic if llvm_shutdown is running (assuming we keep signals on when llvm_shutdown runs). We can force ourselves to always call new when the handler is first installed, but that's a bit fragile (but totally OK).
- We'll then get a mutex, and we can't grab that mutex from the signal handler, because it the thread handling the signal held the mutex we get a (single-threaded!) deadlock.
- We could spin a dedicated signal handler thread, but that seems like overkill.
- We could spin a dedicated "do signal unsafe work" thread and post the work to it from the handler, but that's also overkill and would also require lock-free magic (plus waiting for the thread to finish its work before exiting).

It seems pretty simple to have lock-free stuff instead. ManagedStatic gives the feeling that everything is OK, just use a std::vector and all, but it's really not! Imagine that vector was in the process of growing and moving its content when the signal handler kicks in... You'd observe invalid state for sure. Lock-free code, while tricky and slightly ugly, makes it pretty obvious that everything is not OK, and you can't just use std::vector (or even std::atomic<std::pair<int, int>> because that's not even a thing).

Repository:
  rL LLVM

https://reviews.llvm.org/D46858