[PATCH] D22742: [WIP] Fix `-jobs=<N>` where <N> > 1 on macOS.

Mon Jul 25 20:51:47 PDT 2016

delcypher added a comment.

In https://reviews.llvm.org/D22742#495674, @kcc wrote:

> > > I am not happy with this implementation as it seems like a gross
>
> > 
>
>
> Gross indeed. 
>  At the very least, move all the OSX-specific code to an OSX-specific file.

I can do that but first I'd like to decide on fix this properly.

> Also, if you had to write that much code, surely it deserves a test :)

I wanted to write a test but I don't know how to do so in a reliable way from llvm-lit. The only way I know of doing this is by running `ps -af` in parallel and checking that the multiple invocations of the fuzzer binary are all using non zero cpu time. Writing something that does that using bash shell code if probably very fragile.

> > > This allows pressing CTRL+C in the terminal to do the right thing on macOS.

> 

> > 

> 

> 

> What exactly do you mean by the "right thing"?

Without the extra code when running in `RunInMultipleProcesses()` pressing CTRL+C in a shell
will immediately kill the parent process on macOS and can leave around zombie processes due
`waitpid()` not being called by the parent process to clean up the children. This is an artifact of my implementation of `NonLockingSystem()`.

> Do we need to do that "right thing" at the cost of such extra complexity?

This extra complexity is due to my implementation of `NonLockingSystem()` not blocking SIGINT and SIGQUIT.
This was a design decision on my part to avoid blocking and unblocking those signals in a racey way.
This is why `system()` on macOS has a mutex because calling `system()` from multiple threads is not safe. This
patch blocks and unblocks the signal in the main thread before creating the threads and restores the signal handling
after all threads have finished which is safe but very ugly.

One way of avoiding this "extra complexity" is simply to have `NonLockingSystem()` also block and unblock SIGINT and SIGQUIT itself
and just accept that doing this is racey to prevent implementation details leaking into other parts of LibFuzzer's code (i.e. what is happening in this patch).

Another way of cleaning this up is to do basically what the patch is doing now (setting and restoring the signal handlers in the main thread) but refactor doing that into helper methods.

The linux code also has the same race but I guess glibc's version of `system()` doesn't lock so you haven't run into the issue I did.

What is more important to you, race freedom or simpler/cleaner code?

https://reviews.llvm.org/D22742