[llvm] r278544 - [LibFuzzer] Fix `-jobs=<N>` where <N> > 1 and the number of workers is > 1 on macOS.

Fri Aug 19 07:27:50 PDT 2016

On 18 August 2016 at 19:48, Kostya Serebryany <kcc at google.com> wrote:
>
>
> On Thu, Aug 18, 2016 at 11:29 AM, Dan Liew <dan at su-root.co.uk> wrote:
>>
>> >> Perhaps the 1 second time is
>> >> too short for some buildbots if they are under load? We could extend
>> >> the time we wait (to something > 1 second) but we'd probably have to
>> >> increase `-max_total_time=4` to something larger. This wouldn't be a
>> >> real fix though as the test would still be racy.
>> >
>> >
>> > I think it can be rewritten to be non-racy.
>> > Just run it with -max_total_time=2 -jobs=2 and let it and all children
>> > exit
>> > -- then check that the files exist and are good.
>>
>> I agree that doing that would be non-racy, however that would be
>> testing a slightly different property (that all jobs eventually run)
>> rather than what the test is currently trying to check (that LibFuzzer
>> can spawn multiple copies of itself running in parallel). The change
>> you are proposing to the test would fail to catch the problem that I
>> fixed on macOS (LibFuzzer would not run multiple jobs in parallel).
>
>
> Ok...
> You can probably modify your test to do grep "Running 2 workers" instead of

That's actually slightly worse. LibFuzzer currently only prints that
message when `Flags.jobs > 0 && Flags.workers == 0`, and that condition
is false for the test (the test needs to be independent of the number of
CPUs on the host, so the number of workers and the number of jobs are
both set explicitly to 2).
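
To make that concrete, here is a minimal sketch of the condition. The
names `FuzzerFlags` and `PrintsWorkerCountMessage` are hypothetical and
only for illustration, this is not LibFuzzer's actual code; only the
boolean expression is taken from the description above.

```cpp
// Hypothetical stand-in for the two flags discussed above
// (not LibFuzzer's real flag struct).
struct FuzzerFlags {
  int jobs;
  int workers;
};

// Mirrors the condition described above: the "Running N workers" message
// is only printed when jobs were requested but the worker count was left
// unset (0).
bool PrintsWorkerCountMessage(const FuzzerFlags &Flags) {
  return Flags.jobs > 0 && Flags.workers == 0;
}
```

With the test's settings (jobs and workers both 2) this evaluates to
false, so a grep for the message would find nothing.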

Even if we modified LibFuzzer to print the message in all cases, it
would have the same problem as the previous solution you proposed,
namely that the test would fail to detect the problem I fixed on macOS.
The message would be printed before calling `ExecuteCommand()`, so we
wouldn't actually be checking that the child processes were launched in
parallel.

I think we either need to

* Abandon testing that the jobs actually run in parallel (what I
wanted to test) and instead only check that all jobs run and
eventually finish.
OR
* Make the times used in the existing test more tolerant of the system
being under load by sleeping for longer and running the jobs for
longer.

Yes, this does suck. I don't know of a way to observe that two
LibFuzzer jobs are actually running in parallel that isn't racy.