[PATCH] D66107: [libFuzzer] Improve -merge= process to account for REDUCED corpus units.

Mon Aug 12 13:10:23 PDT 2019

Dor1s created this revision.
Dor1s added reviewers: morehouse, metzman, hctim.
Herald added subscribers: Sanitizers, mgrang, delcypher.
Herald added projects: LLVM, Sanitizers.
Dor1s updated this revision to Diff 214697.
Dor1s added a comment.

fix a typo

Dor1s added a comment.

Hi everyone,

This CL is a proof of concept to start the discussion. If you approve of the approach or suggest improvements, I'll go ahead and polish it and update the tests. Please take a look :)

Context: https://github.com/google/clusterfuzz/pull/815#issuecomment-520087538

Without this change, libFuzzer does not account for REDUCED inputs
during -merge when the larger of two equivalent inputs resides in the output
corpus directory. A user can work around this limitation by creating an empty
directory for the merge output corpus, since libFuzzer does prefer smaller units
when merging corpora from the other dirs. However, in that case libFuzzer cannot
provide useful incremental stats (such as `X new files with Y new features
added; Z new coverage edges`) to the user.

This change aims to close that gap and make the `-merge=` process overwrite the
existing units in the output corpus directory if a shorter unit from any other
corpus dir gives the same coverage. The high-level idea is as follows:

1. Emit a `SIGNATURE` value for every unit in the merge control file. The value is calculated as a hash of all coverage edges and features for a given unit. Using a hash keeps the merge control file from growing significantly and avoids numerous comparisons of arrays of numbers.

2. During the actual merge process, we map the signatures of all units in the output corpus directory to the corresponding file names and sizes. Then, when looping through the rest of the units, we check whether any of them has a signature matching one of the signatures in the output corpus dir and, if so, record a pair of file names to be returned to the `FuzzerDriver`.

3. In the current implementation, the `FuzzerDriver` goes through the vector of file pairs and replaces the files. A potential improvement here might be to combine both `NewFiles` and `ReplacedFiles` into a single vector containing pairs of file names. For `NewFiles`, the destination file name will be empty.
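
To make the discussion concrete, here is a minimal standalone sketch of steps 1 and 2. It is not the code in the patch. `UnitInfo`, `Signature` and `FindReplacements` are hypothetical names used only for illustration, and the sketch hashes with FNV-1a to stay self-contained, whereas the patch touches `FuzzerSHA1`. It also assumes that a replacement is recorded only when the matching unit is strictly smaller, and that values are sorted before hashing so the signature does not depend on their order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one unit's record in the merge control file.
struct UnitInfo {
  std::string Name;
  size_t Size;
  std::vector<uint32_t> Features;
  std::vector<uint32_t> Cov;
};

// Step 1: one hash over all coverage edges and features of a unit.
// FNV-1a is an assumption made for brevity, not the hash used by the patch.
uint64_t Signature(UnitInfo U) {
  std::sort(U.Cov.begin(), U.Cov.end());
  std::sort(U.Features.begin(), U.Features.end());
  uint64_t H = 14695981039346656037ULL;
  auto Mix = [&H](uint64_t V) {
    for (int I = 0; I < 8; ++I) {
      H ^= (V >> (8 * I)) & 0xFF;
      H *= 1099511628211ULL;
    }
  };
  Mix(U.Cov.size());  // Length prefixes keep the two lists distinguishable.
  for (uint32_t V : U.Cov) Mix(V);
  Mix(U.Features.size());
  for (uint32_t V : U.Features) Mix(V);
  return H;
}

// Step 2: map output-corpus signatures to file names and sizes, then scan
// the remaining units for smaller files with a matching signature.
// Returns pairs of (file in output corpus, smaller file to replace it with).
std::vector<std::pair<std::string, std::string>>
FindReplacements(const std::vector<UnitInfo> &OutputCorpus,
                 const std::vector<UnitInfo> &OtherCorpora) {
  struct Slot {
    std::string OutName;
    size_t BestSize;
    std::string Replacement;
  };
  std::vector<Slot> Slots;
  std::unordered_map<uint64_t, size_t> SigToSlot;
  for (const UnitInfo &U : OutputCorpus) {
    // If several output units share a signature, only the first is tracked.
    auto Ins = SigToSlot.emplace(Signature(U), Slots.size());
    if (Ins.second)
      Slots.push_back({U.Name, U.Size, ""});
  }
  for (const UnitInfo &U : OtherCorpora) {
    auto It = SigToSlot.find(Signature(U));
    if (It == SigToSlot.end())
      continue;
    Slot &S = Slots[It->second];
    if (U.Size < S.BestSize) {  // Keep the smallest candidate seen so far.
      S.BestSize = U.Size;
      S.Replacement = U.Name;
    }
  }
  std::vector<std::pair<std::string, std::string>> Result;
  for (const Slot &S : Slots)
    if (!S.Replacement.empty())
      Result.emplace_back(S.OutName, S.Replacement);
  return Result;
}
```

In this sketch the driver would then walk the returned pairs and overwrite each first file with the contents of the second, which corresponds to step 3.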

Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D66107

Files:
  lib/fuzzer/FuzzerDefs.h
  lib/fuzzer/FuzzerDriver.cpp
  lib/fuzzer/FuzzerFork.cpp
  lib/fuzzer/FuzzerMerge.cpp
  lib/fuzzer/FuzzerMerge.h
  lib/fuzzer/FuzzerSHA1.cpp
  lib/fuzzer/FuzzerSHA1.h
  lib/fuzzer/tests/FuzzerUnittest.cpp
