[PATCH] D49621: [libFuzzer] Initial implementation of weighted mutation leveraging during runtime.

Max Moroz via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 31 11:26:23 PDT 2018


Dor1s added inline comments.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:23
+const double kDefaultMutationWeight = 1;
+const double kDefaultMutationStat = 1 / (100 * 1000);
 
----------------
metzman wrote:
> Dor1s wrote:
> > kodewilliams wrote:
> > > Dor1s wrote:
> > > > metzman wrote:
> > > > > Please add a comment to explain the significance of `100` and `1000` (frankly i don't know what the purpose is of either since we don't actually round anything).
> > > > +1, what is it for?
> > > It is just there to represent a usefulness ratio that is close to, but not entirely, useless, so that it still gets some weight instead of calculating to 0.
> > That doesn't seem right to me. Let's say we have a tough target, one where it's hard to reach any new coverage. In fact, we have a lot of such targets when we use a full corpus.
> > 
> > So, after running for a while, it finally finds a couple of useful mutations, and all of them get weights of, say, 10^(-6) (again, totally possible), while all "useless" mutations get the default weight of 10^(-5). That would mean "useless" mutations would be chosen much more often than "useful" ones.
> > 
> > IMO, the better approach would be something like what we've discussed in the past:
> > 
> > 1) make a random decision whether to use weighted mutations or the default selection; 80% vs 20% should be fine. Once we start testing, we can change it to 90 vs 10 or any other proportion
> > 2) if weighted mutations were chosen, call WeightedIndex(); otherwise, use the default case
> > 
> > That approach would be universal, as it doesn't use any magic numbers that correspond to a particular target; your 100*1000 can behave very differently with targets of different speeds.
> I think it will be harder to determine if this technique is useful with the 80/20 strategy.
> If the concern is that the weight is too high, why don't we use the smallest positive double as the default?
> I think it will be harder to determine if this technique is useful with the 80/20 strategy.

Why will it be harder? That percentage would just control how strongly our strategy affects the fuzzing process. We should still be able to see either a negative or a positive impact. Changing the distribution (e.g. using 90/10 after 80/20) would multiply that impact a little bit.


> If the concern is that the weight is too high, why don't we use the smallest positive double as the default?

I can't think of a single default value that would work well both when useful mutations have stats like 10^(-3) and when they have stats like 10^(-6). We should avoid relying on anything that depends on the fuzz target's speed / complexity of finding a new input. We already have a magic threshold of 10000, let's not add more of those.
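
To make that concrete, here is a small standalone sketch (not part of the patch and not libFuzzer code) that uses std::discrete_distribution to pick mutator indices in proportion to their weights. The weights are hypothetical, chosen to match the example above: one "useful" mutator with a stat of 10^(-6) and four mutators stuck at the 10^(-5) default:

```
#include <cstdio>
#include <random>
#include <vector>

int main() {
  // Hypothetical weights: one "useful" mutator whose measured stat is 1e-6 and
  // four mutators stuck at the default weight of 1e-5, i.e. 1.0 / (100 * 1000).
  std::vector<double> Weights = {1e-6, 1e-5, 1e-5, 1e-5, 1e-5};
  std::discrete_distribution<size_t> Dist(Weights.begin(), Weights.end());

  std::mt19937 Rng(0);
  std::vector<size_t> Counts(Weights.size(), 0);
  for (int I = 0; I < 1000000; I++)
    Counts[Dist(Rng)]++;

  // The single "useful" mutator (index 0) ends up picked roughly 10x less
  // often than each default-weighted one (~2.4% vs ~24.4% of selections).
  for (size_t I = 0; I < Counts.size(); I++)
    std::printf("mutator %zu: %zu selections\n", I, Counts[I]);
  return 0;
}
```

With these weights the "useful" mutator gets only about 2.4% of all selections, which is exactly the inversion described above.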



================
Comment at: lib/fuzzer/FuzzerMutate.cpp:527
   for (int Iter = 0; Iter < 100; Iter++) {
-    auto M = &Mutators[Rand(Mutators.size())];
+    if (Options.UseWeightedMutations)
+      M = &Mutators[WeightedIndex()];
----------------
With my proposal it will be something like:

```
// Even when using weighted mutations, fall back to the default selection in 20% of cases.
if (Options.UseWeightedMutations && Rand(100) < 80)
  M = &Mutators[WeightedIndex()];
else
  M = &Mutators[Rand(Mutators.size())];
```
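
For reference, WeightedIndex() is defined in the patch itself and its body isn't shown in this thread. A plausible stand-in, purely a sketch under that assumption and not the patch's actual code, would pick an index with probability proportional to the per-mutator weights via a cumulative-sum scan:

```
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the patch's WeightedIndex(): returns an index with
// probability proportional to Weights[I]. The real patch may differ; this only
// illustrates the weighted selection the snippet above relies on.
size_t WeightedIndexSketch(const std::vector<double> &Weights, std::mt19937 &Rng) {
  double Total = 0;
  for (double W : Weights)
    Total += W;
  std::uniform_real_distribution<double> Dist(0.0, Total);
  double Point = Dist(Rng);
  double Acc = 0;
  for (size_t I = 0; I < Weights.size(); I++) {
    Acc += Weights[I];
    if (Point <= Acc)
      return I;
  }
  return Weights.size() - 1; // Guard against floating-point rounding.
}
```

The else branch in the snippet above keeps the original uniform Rand(Mutators.size()) selection, so the 20% fallback preserves some exploration even if the learned weights end up concentrated on a few mutators.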


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D49621




