[PATCH] D49621: [libFuzzer] Initial implementation of weighted mutation leveraging during runtime.

Kodé Williams via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 31 15:57:18 PDT 2018


kodewilliams added inline comments.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:23
+const double kDefaultMutationWeight = 1;
+const double kDefaultMutationStat = 1.0 / (100 * 1000);
 
----------------
Dor1s wrote:
> metzman wrote:
> > Dor1s wrote:
> > > kodewilliams wrote:
> > > > Dor1s wrote:
> > > > > metzman wrote:
> > > > > > Please add a comment to explain the significance of `100` and `1000` (frankly I don't know what the purpose is of either, since we don't actually round anything).
> > > > > +1, what is it for?
> > > > It is just there to represent a usefulness ratio that is close to zero but not actually zero, so that the mutation still gets some weight instead of computing to 0.
> > > That doesn't seem right to me. Let's say we have a tough target -- one where it's hard to reach any new coverage. In fact, we have a lot of such targets when we use a full corpus.
> > > 
> > > So, after running for a while, it finally finds a couple of useful mutations, and all of them get weights of, say, 10^(-6) (again, totally possible), while all "useless" mutations get the default weight of 10^(-5). That would mean "useless" mutations would be chosen much more often than "useful" ones.
> > > 
> > > IMO, the better approach would be something like what we've discussed in the past:
> > > 
> > > 1) make a random decision whether we use weighted mutations or the default selection; 80% vs 20% should be fine. Once we start testing, we can change it to 90 vs 10 or any other proportion
> > > 2) if weighted mutation was chosen, call WeightedIndex(); otherwise, use the default case (sketched in the snippet after this thread)
> > > 
> > > That approach would be universal, as it doesn't use any magic numbers that correspond to a particular target; your 100*1000 can behave very differently with targets of different speeds.
> > I think it will be harder to determine if this technique is useful with the 80/20 strategy.
> > If the concern is that the weight is too high, why don't we use the smallest positive double as the default?
> > I think it will be harder to determine if this technique is useful with the 80/20 strategy.
> 
> Why will it be harder? That percentage would just control how strongly our strategy affects the fuzzing process. We should still be able to see either a negative or a positive impact. Changing the distribution (e.g. using 90/10 after 80/20) would multiply that impact a little bit.
> 
> 
> > If the concern is that the weight is too high, why don't we use the smallest positive double as the default?
> 
> I can't think of a value that would work well for both cases when useful mutations have stats like 10^(-3) and 10^(-6). We should avoid relying on anything that depends on a fuzz target's speed or the complexity of finding a new input. We already have a magic threshold of 10000; let's not add more of those.
> 
@metzman @Dor1s PTAL. I made the changes and ran the tests. The test in question is no longer flaky (I ran it about 50 times), and I ran two experiments locally. In both cases the corpus grew more with the option enabled :) so maybe a good sign.
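
(Not from the patch itself -- just a minimal standalone sketch of the 80/20 selection described in Dor1s's two-step suggestion above. The function name, the use of <random>, and the Stats vector of per-mutator usefulness ratios are illustrative assumptions; the actual libFuzzer code would go through its own Random helper and WeightedIndex().)

  #include <algorithm>
  #include <cstddef>
  #include <random>
  #include <vector>

  // Pick a mutator index: ~80% of the time proportionally to its usefulness
  // stat, ~20% of the time uniformly (the original default behavior).
  size_t SelectMutatorIndex(std::mt19937 &Rng, const std::vector<double> &Stats) {
    std::uniform_int_distribution<int> Percent(0, 99);
    // Only take the weighted path if at least one mutator has a positive stat;
    // otherwise the weighted distribution would be ill-formed.
    bool HaveWeights = std::any_of(Stats.begin(), Stats.end(),
                                   [](double S) { return S > 0.0; });
    if (HaveWeights && Percent(Rng) < 80) {
      // Weighted choice, analogous to WeightedIndex(): mutators with higher
      // UsefulCount/TotalCount ratios are picked more often.
      std::discrete_distribution<size_t> Weighted(Stats.begin(), Stats.end());
      return Weighted(Rng);
    }
    // Default case: every mutator is equally likely (assumes Stats is non-empty).
    std::uniform_int_distribution<size_t> Uniform(0, Stats.size() - 1);
    return Uniform(Rng);
  }

Changing the split from 80/20 to 90/10 only changes how strongly the weighting biases mutator choice, so the proportion can be tuned after the initial experiments.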


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:595
+  for (size_t i = 0; i < Stats.size(); i++)
+    Stats[i] = (Mutators[i].UsefulCount * 1.0) / Mutators[i].TotalCount;
+}
----------------
Dor1s wrote:
> here and on line 603, what's the point of multiplying by 1.0? 
Both UsefulCount and TotalCount are uint64_t, but Stats is a vector of doubles, so I multiply by 1.0 to force the division to happen in floating point instead of being truncated as integer division.
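
(A standalone illustration of that point, not code from the patch: dividing two uint64_t values truncates toward zero, so one operand has to be promoted to double before the division. Multiplying by 1.0 and an explicit static_cast are equivalent here.)

  #include <cstdint>
  #include <cstdio>

  int main() {
    uint64_t UsefulCount = 3, TotalCount = 7;
    double Truncated   = UsefulCount / TotalCount;                      // integer division happens first: 0.0
    double ViaMultiply = (UsefulCount * 1.0) / TotalCount;              // promoted to double: ~0.428571
    double ViaCast     = static_cast<double>(UsefulCount) / TotalCount; // same result, a bit more explicit
    std::printf("%f %f %f\n", Truncated, ViaMultiply, ViaCast);
    return 0;
  }

Both forms give the same result; the static_cast just makes the intent more explicit.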


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D49621




