[PATCH] D49621: [libFuzzer] Initial implementation of weighted mutation leveraging during runtime.

Mon Jul 30 17:27:24 PDT 2018

Dor1s added inline comments.

================
Comment at: lib/fuzzer/FuzzerMutate.cpp:23
+const double kDefaultMutationWeight = 1;
+const double kDefaultMutationStat = 1 / (100 * 1000);

----------------
kodewilliams wrote:
> Dor1s wrote:
> > metzman wrote:
> > > Please add a comment to explain the significance of `100` and `1000` (frankly i don't know what the purpose is of either since we don't actually round anything).
> > +1, what is it for?
> It is just there to represent a usefulness ratio that is near to but not entirely useless. So that it still gets weight instead of calculating to 0.
That doesn't seem right to me. Let's say we have a tough target -- where it's hard to reach any new coverage. In fact, we have a lot of them, when we use a full corpus.

So, after running for a while, it finally finds a couple useful mutations, all of them get weights, say, 10^(-6) (again, totally possible), while all "useless" mutations get the default weight of 10^(-5). That would mean, "useless" mutations would be chosen much more often than "useful" ones.

IMO, the better approach would be something like what we've discussed in the past:

1) make random decision whether we use weighted mutations or default selection, 80% vs 20% should be fine. Once start testing, can change to 90 vs 10 or any other proportion
2) if weighted mutations was chosen, call WeightedIndex(); otherwise, use default case

That approach would be universal as it doesn't use any magic numbers which correspond to a particular target, as your 100*1000 can behave very differently with targets having different speed.

================
Comment at: lib/fuzzer/FuzzerMutate.cpp:607
+      MutationWeights[i] =
+          (Stats[i] * 1000 / SumOfStats) + kDefaultMutationWeight;
+    }
----------------
kodewilliams wrote:
> Dor1s wrote:
> > I don't understand why we need `+ kDefaultMutationWeight`, as well as why we multiply the value by `1000`.
> I add the default to ensure that mutations  that are any amount more useful than the default get weighted higher in the distribution. No longer multiplying by 1000.
That won't work well for all targets. If mutations stats are very low numbers, that default mutation weight constant will bias the selection towards more uniform distribution, rather than giving more favor to more useful mutations.

================
Comment at: lib/fuzzer/FuzzerMutate.cpp:43
+          {&MutationDispatcher::Mutate_ShuffleBytes, "ShuffleBytes", 1, 1},
           {&MutationDispatcher::Mutate_ChangeASCIIInteger, "ChangeASCIIInt", 0,
+           1},
----------------
metzman wrote:
> Why do `ChangeASCIIInt` and `ChangeBinInt` start with a `UsefulCount` of `0` when everything else starts with `1`?
+1

================
Comment at: lib/fuzzer/FuzzerMutate.cpp:595
+  for (size_t i = 0; i < Stats.size(); i++)
+    Stats[i] = (Mutators[i].UsefulCount * 1.0) / Mutators[i].TotalCount;
+}
----------------
here and on line 603, what's the point of multiplying by 1.0? 

================
Comment at: lib/fuzzer/FuzzerMutate.cpp:607
+size_t MutationDispatcher::WeightedIndex() {
+  Random Rand(time(0));
+  std::discrete_distribution<size_t> SelectIndex(MutationWeights.begin(),
----------------
metzman wrote:
> Kode, I don't think this is what Max meant. I believe that here you need to use the rand obtained by calling `GetRand`.
> It's possible, though unlikely, that not doing this was causing the test flakiness you were seeing before. 
> I would run the previously flaky test in a loop and ensure that the tests pass every time, since if there is any flake we don't know about, it will be more painful to get rid of later rather than now.
Yes, don't create a new Random object, use an existing one from FuzzerDriver. That's crucial for keeping deterministic behavior, as Jonathan has pointed out.

Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D49621