[PATCH] D49621: [libFuzzer] Initial implementation of weighted mutation leveraging during runtime.

Jonathan Metzman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 20 17:34:17 PDT 2018


metzman added a comment.

Left a few comments with high level concerns and some less important ones as well.
Did you use clang-format on this? If not I may have to start using vscode myself since clang-format had no suggested changes to this. Good job.



================
Comment at: lib/fuzzer/FuzzerFlags.def:159
 FUZZER_FLAG_INT(print_mutation_stats, 0, "Experimental")
+FUZZER_FLAG_INT(use_weighted_mutations, 0, "Experimental")
----------------
Please put some kind of explanation here.


================
Comment at: lib/fuzzer/FuzzerLoop.cpp:546
   TotalNumberOfRuns++;
+  assert(TotalNumberOfRuns > 0);
+  if (TotalNumberOfRuns % 10000 == 0 && Options.UseWeightedMutations)
----------------
Is this assertion useful?


================
Comment at: lib/fuzzer/FuzzerLoop.cpp:547
+  assert(TotalNumberOfRuns > 0);
+  if (TotalNumberOfRuns % 10000 == 0 && Options.UseWeightedMutations)
+    MD.AssignMutationWeights();
----------------
Let's not use magic numbers. Please use a constant instead of 10,000.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:523
     auto M = &Mutators[Rand(Mutators.size())];
+    if (UseMutationWeights) M = &Mutators[WeightedIndex()];
     size_t NewSize = (this->*(M->Fn))(Data, Size, MaxSize);
----------------
Better not to clobber the original mutator. I would do:
```
if (UseMutationWeights)
  ...
else
  ...
```

instead of what we have here.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:587
 
+void MutationDispatcher::GetMutationStats(Vector<double> &Stats) {
+  Stats.resize(Mutators.size(), 0);
----------------
I think this should be `SetMutationStats` since it sets something rather than returns something.
I would also add a comment explaining what this method does since it changes the vector it accepts.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:591
+    double UsefulPercentage =
+        M.TotalCount ? (100.0 * M.UsefulCount) / M.TotalCount : 0;
+    Stats.push_back(UsefulPercentage);
----------------
Although we use percentages in when printing stats, I'm not sure we need to do it here (and in other parts of this CL), since the only other thing that will see this number is your own code.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:602
+  for (const double &Stat : Stats) SumOfStats += Stat;
+  // Convert stats into weights
+  for (size_t i = 0; i < Mutators.size(); i++) {
----------------
nit: Please add a period to the end of this comment and all others that are missing one.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:605
+    // Set to least weighted (1) if usefulness is zero so far
+    if (Stats[i] == 0) MutationWeights[i] = 1;
+    // 1000 due to 3 decimal places
----------------
I don't think I like the fact that 1 is the default weight. But maybe I misunderstand what its implications are.

I think because 1 is the default weight, a totally useless mutation would be weighted higher than a mutation that is useful less than 1/1000 times.

Less importantly:

Let's use a constant instead of `1` a magic number.

Also, I think it would be better if the body of this if were on its own line (for consistency with the else) but clang-format agrees with you.
I don't think you should change this unless matt or kcc says to.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:606
+    if (Stats[i] == 0) MutationWeights[i] = 1;
+    // 1000 due to 3 decimal places
+    else
----------------
I think the placement of this comment is awkward. Maybe put on the same line as `else`.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:608
+    else
+      MutationWeights[i] = static_cast<size_t>((Stats[i] * 1000) / SumOfStats);
+  }
----------------
I'm somewhat unsure why we divide by `SumOfStats`, can you explain this to me please?


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:610
+  }
+  for (const size_t Weight : MutationWeights) SumOfWeights += Weight;
+  UseMutationWeights = true;
----------------
Think this loop can be eliminated and that SumOfWeights can be calculated in the previous loop.
Actually, won't it always be `1000`?


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:615
+size_t MutationDispatcher::WeightedIndex() {
+  // Will return even the lowest weighted mutations, but ones with greater
+  // weights have a higher probability of being returned.
----------------
Helpful comment! I think it can be improved in 2 ways:
1. Move it outside the function. 
2. Change the order so that it says the main purpose first and then says what happens to lowest weighted mutations.


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:619
+  for (size_t i = 0; i < MutationWeights.size(); i++) {
+    if (RandomMark - MutationWeights[i] <= 0)
+      return i;
----------------
I don't think this line does what you want. Did you have any warnings compiling your code?

`RandomMark` and `MutationWeights` are both unsigned values so `RandomMark - MutationWeights[i]` will never be less than `0`.
I think comparing them directly will work better than subtracting them (which I think is harder to understand).


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:622
+    else
+      RandomMark -= MutationWeights[i];
+  }
----------------
Is the purpose of this to ensure that we return an index on the last iteration?
I need to think about this technique more but I am not sure it is fair (ie: things with equal weight have equal chance of being picked).


================
Comment at: lib/fuzzer/FuzzerMutate.cpp:624
+  }
+  // Fallback, should never get to this point.
+  return 0;
----------------
I would add an `assert(false)` so that it is obvious if it does happen.


================
Comment at: lib/fuzzer/FuzzerMutate.h:99
 
+  // Gets tally of mutations resulting in new coverage for usefulness metric.
   void RecordUsefulMutations();
----------------
IMO the word "Records" is better than "Gets" which implies something returned. 
I also think this sentence is easier to understand with a comma after "coverage"


================
Comment at: lib/fuzzer/FuzzerMutate.h:171
+
+  // For experimental runtime mutation stats leveraging.
+  Vector<size_t> MutationWeights;
----------------
super nit: I think the language here can be slightly improved. Maybe this instead:
```
// Used to weight mutations based on usefulness.
```


================
Comment at: lib/fuzzer/FuzzerMutate.h:172
+  // For experimental runtime mutation stats leveraging.
+  Vector<size_t> MutationWeights;
+  bool UseMutationWeights = false;
----------------
I don't think `MutationWeights` should store `size_t` since it isn't really storing sizes.
This `size_t` seems to have infected a lot of other code in this CL so removing it will be nice.


================
Comment at: lib/fuzzer/FuzzerMutate.h:173
+  Vector<size_t> MutationWeights;
+  bool UseMutationWeights = false;
 };
----------------
I don't find it great to use `UseMutationWeights` to special case the case when we haven't assigned any weights yet.
It would be cleaner  if GetMutationStats and AssignMutationWeights were called before any mutations happened, so that we don't need to special case but everything works as if we did.


================
Comment at: test/fuzzer/fuzzer-weightedmutations.test:4
+# Weighted mutations only trigger after first 10,000 runs, hence flag.
+RUN: not %run %t-WeightedMutationsTest -use_weighted_mutations=1 -runs=100000 2>&1 | FileCheck %s
+
----------------
How long does it take to do 100k runs? Does this UnitTest take long?

Also, what behavior does this test verify? The only thing I can tell is that weighted mutations doesn't crash anything. 
I think it would be nice to have a test of something more substantial if possible.


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D49621





More information about the llvm-commits mailing list