[PATCH] D73776: Entropic: Boosting LibFuzzer Performance

Max Moroz via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 18 22:15:10 PDT 2020


Dor1s added inline comments.


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:69
       delete II;
+    RareFeatures.clear();
   }
----------------
since this is a vector, I don't think we need to manually clear it in the destructor


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:120
+    // Assign maximal energy to the new seed.
+    II.Energy = RareFeatures.empty() ? 1.0 : logl(RareFeatures.size());
+    II.SumIncidence = RareFeatures.size();
----------------
nit: seems like `log` should be sufficient, as Energy is `double`, not `long double`


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:269
+
+      for (uint32_t Idx2 : RareFeatures) {
+        if (GlobalFeatureFreqs[Idx2] >=
----------------
nit: I'd rather use `auto` or `decltype(RareFeatures)::value_type` to avoid type mismatch if we ever change `RareFeatures` definition and forget to change the type here


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:278
+      // Remove most abundant rare feature.
+      RareFeatures.erase(remove(RareFeatures.begin(), RareFeatures.end(),
+                                ST_mostAbundantRareFeatureIdx),
----------------
assuming this code gets executed quite often, and the order inside `RareFeatures` isn't important, we can avoid erase-remove and do something like:

```
RareFeatures[index_from_the_loop] = RareFeatures.back();
RareFeatures.resize(RareFeatures.size() - 1);
```

but the loop on line 269 would have to use index in the vector (from 1 to `< RareFeatures.size()`) instead of the iterator

feel free to ignore though, it's just a suggestion which may or may not be a good one :)


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:292
+    // Add rare feature and handle collisions.
+    RareFeatures.push_back(Idx);
+    GlobalFeatureFreqs[Idx] = 0;
----------------
since we always do this, `resize()` in my suggestion above might not be even needed, but that's a minor thing


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:299
+    // undiscovered feature.
+    for (auto II : Inputs) {
+      if (II->Energy > 0.0) {
----------------
can this and the loop on the line 294 be combined?


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:300
+    for (auto II : Inputs) {
+      if (II->Energy > 0.0) {
+        II->SumIncidence += 1;
----------------
please consider adding a comment why we skip inputs with zero energy


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:355
+    // Update local frequencies.
+    if (II) {
+      if (II->FeatureFreqs.empty()) {
----------------
invert the condition to reduce indentation:

```
if (!II)
  return;
```


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:357
+      if (II->FeatureFreqs.empty()) {
+        II->FeatureFreqs.push_back(std::pair<uint32_t, uint16_t>(Idx32, 1));
+      } else {
----------------
can do return after this line and avoid `else` with extra indentation below


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:357
+      if (II->FeatureFreqs.empty()) {
+        II->FeatureFreqs.push_back(std::pair<uint32_t, uint16_t>(Idx32, 1));
+      } else {
----------------
Dor1s wrote:
> can do return after this line and avoid `else` with extra indentation below
what is `1` here? would it make sense to have a static const variable with a descriptive name, e.g. `kDefaultSomething`?


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:362
+            std::lower_bound(II->FeatureFreqs.begin(), II->FeatureFreqs.end(),
+                             std::pair<uint32_t, uint16_t>(Idx32, 0));
+
----------------
same point, what is `0`? a constant with a descriptive name or (less preferred) comment would be really helpful for others who come to read the code in future


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:380
+  // of the seed. Since we do not know the entropy of a seed that has
+  // never been executed we assign fresh seeds maximum entropy and
+  // let II->Energy approach the true entropy from above.
----------------
>From the code below it seems like `Energy` represents entropy and the max value is 0, which we reduce depending on the actual feature frequencies. Is that correct understanding?


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:445
+    // i.e., randomly do not skip.
+    if (!DistributionNeedsUpdate && Rand(kSparseEnergyUpdates))
+      return;
----------------
it seems like the `Rand(kSparseEnergyUpdates)` clause is applicable to the `Entropic` case only, is that correct? Do we really need it in the vanilla case?


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:473
+      for (size_t i = 0; i < N; i++) {
+        Weights[i] =
+            Inputs[i]->NumFeatures == 0 ||
----------------
could you please rewrite this in a more readable if-else form?


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:475
+            Inputs[i]->NumFeatures == 0 ||
+                    Inputs[i]->NumExecutedMutations / 20 >
+                        NumExecutedMutations / Inputs.size() ||
----------------
why `20`? a constant with a descriptive name or a comment would be appreciated


================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:487
+        if (Inputs[i]->HasFocusFunction) {
+          VanillaSchedule = true;
+          break;
----------------
so if there is at least one input that touches focus function, we will be always wasting time in this loop (starting on the  line 472) and then falling back to the `VanillaSchedule` case? In such case I think we should just check `Options.FocusFunction` and use vanilla schedule if it's set, just because almost always there will be input(s) touching the focus function


================
Comment at: compiler-rt/lib/fuzzer/FuzzerLoop.cpp:712
   }
+  II.NeedsUpdate = true;
 }
----------------
does it always need update, even when new coverage wasn't observed?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73776/new/

https://reviews.llvm.org/D73776





More information about the llvm-commits mailing list