[PATCH] D73776: Entropic: Boosting LibFuzzer Performance

Fri Jan 31 14:26:01 PST 2020

kcc added a comment.

It's exciting that such a small change can bring such a great improvement. 
Thanks for the contribution.

I left several comments in the code and here are some higher level comments:

- put the new functionality under a flag, off by default, so that we can properly A/B test it once it's committed.

If it proves to be superior we'll enable the flag by default and then remove the old algorithm and remove the flag.

- try to unit-test the most interesting functionality (see fuzzer/tests/FuzzerUnittest.cpp)
- write a top-level comment explaining the algorithm.
- try to avoid unordered_map on the hot code: sadly, the STL hash table is almost always the wrong choice of a data structure for hot code.

Once there is a top-level comment I may be able to suggest a better data structure. 
But I expect it to be something like an array of pairs, organized somehow (sorted? heap?)
How many rare features do we need per input? Maybe just have a fixed size array?

- don't use size_t where you can get away with uint32_t - memory footprint is important.
- don't use long double (unless you have to -- in which case explain why)
- there are mentions of local vs global frequencies. Please explain in a comment what that is (corpus vs individual input?) and reflect this in the variable names (like, RareFeaturesFreqMap => LocalFreqMap)

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:45
+  double Energy;
+  long double Sum_Y;
+  std::unordered_map<size_t, size_t> RareFeaturesFreqMap;
----------------
why not just 'double'? 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:46
+  long double Sum_Y;
+  std::unordered_map<size_t, size_t> RareFeaturesFreqMap;
 };
----------------
Any chance to have a planer data structure here? (sorted array or some such)
Also, why size_t? 
The feature is 4 bytes and the frequency can probably be 4 or even 2 bytes. (2 bytes is hard for alignment). 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:53
+public:
+  size_t g_NumExecutedMutations = 0;
   InputCorpus(const std::string &OutputCorpus) : OutputCorpus(OutputCorpus) {
----------------
why is this public? 
Also, why the g_ prefix? 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:61
+    for (auto II : Inputs) {
+      II->RareFeaturesFreqMap.clear();
       delete II;
----------------
do you need this? 
the following line should take care of it

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:86
+  size_t NumSingletons() const {
+    size_t singletons = 0;
+    for (uint32_t Idx : RareFeatures)
----------------
Please try to follow the coding style, i.e. capitalize the names:
  Singletons

Also, the term Singleton may be a bit too overloaded. 
How about 
    NumFeaturesCoveredByASingleInput
or some such?

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:250
+    // Firstly, remove most abundant feature until we are "under the line".
+    while (RareFeatures.size() > 100 && MostAbundant_RareFeature > 0xFF) {
+      size_t st_mostAbundant_RareFeature_Idx =
----------------
try not to use constants like this. 
At the very least, use
  const size_t kSomething = 100;
  ..
  if (xxx.size() > kSomething)

if there is any reason to play with different value, you may want to use a flag instead

Similar for 0xFF. 

Probably, the best here is to replace these two constants with function parameters so that this function becomes unit-testable. 
Which reminds me: please consider adding unit tests to the key functionality like this one. 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:328

+  void UpdateFeatureFrequency(InputInfo *II, size_t Idx) {
+    Idx = Idx % kFeatureSetSize;
----------------
This is the hotspot, right? 
My guess is that using a hash map here causes most of the slowdown, need a faster data structure... 
Need to think... 
It will help if you have describe the algorithm in a comment. 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:378

+  void UpdateEnergy(InputInfo *II) {
+    if (!II->NeedsUpdate || II->Energy == 0.0)
----------------
a top-level comment explaining the computations would be nice

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:391
+    // locally discovered, globally rare features
+    for (auto it = II->RareFeaturesFreqMap.begin();
+         it != II->RareFeaturesFreqMap.end(); ++it) {
----------------
why not 
   for (auto It : I->RareFeaturesFreqMap)
?

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:425
+    // i.e., randomly do not skip.
+    if (!DistributionNeedsUpdate && random() % 10000)
+      return;
----------------
use () just in case, replace 10000 with kSomething
don't use random(), pass (Random &Rand) instead

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:438
+    // normalize.
+    bool aggressiveSchedule = rand() % 100 > 20;
+
----------------
don't use libc's rand(),  pass (Random &Rand) instead

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:500
+  bool DistributionNeedsUpdate = true;
+  size_t FeaturesFreqMap[kFeatureSetSize];
+  Vector<size_t> RareFeatures;
----------------
does this have to be 8-byte per feature? 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerLoop.cpp:607

-void Fuzzer::ReportNewCoverage(InputInfo *II, const Unit &U) {
+void Fuzzer::ReportNewCoverage(bool reduced, InputInfo *II, const Unit &U) {
   II->NumSuccessfullMutations++;
----------------
do you need this change? 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73776/new/

https://reviews.llvm.org/D73776