[PATCH] D73776: Entropic: Boosting LibFuzzer Performance

Mon Feb 3 16:59:25 PST 2020

kcc added a comment.

Sounds good. Max and I will do the next round(s) of review.

Re RareFeaturesFreqMap, I would consider two more options:

1. a vector of pairs (feature, freq), ordered by feature. Updating the frequency would be log(N) and removing a feature would be N*log(N), but it may still be better than unordered map.
2. same as a above, but just use a fixed-size array of some small size, e.g. 8. My assumption is that most inputs have very few, if any, rare features, and in cases where a given input has more than 8 rare features, it doesn't matter if we drop some of them.

option 2 is an optimization of option 1, so can just go with option 1 and see if it's too slow.

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:53
+public:
+  size_t g_NumExecutedMutations = 0;
   InputCorpus(const std::string &OutputCorpus) : OutputCorpus(OutputCorpus) {
----------------
marcel wrote:
> kcc wrote:
> > why is this public? 
> > Also, why the g_ prefix? 
> `g_NumExecutedMutations` contains the total number of inputs generated. It is updated in FuzzerLoop.c in `MutateAndTestOne`. I can drop the `g_` prefix.
yea, please drop g_. 

================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:500
+  bool DistributionNeedsUpdate = true;
+  size_t FeaturesFreqMap[kFeatureSetSize];
+  Vector<size_t> RareFeatures;
----------------
marcel wrote:
> kcc wrote:
> > does this have to be 8-byte per feature? 
> Each entry is upper-bounded by the total number of generated inputs `g_NumExecutedMutations` which likely won't fit into `uint32_t`. That's why I chose size_t.
> 
> However, we really only need abundance information for features with an abundance below `MostAbundant_RareFeature`. If memory footprint is a concern, we can go down to `uint16_t` at the cost of an overflow check in the hot code in `UpdateFeatureFrequency`. See top-level comment for more details.
Yea, I'd prefer uint16_t and a saturated add. Just to save some RAM

================
Comment at: compiler-rt/lib/fuzzer/FuzzerLoop.cpp:607

-void Fuzzer::ReportNewCoverage(InputInfo *II, const Unit &U) {
+void Fuzzer::ReportNewCoverage(bool reduced, InputInfo *II, const Unit &U) {
   II->NumSuccessfullMutations++;
----------------
marcel wrote:
> kcc wrote:
> > do you need this change? 
> Unrelated. This is just fixing a problem where LibFuzzer prints REDUCE more often than it should.
I'd prefer to not mix unrelated changes in one diff -- makes the code review quadratic. 
Please contribute this one separately (I am not 100% sure I understand it)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73776/new/

https://reviews.llvm.org/D73776