[PATCH] D73776: Entropic: Boosting LibFuzzer Performance
Kostya Serebryany via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 31 14:26:01 PST 2020
kcc added a comment.
It's exciting that such a small change can bring such a great improvement.
Thanks for the contribution.
I left several comments in the code and here are some higher level comments:
- put the new functionality under a flag, off by default, so that we can properly A/B test it once it's committed.
If it proves to be superior we'll enable the flag by default and then remove the old algorithm and remove the flag.
- try to unit-test the most interesting functionality (see fuzzer/tests/FuzzerUnittest.cpp)
- write a top-level comment explaining the algorithm.
- try to avoid unordered_map on the hot code: sadly, the STL hash table is almost always the wrong choice of a data structure for hot code.
Once there is a top-level comment I may be able to suggest a better data structure.
But I expect it to be something like an array of pairs, organized somehow (sorted? heap?)
How many rare features do we need per input? Maybe just have a fixed size array?
- don't use size_t where you can get away with uint32_t - memory footprint is important.
- don't use long double (unless you have to -- in which case explain why)
- there are mentions of local vs global frequencies. Please explain in a comment what that is (corpus vs individual input?) and reflect this in the variable names (like, RareFeaturesFreqMap => LocalFreqMap)
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:45
+ double Energy;
+ long double Sum_Y;
+ std::unordered_map<size_t, size_t> RareFeaturesFreqMap;
----------------
why not just 'double'?
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:46
+ long double Sum_Y;
+ std::unordered_map<size_t, size_t> RareFeaturesFreqMap;
};
----------------
Any chance to have a planer data structure here? (sorted array or some such)
Also, why size_t?
The feature is 4 bytes and the frequency can probably be 4 or even 2 bytes. (2 bytes is hard for alignment).
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:53
+public:
+ size_t g_NumExecutedMutations = 0;
InputCorpus(const std::string &OutputCorpus) : OutputCorpus(OutputCorpus) {
----------------
why is this public?
Also, why the g_ prefix?
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:61
+ for (auto II : Inputs) {
+ II->RareFeaturesFreqMap.clear();
delete II;
----------------
do you need this?
the following line should take care of it
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:86
+ size_t NumSingletons() const {
+ size_t singletons = 0;
+ for (uint32_t Idx : RareFeatures)
----------------
Please try to follow the coding style, i.e. capitalize the names:
Singletons
Also, the term Singleton may be a bit too overloaded.
How about
NumFeaturesCoveredByASingleInput
or some such?
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:250
+ // Firstly, remove most abundant feature until we are "under the line".
+ while (RareFeatures.size() > 100 && MostAbundant_RareFeature > 0xFF) {
+ size_t st_mostAbundant_RareFeature_Idx =
----------------
try not to use constants like this.
At the very least, use
const size_t kSomething = 100;
..
if (xxx.size() > kSomething)
if there is any reason to play with different value, you may want to use a flag instead
Similar for 0xFF.
Probably, the best here is to replace these two constants with function parameters so that this function becomes unit-testable.
Which reminds me: please consider adding unit tests to the key functionality like this one.
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:328
+ void UpdateFeatureFrequency(InputInfo *II, size_t Idx) {
+ Idx = Idx % kFeatureSetSize;
----------------
This is the hotspot, right?
My guess is that using a hash map here causes most of the slowdown, need a faster data structure...
Need to think...
It will help if you have describe the algorithm in a comment.
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:378
+ void UpdateEnergy(InputInfo *II) {
+ if (!II->NeedsUpdate || II->Energy == 0.0)
----------------
a top-level comment explaining the computations would be nice
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:391
+ // locally discovered, globally rare features
+ for (auto it = II->RareFeaturesFreqMap.begin();
+ it != II->RareFeaturesFreqMap.end(); ++it) {
----------------
why not
for (auto It : I->RareFeaturesFreqMap)
?
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:425
+ // i.e., randomly do not skip.
+ if (!DistributionNeedsUpdate && random() % 10000)
+ return;
----------------
use () just in case, replace 10000 with kSomething
don't use random(), pass (Random &Rand) instead
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:438
+ // normalize.
+ bool aggressiveSchedule = rand() % 100 > 20;
+
----------------
don't use libc's rand(), pass (Random &Rand) instead
================
Comment at: compiler-rt/lib/fuzzer/FuzzerCorpus.h:500
+ bool DistributionNeedsUpdate = true;
+ size_t FeaturesFreqMap[kFeatureSetSize];
+ Vector<size_t> RareFeatures;
----------------
does this have to be 8-byte per feature?
================
Comment at: compiler-rt/lib/fuzzer/FuzzerLoop.cpp:607
-void Fuzzer::ReportNewCoverage(InputInfo *II, const Unit &U) {
+void Fuzzer::ReportNewCoverage(bool reduced, InputInfo *II, const Unit &U) {
II->NumSuccessfullMutations++;
----------------
do you need this change?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73776/new/
https://reviews.llvm.org/D73776
More information about the llvm-commits
mailing list