[llvm-dev] Some feedback on Libfuzzer

Kostya Serebryany via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 8 11:07:43 PDT 2015


More replies below.

If you feel some of your questions were left unanswered, please ping me or
file a bug.

On Sat, Sep 5, 2015 at 5:50 AM, Greg Stark via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I think I have a fairly nicely integrated Libfuzzer based fuzzer in
> Postgres now. I can run things like:
>
> SELECT fuzz(100000,'select regexp_matches(''foo/bar/baz'',$1,''g'')')
>
> Which makes it convenient to fuzz arbitrary public functions available
> in SQL. (I haven't figured out what interface to make for fuzzing
> internal functions which take char buffers that can have nuls. The SQL
> interface will only be able to handle valid utf8 encoded strings which
> contain no nuls.)
>
>
> I have some feedback of things that are a bit awkward or that I miss
> from AFL. Some of this may actually be there but I'm just not using it
> right?
>
> 1) One minor thing, it's a bit of a pain to construct the argv when
> you're not invoking it on the command line.


So, you want the code Fuzzer::FuzzingOptions (FuzzerInternal.h) to be
accessible to a user?
I thought about it and may do it one day.
File a bug if you want to track it.
(Not my first priority though).


> Not a big deal but it
> would be nice to bypass that and just allow the caller to set the
> variables directly. Some of the parameters are not entirely clear
> either -- I'm not clear what the distinction is between -runs and
> -iterations


-iterations is an artifact from the past.
removed in 247030.


> and I'm not clear whether the timeout is for the whole run
> or individual tests


 individual tests

> (it's not doing anything in my case which is
> probably due to Postgres having its own ALRM handler).
>

Yea, probably.


>
> 2) I've caught a bug which causes enormous stack growth. AFL enforces
> strict memory and time limits on the tests which is awfully
> convenient. I can implement those myself in my fuzzer function (and in
> fact in the Postgres context it would be best if I did) but some
> simple AFL-style protection would be appreciated, as it is it takes a
> *looong* time to fail and doesn't leave behind the crashing test.


That's strange.
Most likely this is the same problem as above: Postgres redefines the ALRM
handler?
libFuzzer should be able to detect a long running test and report it.
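
The suspected handler conflict can be reproduced in miniature. The sketch below is Python, not libFuzzer or Postgres code, and both handler names are hypothetical stand-ins. It only shows that installing a second SIGALRM handler replaces the first, so a timer armed for one component ends up calling the other component's handler. It assumes a Unix system and has to run on the main thread.

```python
import signal
import time

calls = []


def fuzzer_alarm(signum, frame):
    # Hypothetical stand-in for a fuzzer's per-test timeout check.
    calls.append("fuzzer")


def host_alarm(signum, frame):
    # Hypothetical stand-in for a handler the host program installs later.
    calls.append("host")


def demo():
    """Arm one timer after two handlers were installed in turn."""
    calls.clear()
    previous = signal.signal(signal.SIGALRM, fuzzer_alarm)
    signal.signal(signal.SIGALRM, host_alarm)  # replaces fuzzer_alarm
    signal.setitimer(signal.ITIMER_REAL, 0.05)  # fires once after 50 ms
    time.sleep(0.3)  # resumes after the handler returns (Python 3.5+)
    signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
    if previous is not None:
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return list(calls)
```

Only the handler installed last is recorded, which matches the symptom of a timeout that never seems to fire.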


> It
> would be nice if Libfuzzer took a page out of the sanitize code's
> tricks and kept an mmapped file where it wrote the current test being
> run. If the file is never synced then it shouldn't cause any syscalls
> or I/O until the program crashes and the file descriptor is closed.
>
Let's resolve the above problem first, maybe this will not be needed.

>
> My thinking is I need to set an RLIMIT_STACK setting and then install
> a SEGV handler which will longjmp back to the top level and return to
> the fuzzer. That will be risky since it's in theory impossible to
> restore any state the SEGV caused but in practice if it's always
> caused by a stack overflow might be safe. I would also like to have an
> ALRM handler but that requires calling alarm() on every call and I'm
> not sure if the setitimer in Libfuzzer can be disabled or if it'll
> interfere with that. Maybe there's a better approach, I could call
> setitimer and if I see more than n ALRMs during the execution decide
> it's a fault. Again it would be nice if Libfuzzer provided that
> itself.
>

I am confused.  Libfuzzer does set an alarm.

>
> 3) When it writes the minimal test corpus it seems to keep older tests
> around too. I guess the intent is to pass two directories,


correct.
If you want to minimize the corpus do it like this:
./fuzzer NEW_EMPTY_DIR OLD_CORPUS
The docs were a bit vague, I've tried to improve them in 247033.

I rarely use this option myself because libFuzzer does corpus minimization
at startup.
It may still be useful if you want, e.g., to commit the corpus to a test
repository or to share the corpus with other fuzzers.
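
To make "corpus minimization" concrete, here is a small Python sketch of one common greedy approach: visit units from smallest to largest and keep a unit only if it covers something not yet covered. This is an illustration under assumptions, not libFuzzer's actual algorithm; the `units` mapping and the feature sets are hypothetical inputs that a real tool would derive from coverage instrumentation.

```python
def minimize(units):
    """Return the names of units to keep.

    units maps a unit name to (size_in_bytes, set_of_covered_features).
    """
    kept = []
    covered = set()
    # Prefer small units; break ties by name so the result is deterministic.
    for name in sorted(units, key=lambda n: (units[n][0], n)):
        _size, features = units[name]
        if not features <= covered:  # adds at least one new feature
            kept.append(name)
            covered |= features
    return kept
```

With units `a` covering {1}, `b` covering {1, 2} and `c` covering {2}, in increasing size, the sketch keeps `a` and `b` and drops `c`.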

> one which
> starts empty and is intended to receive the results and one which is
> maintained as the working tree? I'm not sure how to use this mode.
>
> 4) The actual fuzzing seems to be less effective than AFL at finding
> good cases.


That's not entirely unexpected: AFL is extremely algorithmically advanced.
We are trying to catch up :)

Note that I've just added support for AFL-style dictionaries,
which may help in your case.
http://llvm.org/docs/LibFuzzer.html#dictionaries
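
As a rough illustration, the Python sketch below writes a dictionary file in the AFL style (one `name="value"` entry per line, with `\\`, `\"` and `\xNN` escapes). The token list is hypothetical and only hints at what might suit a regexp target; how the file is passed to the fuzzer is described in the docs linked above.

```python
def escape_token(token):
    """Escape raw bytes for use inside a quoted dictionary value."""
    out = []
    for byte in token:
        char = chr(byte)
        if char == "\\":
            out.append("\\\\")
        elif char == '"':
            out.append('\\"')
        elif 0x20 <= byte <= 0x7E:
            out.append(char)  # printable ASCII is written as-is
        else:
            out.append("\\x%02X" % byte)
    return "".join(out)


def dict_entry(name, token):
    """Format one name="value" line (without the trailing newline)."""
    return '%s="%s"' % (name, escape_token(token))


def write_dictionary(path, tokens):
    """Write all tokens, sorted by name, to a dictionary file."""
    with open(path, "w") as handle:
        handle.write("# Illustrative tokens; adjust to the target.\n")
        for name, token in sorted(tokens.items()):
            handle.write(dict_entry(name, token) + "\n")


# Hypothetical tokens, not taken from the original thread.
REGEX_TOKENS = {
    "group": b"(?:",
    "class_alpha": b"[[:alpha:]]",
    "digit": b"\\d",
    "newline": b"\n",
}
```

Calling `write_dictionary("regex.dict", REGEX_TOKENS)` would produce a four-entry file; the file name is arbitrary.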


> In particular I've found I have to use only_ascii mode or
> else it spends all the time looking at encoding errors on random
> binary inputs. Even in only_ascii mode it seems insistent on putting a
> ^L in a *lot* of tests even when the function being tested always ends
> with the same error if one is present.
>

Hmm.. I simply rely on isspace/isprint
I may of course change it to not emit ^L in ascii mode,
but another way for you is to replace ^L with a space in your target
function.
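
A small Python sketch of why ^L shows up and of the suggested workaround. It assumes that the ASCII mode keeps a byte when C's `isprint` or `isspace` (in the "C" locale) accepts it, which is one reading of the remark above and not a statement about the actual libFuzzer code. Form feed (0x0c) counts as whitespace under that rule, so the target can simply map it to a space before using the input.

```python
# The bytes C's isspace() accepts in the "C" locale.
ASCII_WHITESPACE = b" \t\n\x0b\x0c\r"


def kept_by_ascii_filter(byte):
    """Assumed filter: printable ASCII or ASCII whitespace."""
    return 0x20 <= byte <= 0x7E or byte in ASCII_WHITESPACE


def normalize_form_feeds(data):
    """Workaround for the target side: treat ^L as a plain space."""
    return data.replace(b"\x0c", b" ")
```

In a C target the same normalization would be a loop over the input buffer before it is handed to the code under test.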


>
> I'm hoping to try DFA mode and hoping it will help with this but all
> the "experimental" warnings in the docs scare me. Is it just that
> there's room for improvement or is there any downside to running in
> that mode?
>

You mean, the data flow feedback mode enabled by -use_traces=1 (and
-fsanitize-coverage=trace-cmp)?
This is really a prototypish thing so far.
I've seen several cases in the wild where it breaks through a wall (where
the regular mode does not find new coverage for days),
but it's nowhere near complete.
By all means, try it as one of the strategies, but don't solely rely on it.


>
> Another thing I'm not clear whether it's not implemented yet or
> there's just no feedback yet is the test for variable coverage. AFL
> runs the same test repeatedly to test whether the coverage is
> repeatable which can be an important thing to know whether your
> testing is actually well implemented or whether you're failing to
> clean up state sufficiently between runs.
>

Hmm.. Interesting. I don't think we have anything to check
if the target function produces stable coverage.
You probably can run the fuzzer 100 times with -runs=0 -seed=1 and see if
it produces the same INITED coverage.
Is that what you need?
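
The repeated-run check could be scripted roughly as follows. This Python sketch assumes the INITED status line carries `cov:` and `bits:` fields in the same layout as the NEW lines quoted later in this thread, and it ignores `exec/s`, which varies between runs anyway. The function names are hypothetical, and the command line in the usage note uses a placeholder binary and corpus path.

```python
import re
import subprocess

FIELD = re.compile(r"(\w+): (\d+)")


def inited_coverage(output):
    """Extract (cov, bits) from every INITED line in the fuzzer output."""
    results = []
    for line in output.splitlines():
        if "INITED" in line:
            fields = dict(FIELD.findall(line))
            results.append((fields.get("cov"), fields.get("bits")))
    return results


def run_fuzzer_once(command):
    """Run the fuzzer and return stdout and stderr as one string."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


def coverage_is_stable(outputs):
    """True if every run reported the same INITED coverage."""
    return len({tuple(inited_coverage(text)) for text in outputs}) == 1
```

A usage example would be `coverage_is_stable(run_fuzzer_once(["./fuzzer", "-runs=0", "-seed=1", "CORPUS"]) for _ in range(100))`.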



>
> 5) I'm currently running 1M iterations per call then calling it again
> (in a new process). It would be convenient if I could call it again in
> the same process and in fact it would be most convenient if I could
> make my code call the fuzzer repeatedly for, say, 1k invocations. I
> could check for C-c once every 1k calls and do any other cleanup,
> checking for memory leaks, etc at that time.
>

I'll need more explanation here.


>
> It would also be nice to be able to ask for the minimal corpus back in
> memory along with meta information like coverage, runtime, etc so I
> could, say, store them in the database :)
>

All this is doable, just not my priority for now.
If you come up with a simple patch -- you are more than welcome.

(For large/complex patches now is not the best time though)


> 6) The crashing and slow tests are written to the current directory.
> It would be nice to be able to provide a directory for them to go
> into.


It's on my TODO list; file a separate bug if you want to track it.


> Also, it would be nice to provide a callback or some other way
> to override this. I could generate the whole SQL reproduction instead
> of just having the binary data to pass and have to remember what
> function I was testing.
>
A bit more involved, but doable, of course.
You can probably also do it on your side and let us know how it works.



> In general the feedback is a bit unclear. It seems to print binary
> strings in several different escape styles, sometimes using \x (though
> it's not clear how many hex digits follow) sometimes using 0x and
> sometimes using base64:
>
> #755 NEW    cov: 14667 bits: 476 units: 6 exec/s: 20 L: 4 \xa\x5\xcb*
> 0xa,0x5,0xcb,0x2a,
>
> Test unit written to crash-b0f4bc53c8f72fd53ef0a6c1f46115bd7bd8fe50
> Base64:
> IZqM9rA71To7KDonOlb8pCEoJ3Mn2sAnO1I3XwYoITtxO0exSjwo7u4nKZ8hnilHeQo6GDshTI4pKipWLa8KXg==
>
> Of these only the base64 is convenient for writing reproductions
> (though a callback would be most convenient) but it's not so
> convenient for watching the progress. And for many lines it seems to
> print no test data which is definitely not helpful for watching
> progress:
>
> #2028804 NEW    cov: 15330 bits: 6511 units: 127 exec/s: 5880 L: 39
> #2045447 NEW    cov: 15330 bits: 6512 units: 128 exec/s: 5877 L: 47
>


That's a trade-off between more output and less output.
You can watch the corpus itself if you want to see all cases.

The output like "0xa,0x5,0xcb,0x2a," is useful if you want to paste this
data back to a C program as a char array.
base64 is the simplest way to have a file repro.
Escaped text is printed only for very small units, just FYI.
Making things nicer is on my TODO list, but someone will always dislike the
style, and I really don't want to spend time (at this point) making the
output more customizable.
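
For readers who want to turn the printed forms back into bytes, here is a short Python sketch. The helper names are hypothetical. In the sample quoted above, `\xa\x5\xcb*` and `0xa,0x5,0xcb,0x2a,` describe the same four bytes (0x0a, 0x05, 0xcb, 0x2a), and the Base64 form can be decoded straight into a reproduction file.

```python
import base64


def parse_c_array(text):
    """Turn '0xa,0x5,0xcb,0x2a,' into the bytes it describes."""
    return bytes(int(token, 16) for token in text.split(",")
                 if token.strip())


def decode_repro(b64_text):
    """Decode the Base64 form printed next to a crashing unit."""
    return base64.b64decode(b64_text)


def write_repro(b64_text, path):
    """Write the decoded unit to a file that can be fed to the target."""
    with open(path, "wb") as handle:
        handle.write(decode_repro(b64_text))
```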


> Also, all this feedback is currently going into the server log. I
> would like to capture it and report it to the client. I'm currently
> basically just doing my own progress feedback this way but it's
> missing the information about coverage and number of units found. It
> can only show the number of tests done and things like memory usage
> etc.
>
> 7) If I open up the corpus files in emacs and accidentally hit any key
> then emacs saves an autosave file but then deletes it when I undo the
> accidental edit -- which causes Libfuzzer to pretty much immediately
> crash with:
>
> Can not stat: /var/tmp/corpus/.#16813d894b330e26fdf4520793501dfffc830eb9;
> exiting
>

I've seen a similar problem too (not with emacs).
I'll try to fix it.
Again, file a bug if you want to track this.

>
> I would suggest ignoring auto-save and backup files (.#* and *~) but
> in any case this doesn't seem like it should be a fatal error. Just
> warn about the disappearing file and move on to the next one.
>
> --
> greg
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
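
The suggestion in item 7 (skip `.#*` and `*~`, and warn about a file that disappears without treating it as fatal) can be sketched in a few lines of Python. This is only an illustration of the behaviour being asked for, not libFuzzer code; `read_corpus` is a hypothetical helper.

```python
import fnmatch
import os
import sys

# Editor lock, auto-save and backup names to skip, as suggested above.
IGNORED_PATTERNS = (".#*", "*~")


def read_corpus(directory):
    """Read every unit in a corpus directory, tolerating vanished files."""
    units = {}
    for name in sorted(os.listdir(directory)):
        if any(fnmatch.fnmatch(name, pattern)
               for pattern in IGNORED_PATTERNS):
            continue
        path = os.path.join(directory, name)
        try:
            with open(path, "rb") as handle:
                units[name] = handle.read()
        except OSError as error:
            # The file may have been removed since the directory was
            # listed; warn and move on to the next one.
            print("warning: skipping %s: %s" % (path, error),
                  file=sys.stderr)
    return units
```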