[cfe-dev] Introducing clang-triage: A bot to test clang with fuzzed inputs

Mon Jan 5 16:10:28 PST 2015

On Mon, Jan 05, 2015 at 11:11:06AM -0700, John Regehr wrote:
> Sami, this is very cool.  Here are a few suggestions/comments/requests:

Thanks!

> Are you adding these into the LLVM bugzilla?

I added many in December:
http://llvm.org/bugs/buglist.cgi?quicksearch=fuzz&list_id=65362

But no, I have not added all of them.

> A lot of your crashers are syntactically invalid, but it often is the case
> that crashes on valid code are more interesting than crashes on invalid
> code.  Can you separate out the ones for which you have a valid test case?
> These might just be the ones that GCC can compile.

I suspect that very few of the generated cases are going to be valid
code; at least at this stage, parser bugs clearly dominate. Maybe it
will be possible to reach deeper bugs once the parser is more robust;
I don't know. Recent releases of afl-fuzz even include support for
extra dictionaries, i.e. making the fuzzer insert whole tokens like
"template" or "typename", which might increase its chances of
producing something that looks more like legitimate code.
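As a rough illustration of the dictionary feature: afl-fuzz reads a dictionary file passed with `-x`, one entry per line of the form `name="value"` (the name is an optional label). The Python sketch below prints such a file for a few C++ keywords. The keyword list, the `kw_` prefix and the file name `cxx.dict` are made-up choices for illustration, not what this bot actually uses.

```python
"""Print a small afl-fuzz dictionary of C++ keywords (illustrative sketch)."""
from typing import Iterable

# Hypothetical token list: whole tokens the fuzzer may insert into inputs.
KEYWORDS = ["template", "typename", "class", "struct", "namespace", "operator"]


def escape(token: str) -> str:
    """Escape a dictionary value: backslash, double quote, non-printables.

    Assumes characters below U+0100, so each one fits in a single \\xNN escape.
    """
    out = []
    for ch in token:
        if ch in ("\\", '"'):
            out.append("\\" + ch)             # escape the escape and quote chars
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)                    # printable ASCII passes through
        else:
            out.append("\\x%02x" % ord(ch))   # everything else as \xNN
    return "".join(out)


def make_dictionary(tokens: Iterable[str]) -> str:
    """Return one name="value" entry per line."""
    return "".join('kw_%s="%s"\n' % (tok, escape(tok)) for tok in tokens)


if __name__ == "__main__":
    # Hypothetical usage:
    #   python make_dict.py > cxx.dict
    #   afl-fuzz -i testcases -o findings -x cxx.dict <instrumented clang> -fsyntax-only @@
    print(make_dictionary(KEYWORDS), end="")
```

The fuzzer still treats these tokens as opaque byte strings; the dictionary only makes it more likely that a keyword lands intact in a mutated input.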

Basically the fuzzer treats the source code just like any binary, and
the results are generally a mess of text and binary garbage. I'm
actually surprised that it manages to find any crashes that do not
contain binary data after reduction. I think the more interesting
crashing cases are mostly produced by randomly combining (crossover)
different clang test cases with each other and with inputs, found by
earlier fuzzing and combining, that trigger new execution paths. If
GCC accepted any of those, I think it would be mostly by chance,
unless it is a lot more tolerant of binary garbage than Clang.
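To make the crossover idea concrete, here is a simplified sketch loosely modeled on afl-fuzz's splicing stage: find the first and last positions where two inputs differ, pick a split point between them, and join the head of one input to the tail of the other. The function name and the exact split rule are illustrative assumptions, not afl-fuzz's code; afl-fuzz also applies further random mutations to the result.

```python
import random
from typing import Optional


def splice(a: bytes, b: bytes, rng: random.Random) -> Optional[bytes]:
    """Join a head of `a` to a tail of `b` at a point where they differ.

    Returns None when the inputs differ in fewer than two positions, since
    then there is no split point that takes something new from each parent.
    """
    n = min(len(a), len(b))
    diffs = [i for i in range(n) if a[i] != b[i]]
    if len(diffs) < 2:
        return None
    first, last = diffs[0], diffs[-1]
    # first < split_at <= last, so the child keeps a[first] from `a`
    # and b[last] from `b`, and therefore differs from both parents.
    split_at = rng.randrange(first + 1, last + 1)
    return a[:split_at] + b[split_at:]


if __name__ == "__main__":
    rng = random.Random(0)
    print(splice(b"int x = 1; int y = 2;", b"int x = 9; int y = 8;", rng))
```

Splitting only between differing positions avoids producing a child that is byte-for-byte identical to one of its parents.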

After reduction the cases often look somewhat more sensible, but
truncated: since the parser works incrementally, the smallest crashing
input often ends in the middle of some block. GCC clearly would not
compile those either.
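Why reduced inputs stop mid-block can be seen with a deliberately naive reducer. The sketch below is hypothetical (it is neither C-Reduce nor afl-tmin): it binary-searches for the shortest crashing prefix, assuming that once a prefix crashes every longer prefix crashes too, which roughly holds when an incremental parser dies as soon as it reaches some particular byte. The `crashes` predicate stands in for "run clang on this input and check for a crash".

```python
from typing import Callable


def shortest_crashing_prefix(data: bytes, crashes: Callable[[bytes], bool]) -> bytes:
    """Return the shortest prefix of `data` for which `crashes` is true.

    Assumes monotonicity: if data[:k] crashes, so does data[:m] for all m >= k.
    """
    if not crashes(data):
        raise ValueError("the full input does not crash")
    # Invariant: prefixes shorter than `lo` do not crash; data[:hi] crashes.
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(data[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return data[:hi]


if __name__ == "__main__":
    # Hypothetical stand-in for a parser crash: triggered by a stray 0xff byte.
    source = b"template <class T> struct S { void f() { \xff } };"
    # Prints the input cut off right after the 0xff byte, i.e. mid-block.
    print(shortest_crashing_prefix(source, lambda s: b"\xff" in s))
```

Everything after the crash point is dropped, so the closing braces disappear and the result would not compile with any compiler.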

At the very least, I believe it's safe to say the fuzzer is heavily
biased towards finding parser bugs as long as they are relatively
common. Whether it can go deeper than that remains to be seen.

> Your mail mentions C-Reduce crashing.  Please report bugs to us when this
> happens.

I dumped the cases where C-Reduce failed, together with the
dumb-reduced cases causing the same crash in Clang (some of them fixed
since then, I think) and the assertion failures they caused. I used
C-Reduce with LLVM 3.5. Since C-Reduce itself uses Clang, I don't find
it that surprising that it fails on some of the inputs that also cause
Clang to crash.

    http://sli.dy.fi/~sliedes/creduce-failures.tar.gz

> If you have time, it would be helpful to add Csmith into the mix here.  I
> would be very interested to learn how Csmith and AFL work together.

It's a bit hard to see how they would work together. IIUC, Csmith
tries hard to produce code that is a semantically valid C program,
while afl-fuzz mangles the input to maximize code coverage in the
compiler and knows nothing about code correctness. Anyway, I believe
there are still likely to be a lot of bugs to be found starting from
the clang test cases; together they contain large amounts of quite
different, even exotic constructs. I've only just started running
afl-fuzz. It doesn't get far through its work queue (less than 1%, I
think, and the queue keeps growing) before it hits its limit of 5000
crashing cases found, though it still takes a few days to get that
far; running clang that many times is resource intensive. Obviously
those are the low-hanging, rather easy-to-find fruit.

> You should definitely turn on LLVM optimizations when doing this testing,
> unless there's some specific reason why that isn't a good idea.

I think it will be a good idea once it no longer hits so many parser
bugs. The drawback is speed: with -O0 I get something like 300
executions/second on a 4-core machine, and even at that speed I guess
it would take at least months to get through afl-fuzz's queue starting
from the clang test suite.
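For a sense of scale, the quoted rate works out as follows (plain arithmetic on the 300 executions/second figure, not a measured queue size):

$$
300\ \tfrac{\text{executions}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} = 2.592 \times 10^{7}\ \tfrac{\text{executions}}{\text{day}} \approx 7.8 \times 10^{8}\ \text{executions per 30 days}
$$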

	Sami