[cfe-dev] Introducing clang-triage: A bot to test clang with fuzzed inputs

Sean Silva chisophugis at gmail.com
Mon Jan 5 16:36:32 PST 2015


On Mon, Jan 5, 2015 at 4:10 PM, Sami Liedes <sami.liedes at iki.fi> wrote:

> On Mon, Jan 05, 2015 at 11:11:06AM -0700, John Regehr wrote:
> > Sami, this is very cool.  Here are a few suggestions/comments/requests:
>
> Thanks!
>
> > Are you adding these into the LLVM bugzilla?
>
> I added many in December:
> http://llvm.org/bugs/buglist.cgi?quicksearch=fuzz&list_id=65362
>
> But no, I have not added all.
>
> > A lot of your crashers are syntactically invalid, but it often is the case
> > that crashes on valid code are more interesting than crashes on invalid
> > code.  Can you separate out the ones for which you have a valid test case?
> > These might just be the ones that GCC can compile.
>
> I suspect that not very many of the cases produced are going to be
> valid code; at least at this stage, parser bugs are clearly prevalent.
> Maybe it will be possible to reach deeper bugs once the parser is more
> robust; I don't know. Recent releases of afl-fuzz even include support
> for extra dictionaries, i.e. making the fuzzer insert whole tokens
> like "template" or "typename", which might increase its chances of
> producing something that looks more like legitimate code.
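[For reference, an afl-fuzz dictionary is just a list of name="value" token entries passed with the -x option. The file contents, paths, and clang flags below are illustrative, not from the original mail:]

```shell
# Hypothetical sketch of a small afl-fuzz dictionary of C++ tokens.
# The names on the left are arbitrary labels; afl-fuzz only uses the
# quoted values when splicing tokens into inputs.
cat > cxx.dict <<'EOF'
kw_template="template"
kw_typename="typename"
kw_namespace="namespace"
op_scope="::"
EOF

# Pass the dictionary with -x; afl-fuzz substitutes @@ with the path of
# the current test case. Input/output directories are placeholders.
afl-fuzz -i testcases/ -o findings/ -x cxx.dict -- clang -fsyntax-only @@
```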
>
> Basically the fuzzer treats the source code just like any binary
> input, and the results generally are a mess of text and binary
> garbage. I'm actually surprised that it manages to find any crashes
> that, after reducing, do not contain binary. I think the more
> interesting crashing cases are mostly produced by randomly combining
> (crossover) different clang test cases with cases discovered by
> earlier fuzzing, where the combination triggers new execution paths. I
> think if GCC accepted those it would be mostly by chance, unless it's
> a lot more tolerant of binary garbage than Clang.
>
> After reducing the cases often look somewhat more sensible, but
> truncated, since with incremental parsing the smallest crashing input
> is often going to end in the middle of some block. GCC clearly would
> not compile those either.
>
> At the very least, I believe it's safe to say the fuzzer is heavily
> biased towards finding parser bugs as long as they are relatively
> common. Whether it can go deeper than that remains to be seen.
>
> > Your mail mentions C-Reduce crashing.  Please report bugs to us when this
> > happens.
>
> I dumped the cases where C-Reduce failed, together with the
> dumb-reduced cases causing the same crash on Clang (some since fixed,
> I think) and the assertion failures they caused. I used C-Reduce with
> LLVM 3.5. Since C-Reduce itself uses clang, I don't find it that
> surprising that it fails on some of the inputs that also cause Clang
> to crash.
>
>     http://sli.dy.fi/~sliedes/creduce-failures.tar.gz
>
> > If you have time, it would be helpful to add Csmith into the mix here.  I
> > would be very interested to learn how Csmith and AFL work together.
>
> A bit hard to see how they would work together. IIUC, Csmith tries
> hard to produce code that is also a semantically valid C program;
> afl-fuzz tries to mangle the input to maximize code coverage in the
> compiler and knows nothing about code correctness. Anyway, I believe
> there are still likely to be a lot of bugs to be found starting from
> the clang test cases; together they contain large amounts of quite
> different, even exotic constructs. I've only started to run afl-fuzz;
> it doesn't get far through its work queue (I think less than 1%, and
> the queue obviously grows) before it hits its limit of 5000 crashing
> cases found - though it still takes a few days to get that far;
> running clang that many times is resource intensive. Obviously that's
> the low-hanging, rather easy-to-find fruit.
>
> > You should definitely turn on LLVM optimizations when doing this testing,
> > unless there's some specific reason why that isn't a good idea.
>
> I think it will be a good idea once it no longer hits so many parser
> bugs. The drawback is that with -O0 I can do something like 300
> executions/second on a 4-core machine, and even at that speed I guess
> it would take at least months to go through afl-fuzz's queue starting
> from the clang test suite.
>

I'm wondering how much we can improve on that 300 executions/second. My
guess is that a lot of the time is constant-overhead startup code. A
back-of-the-envelope calculation:

300 executions/second * 300 bytes/source file (small files) ~ 100 000 bytes/second.
4 cores * 3 giga-instructions/second ~ 10 000 000 000 instructions/second.

So that's on the order of 100 000 instructions per byte, which still seems excessive.
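Redoing the arithmetic under the stated assumptions (the per-core instruction rate is a rough guess), the ratio comes out around 10^5 instructions per byte:

```python
# Back-of-the-envelope check of the throughput numbers above.
execs_per_sec = 300            # observed clang invocations/second
bytes_per_file = 300           # assumed size of a small test case
bytes_per_sec = execs_per_sec * bytes_per_file        # 90 000 ~ 1e5

cores = 4
insns_per_core = 3e9           # rough guess: ~3 giga-instructions/s/core
insns_per_sec = cores * insns_per_core                # 1.2e10 ~ 1e10

insns_per_byte = insns_per_sec / bytes_per_sec
print(f"{insns_per_byte:,.0f} instructions per byte")
```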

The first thing that comes to mind to hack around that would be to
write a quick tool that uses clang as a library and have afl-fuzz just
send it IPC messages asking it to parse files; the server then forks
off a child to do the parse, avoiding all the startup overhead and
option parsing and such inside clang.

-- Sean Silva


>
>         Sami
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>


More information about the cfe-dev mailing list